【零基础练习一个RNA-seq分析】CH7：DEseq2探索表达

作者: josia_luo | 来源:发表于2020-08-20 22:18 被阅读0次

我们已经得到了一个表达矩阵，下面我们就用DEseq2筛选差异表达基因，并进行简单的visulization。

DESeq流程

第一步我们需要准备两个表，借用Y大宽姐姐的图来说明一下。

countData表示的是count矩阵，行代表gene，列代表样品，中间的数字代表对应count数。colData表示sample的元数据，因为这个表提供了sample的元数据。
because this table supplies metadata/information about the columns of the countData matrix. Notice that the first column of colData must match the column names of countData (except the first, which is the gene ID column).
colData的行名与countData的列名一致（除去代表gene ID的第一列）
下面我们开始构建colData

condition <- factor(c(rep("control",2),rep("treat",2)), levels = c("control","treat"))
colData <- data.frame(row.names=colnames(raw_count_filt), condition) #当然这里要注意之前小数点那列还在不在，如果在的话需要先从表达矩阵中删去。

下面开始DESeq流程

library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData, DataFrame(condition), design= ~ condition )
dds <- DESeq(dds)
res = results(dds, contrast=c("condition", "treat", "control")) #这里要注意顺序不能错
summary(res) #看看总体的情况

out of 31246 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 641, 2.1%
LFC < 0 (down)     : 444, 1.4%
outliers [1]       : 0, 0%
low counts [2]     : 15691, 50%
(mean count < 22)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

这样就完成了一个简单的DESeq2分析流程。

简单可视化

MA图

到这里我们就可以画一个MA图了

plotMA(res, ylim = c(-5,5))
topGene <- rownames(res)[which.min(res$padj)]
with(res[topGene, ], {
  points(baseMean, log2FoldChange, col="dodgerblue", cex=2, lwd=2)
  text(baseMean, log2FoldChange, topGene, pos=2, col="dodgerblue")
})

这个图有点丑，我们需要收缩一下

res.shrink <- lfcShrink(dds, coef="condition_treat_vs_control", type="apeglm")
plotMA(res.shrink, ylim = c(-5,5))
topGene <- rownames(res)[which.min(res$padj)]
with(res[topGene, ], {
  points(baseMean, log2FoldChange, col="dodgerblue", cex=2, lwd=2)
  text(baseMean, log2FoldChange, topGene, pos=2, col="dodgerblue")
})

这就好看多了。

ggplot

关于ggplot2，可以看看这篇教程https://www.jianshu.com/p/03719d7c207f
首先我们用DESseq2里的plotcount函数来看某个基因的情况。

# 不画图，只显示数据
plotCounts(dds, gene=which.min(res$padj), intgroup="condition", returnData=TRUE)
#只画图，不显示数据
plotCounts(dds, gene="ENSMUSG00000024045", intgroup="condition", returnData=FALSE)

先画一个point plot

library(ggplot2)
d <- plotCounts(dds, gene="ENSMUSG00000024045", intgroup="condition", returnData=TRUE)
ggplot(d, aes(x=condition, y=count)) + 
  geom_point(aes(color= condition),size= 4, position=position_jitter(w=0.5,h=0)) + 
  scale_y_log10(breaks=c(25,100,400))+ ggtitle("Akap8")

再画一个boxplot

ggplot(d, aes(condition, count)) + geom_boxplot(aes(fill=condition)) + scale_y_log10() + ggtitle("Akap8")

热图

ntd <- normTransform(dds)
pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)

热图这部分我就走马观花做了一下，以后再来详细写吧。照例放一个链接

网友评论

本文标题：【零基础练习一个RNA-seq分析】CH7：DEseq2探索表达

本文链接：https://www.haomeiwen.com/subject/qpsnrktx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

【零基础练习一个RNA-seq分析】CH7：DEseq2探索表达

DESeq流程

简单可视化

MA图

ggplot

热图

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

生信

转录组数据分析

三代测序技术