Using DESeq2-normalized data for PCA, clustering, and other visualizations

Author: Cdudu | Published 2020-09-29 11:28

DESeq2 offers several ways to normalize data:

counts(dds, normalized=T)
rlog, VST (the rlog() and vst() functions)

The difference between the two is as follows.

The former

are “only” library-size normalised

whereas the latter are

more advanced

For visualization analyses such as PCA and clustering, the latter should be used:

downstream processing generally requires more advanced normalisation
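As a rough illustration of what "library-size normalised" means, here is a minimal sketch of the median-of-ratios size-factor idea that DESeq2's estimateSizeFactors is based on. It is written in plain Python rather than R and does not call DESeq2. The function names and the toy matrix are made up for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: list of genes, each a list of raw counts (one per sample)."""
    n_samples = len(counts[0])
    # Keep only genes with no zero count, so the geometric mean is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene in sample j to that gene's geometric mean (log scale).
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts, factors):
    # Divide every count by the size factor of its sample.
    return [[c / f for c, f in zip(row, factors)] for row in counts]

# Hypothetical toy data: sample 2 is sample 1 sequenced twice as deeply.
toy = [[10, 20], [100, 200], [1000, 2000], [0, 3]]
sf = size_factors(toy)
norm = normalize(toy, sf)
```

After this step the same gene is comparable across the two samples, but the gap between a gene at about 14 and a gene at about 1400 is untouched, which is the limitation discussed in the next section.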

My understanding

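counts(dds, normalized=T) is the normalization used for DEG analysis. DEG analysis only asks whether the same gene differs between samples, so the counts are only rescaled per sample (each sample's counts are divided by that sample's size factor), which makes the same gene comparable across samples.
PCA, clustering, and similar analyses must not only compare the same gene across samples but also weigh the contributions of different genes within a sample. counts(dds, normalized=T) does not include this part of the normalization, so using it directly gives the highly expressed genes within a sample too much influence on the result. A simple log(x + 1) transformation, on the other hand, gives the lowly expressed genes too much influence, which is why DESeq2 provides the rlog and VST methods: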

In RNA-Seq data, however, variance grows with the mean. For example, if one performs PCA directly on a matrix of normalized read counts, the result typically depends only on the few most strongly expressed genes because they show the largest absolute differences between samples.
A simple and often used strategy to avoid this is to take the logarithm of the normalized count values plus a small pseudocount; however, now the genes with low counts tend to dominate the results because, due to the strong Poisson noise inherent to small count values, they show the strongest relative differences between samples. Note that this effect can be diminished by adding a relatively high number of pseudocounts, e.g. 32, since this will also substantially reduce the variance of the fold changes.
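The effect described in the quoted passage can be checked on a tiny made-up example. The sketch below, in plain Python, uses each gene's share of the squared distance between two samples as a simple stand-in for its influence on PCA or clustering. The two genes, their values, and the helper name are hypothetical.

```python
import math

def share_of_first_gene(sample_a, sample_b, transform):
    """Fraction of the squared distance between two samples due to gene 0."""
    sq_diffs = [(transform(a) - transform(b)) ** 2
                for a, b in zip(sample_a, sample_b)]
    return sq_diffs[0] / sum(sq_diffs)

# Gene 0: highly expressed, 10% difference. Gene 1: low counts, mostly noise.
sample_a = [10000.0, 1.0]
sample_b = [11000.0, 4.0]

# Untransformed normalized counts: the highly expressed gene dominates.
share_raw = share_of_first_gene(sample_a, sample_b, lambda x: x)

# log2 with a small pseudocount: the low-count gene now dominates.
share_log1 = share_of_first_gene(sample_a, sample_b, lambda x: math.log2(x + 1))

# log2 with a large pseudocount (32, as in the quote): the two are comparable.
share_log32 = share_of_first_gene(sample_a, sample_b, lambda x: math.log2(x + 32))

print(share_raw, share_log1, share_log32)
```

With these numbers the highly expressed gene accounts for almost all of the distance on the raw scale, for almost none of it after log2(x + 1), and for roughly half with a pseudocount of 32.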

Question

If the counts(dds, normalized=T) values are extracted and then scaled or log-transformed, can that achieve an effect similar to rlog and VST?

Result

Normalizing the counts with counts(dds, normalized=T) and then applying scale gives a PCA result that is close to the rlog result, but not identical.
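To show what such a scaling step does, here is a minimal sketch in plain Python. It assumes that "scale" means standardizing each gene across samples to mean 0 and standard deviation 1; the post does not spell this out. The function name and toy values are hypothetical.

```python
from statistics import mean, stdev

def zscore_genes(norm_counts):
    """Standardize each gene (row) across samples; drop constant genes."""
    scaled = []
    for row in norm_counts:
        sd = stdev(row)  # sample standard deviation (n - 1 denominator)
        if sd == 0:
            continue  # a constant gene carries no information; skip it
        m = mean(row)
        scaled.append([(x - m) / sd for x in row])
    return scaled

# Hypothetical normalized counts: one high gene, one low gene, one constant gene.
toy_norm = [
    [10000.0, 11000.0, 12000.0],
    [1.0, 4.0, 7.0],
    [5.0, 5.0, 5.0],
]
scaled = zscore_genes(toy_norm)
```

After scaling, the high and the low gene carry exactly the same weight. This removes the dominance of highly expressed genes, but it also inflates noisy low-count genes to the same weight, which may be one reason the PCA is close to the rlog result without matching it.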

References

How can I extract normalized read count values from DESeq2 results
QC methods for DE analysis using DESeq2
Analysis of RNAseq data

Article link: https://www.haomeiwen.com/subject/zpzsuktx.html