有些时候,很多分析明知道不能做,还不得不去做。
比如,只有TPM值情况下的基因差异表达分析:
ExM # log2TPM 表达矩阵
s1 # 属于类型1(如 tumor)的所有样本ID
s2 # 属于类型2(如 normal)的所有样本ID
cat("wilcox.test\n")
pvalue = padj = log2FoldChange = matrix(0, nrow(ExM), 1)
for(i in 1:nrow(ExM)){
pvalue[i, 1] = p.value = wilcox.test(ExM[i, s1], ExM[i, s2])$p.value
log2FoldChange[i, 1] = mean(ExM[i, s1]) - mean(ExM[i, s2])
}
padj = p.adjust(as.vector(pvalue), "fdr", n = length(pvalue))
rTable = data.frame(log2FoldChange, pvalue, padj, row.names = rownames(ExM))
treatment_Log2TPM <- signif(apply(ExM[rownames(rTable), s1], 1, mean), 4)
control_Log2TPM <- signif(apply(ExM[rownames(rTable), s2], 1, mean), 4)
cat("mark DGE\n")
DGE <- rep("NC", nrow(ExM))
DGE[((rTable$padj) < 0.05) & (rTable$log2FoldChange > 0)] = "UP"
DGE[((rTable$padj) < 0.05) & (rTable$log2FoldChange < 0)] = "DN"
gene = rownames(ExM)
rTable = data.frame(treatment_Log2TPM, control_Log2TPM, rTable[, c("log2FoldChange", "pvalue", "padj")], DGE)
head(rTable)
treatment_Log2TPM control_Log2TPM log2FoldChange
ENSG00000166535.20 A2ML1 1.4870 1.8410 -0.3536611345
ENSG00000175899.15 A2M 14.3500 14.1000 0.2527242657
ENSG00000197953.6 AADACL2 0.1622 0.1491 0.0131439534
ENSG00000204518.2 AADACL4 0.1487 0.1492 -0.0005166819
ENSG00000115977.19 AAK1 9.6250 9.7070 -0.0817361189
ENSG00000127837.9 AAMP 11.2000 11.2000 0.0019474297
pvalue padj DGE
ENSG00000166535.20 A2ML1 0.19797430 0.3930997 NC
ENSG00000175899.15 A2M 0.13120671 0.2997906 NC
ENSG00000197953.6 AADACL2 0.09516201 0.2405597 NC
ENSG00000204518.2 AADACL4 0.52208746 0.7067366 NC
ENSG00000115977.19 AAK1 0.44455824 0.6436331 NC
ENSG00000127837.9 AAMP 0.91096435 0.9563070 NC
代码来自一篇被删掉的帖子,估计他也觉得不能做
wilcox.test 进行 RNAseq 差异表达基因分析
事实上,不同处理之间坚决不能这样做。
而有些时候,这样还是能接受的,比如同一个样本/处理下的同源基因对之间的差异表达分析。
网友评论