Cleaning transcriptome count data

Author: pudding815 | Published 2023-06-29 15:23

Update: https://www.jianshu.com/p/8a9cde00a228 covers filtering by CPM.

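So far I have heard of three ways to filter out lowly expressed genes, and I have used two of them myself.
1 Median TPM > 0.5 in at least one group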

# Keep a gene if its median TPM exceeds 0.5 in at least one group
# (groups here are sample columns 1:2, 3:4 and 5:6)
tpm.mat <- as.matrix(tpm[, 1:6])
pick.genes <- unique(c(rownames(tpm.mat)[matrixStats::rowMedians(tpm.mat, cols = 1:2) > 0.5],
                       rownames(tpm.mat)[matrixStats::rowMedians(tpm.mat, cols = 3:4) > 0.5],
                       rownames(tpm.mat)[matrixStats::rowMedians(tpm.mat, cols = 5:6) > 0.5]
                       ))
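
The per-group calls above can also be written as one loop over a grouping vector, which avoids hard-coding column ranges. This is only a minimal sketch in base R: the toy `tpm` matrix and the `group` labels are made up for illustration, and `median` via `apply` stands in for `matrixStats::rowMedians`.

```r
# Toy TPM matrix: 4 genes x 6 samples (illustrative values, not real data)
tpm <- matrix(c(0.1, 0.2, 0.0, 0.3, 0.1, 0.2,   # g1: low in every group
                1.0, 2.0, 0.0, 0.1, 0.2, 0.1,   # g2: expressed in group A only
                0.0, 0.1, 0.2, 0.3, 3.0, 4.0,   # g3: expressed in group C only
                0.5, 0.5, 0.5, 0.5, 0.5, 0.5),  # g4: median is exactly 0.5, fails the strict ">"
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("g", 1:4), paste0("s", 1:6)))

# Hypothetical sample-to-group labels, one per column of tpm
group <- rep(c("A", "B", "C"), each = 2)

# For each group: TRUE where the per-gene median TPM exceeds the cutoff
pass_by_group <- sapply(unique(group), function(g) {
  apply(tpm[, group == g, drop = FALSE], 1, median) > 0.5
})

# Keep genes that pass in at least one group
pick.genes <- rownames(tpm)[rowSums(pass_by_group) > 0]
print(pick.genes)
```

With a real dataset, only `group` needs to change when samples are added or reordered.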

2 Count > 10 in at least three quarters of the samples of at least one group

# Keep a gene if, in at least one group, its count exceeds 10 in >= 3/4 of that group's samples
pick.gene <- unique(c(
    rownames(counts[rowSums(counts[, 1:4] > 10) >= (ncol(counts[, 1:4]) * 3/4), ]),
    rownames(counts[rowSums(counts[, 5:7] > 10) >= (ncol(counts[, 5:7]) * 3/4), ]),
    rownames(counts[rowSums(counts[, 8:11] > 10) >= (ncol(counts[, 8:11]) * 3/4), ]),
    rownames(counts[rowSums(counts[, 12:15] > 10) >= (ncol(counts[, 12:15]) * 3/4), ]),
    rownames(counts[rowSums(counts[, 16:19] > 10) >= (ncol(counts[, 16:19]) * 3/4), ]),
    rownames(counts[rowSums(counts[, 20:23] > 10) >= (ncol(counts[, 20:23]) * 3/4), ]),
    rownames(counts[rowSums(counts[, 24:27] > 10) >= (ncol(counts[, 24:27]) * 3/4), ]),
    rownames(counts[rowSums(counts[, 28:31] > 10) >= (ncol(counts[, 28:31]) * 3/4), ])
  ))
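
The same rule can be sketched with a grouping vector instead of eight hand-written column ranges. The toy `counts` matrix, the `group` labels and the `min_frac` name below are assumptions for illustration, not part of the original analysis.

```r
# Toy count matrix: 3 genes x 7 samples (illustrative values, not real data)
counts <- matrix(c(20, 30, 15,  5,  0,  1,  2,   # g1: 3 of 4 samples > 10 in group A
                    0,  2, 11, 12, 50, 60,  8,   # g2: 2 of 4 in A, 2 of 3 in B, fails both
                    1,  0,  3,  2, 11, 12, 13),  # g3: 3 of 3 samples > 10 in group B
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("g", 1:3), paste0("s", 1:7)))

# Hypothetical group labels: four samples in A, three in B
group <- c(rep("A", 4), rep("B", 3))
min_frac <- 3 / 4   # required fraction of samples per group

# For each group: TRUE where enough samples have a count above 10
pass_by_group <- sapply(unique(group), function(g) {
  sub <- counts[, group == g, drop = FALSE]
  rowSums(sub > 10) >= ncol(sub) * min_frac
})

# Keep genes that pass in at least one group
pick.gene <- rownames(counts)[rowSums(pass_by_group) > 0]
print(pick.gene)
```

Note that for a group of three samples, 3 * 3/4 = 2.25, so the rule effectively requires all three samples to exceed 10.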

3 FPKM > 1
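
I have not used this one myself, so the following is only a rough sketch of how it might look. It assumes FPKM is computed from raw counts as count * 1e9 / (gene length in bp * library size), and it assumes "FPKM > 1" means "in at least one sample", which the rule as I heard it does not specify. The `counts`, `gene_length` and `lib_size` values are made up.

```r
# Toy count matrix: 3 genes x 2 samples (illustrative values, not real data)
counts <- matrix(c(100, 200,
                    10,   0,
                    30,  40),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("g", 1:3), c("s1", "s2")))

gene_length <- c(g1 = 1000, g2 = 2000, g3 = 500)  # hypothetical gene lengths in bp
lib_size    <- c(s1 = 2e7, s2 = 4e7)              # hypothetical total mapped fragments per sample

# FPKM = count * 1e9 / (gene length * library size)
# counts / gene_length divides each row by its gene length;
# sweep(..., 2, lib_size, "/") divides each column by its library size
fpkm <- sweep(counts / gene_length, 2, lib_size, "/") * 1e9

# Assumption: keep genes with FPKM > 1 in at least one sample
pick.gene.fpkm <- rownames(fpkm)[rowSums(fpkm > 1) > 0]
print(pick.gene.fpkm)
```

If FPKM values are already available from the quantification step, only the last filtering line is needed.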