Cleaning transcriptome counts data

Author: pudding815 | Source: published 2023-06-29 15:23

Update: https://www.jianshu.com/p/8a9cde00a228 covers CPM-based filtering.

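So far I have used two approaches for filtering out lowly expressed genes, and have heard of three:
1 Median TPM > 0.5 in at least one group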

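# Columns 1:2, 3:4 and 5:6 are the three sample groups.
# A gene is kept if its median TPM exceeds 0.5 in at least one group
# (with two samples per group, the median equals the mean).
pick.genes <- unique(c(rownames(tpm[matrixStats::rowMedians(as.matrix(tpm[,1:6]), cols = 1:2) > 0.5, ]),
                       rownames(tpm[matrixStats::rowMedians(as.matrix(tpm[,1:6]), cols = 3:4) > 0.5, ]),
                       rownames(tpm[matrixStats::rowMedians(as.matrix(tpm[,1:6]), cols = 5:6) > 0.5, ])
                       ))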
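The same per-group median filter can be sketched with a grouping factor instead of hard-coded column ranges. This is only an illustration under assumptions: the toy `tpm` matrix, the `group` factor, and `tpm.filtered` are made up here, and base R `apply(..., 1, median)` stands in for `matrixStats::rowMedians` so the sketch has no package dependency.

```r
# Toy TPM matrix: 4 genes x 6 samples (values invented for illustration)
tpm <- matrix(
  c(0.0, 0.1, 0.2, 0.0, 0.3, 0.1,   # geneA: low in every group
    2.0, 3.0, 0.0, 0.1, 0.0, 0.2,   # geneB: expressed in group g1 only
    0.3, 0.4, 0.3, 0.4, 0.3, 0.4,   # geneC: just below the cutoff everywhere
    5.0, 6.0, 7.0, 8.0, 9.0, 9.5),  # geneD: expressed in every group
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:6))
)

# Hypothetical design: three groups with two samples each
group <- factor(rep(c("g1", "g2", "g3"), each = 2))

# One column of per-gene medians for each group
group_medians <- sapply(levels(group), function(g) {
  apply(tpm[, group == g, drop = FALSE], 1, median)
})

# Keep a gene if its median TPM is above 0.5 in at least one group
keep <- rowSums(group_medians > 0.5) >= 1
pick.genes <- rownames(tpm)[keep]
tpm.filtered <- tpm[pick.genes, , drop = FALSE]
```

With this layout, adding or resizing a group only means changing `group`, not the filtering code.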

2 Count > 10 in at least three quarters of the samples of a group

# Column ranges of the 8 sample groups (the second group has only 3 samples,
# so 3/4 of it is 2.25 and all 3 samples must pass)
groups <- list(1:4, 5:7, 8:11, 12:15, 16:19, 20:23, 24:27, 28:31)
# Keep a gene if count > 10 in at least 3/4 of the samples of any one group
pick.gene <- unique(unlist(lapply(groups, function(cols) {
    rownames(counts)[rowSums(counts[, cols] > 10) >= length(cols) * 3/4]
})))
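To see how the three-quarters rule behaves with unequal group sizes, here is a small sketch on invented data. The toy `counts` matrix, the `group` factor, and `passes_in_group` are assumptions for illustration, not the original dataset.

```r
# Toy count matrix: 4 genes x 7 samples (values invented for illustration)
counts <- matrix(
  c(50, 60, 70,  0,  0,  0,  0,   # gene1: 3 of 4 samples in g1 pass
    50, 60,  0,  0,  0,  0,  0,   # gene2: only 2 of 4 samples in g1 pass
     0,  0,  0,  0, 20, 30,  5,   # gene3: only 2 of 3 samples in g2 pass
     0,  0,  0,  0, 20, 30, 40),  # gene4: 3 of 3 samples in g2 pass
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:7))
)

# Hypothetical design: one group of 4 samples and one group of 3
group <- factor(c(rep("g1", 4), rep("g2", 3)))

# For each group: does the gene have count > 10 in at least 3/4 of its samples?
passes_in_group <- sapply(levels(group), function(g) {
  cols <- which(group == g)
  rowSums(counts[, cols, drop = FALSE] > 10) >= length(cols) * 3 / 4
})

# Keep a gene if it passes in at least one group
pick.gene <- rownames(counts)[rowSums(passes_in_group) >= 1]
```

In a group of 3 the threshold is 2.25 samples, so gene3 fails even though two of its three samples pass. Unlike the `unique(c(...))` form, this version returns genes in row order rather than in the order the groups were checked.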

3 FPKM > 1 in every sample

data[rowSums(data > 1) == ncol(data), ]
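A small sketch of how strict this filter is, using an invented FPKM matrix (the `data` values and the `filtered` name are assumptions for illustration). A single sample at or below 1 is enough to drop a gene.

```r
# Toy FPKM matrix: 3 genes x 3 samples (values invented for illustration)
data <- matrix(
  c(1.5, 2.0, 3.0,   # g1: above 1 in every sample
    1.5, 0.9, 3.0,   # g2: one sample below 1
    0.0, 0.0, 0.0),  # g3: not expressed
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3"))
)

# Keep genes with FPKM > 1 in all samples.
# drop = FALSE keeps the result a matrix even if only one gene passes.
filtered <- data[rowSums(data > 1) == ncol(data), , drop = FALSE]
```

Here only `g1` survives. When `data` is a matrix rather than a data frame, omitting `drop = FALSE` would collapse a single surviving row into a plain vector and lose the gene name.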
