原文链接Chapter 11 Marker gene detection
1、目的
- we identify the genes that drive separation between clusters. 鉴别直接导致分群结果的marker gene.
- In the most obvious case, the marker genes for each cluster are a priori associated with particular cell types .即 marker gene 与后面的细胞类型注释直接相关
- different approaches can be used to consolidate test results into a single ranking of genes for each cluster.
- 本文主要学习scran包的
findmarkers
函数其中的t-test方法。
load("fluidigm.clust.RData")
#加载上一步分析结果
fluidigm.clust
library(scran)
2、findMarkers
(1)基础用法
markers.fluidigm <- findMarkers(fluidigm.clust,groups=fluidigm.clust$label)
markers.fluidigm
markers.fluidigm$`1`
- 如上,findMarkers的基本参数为sce对象,clust信息。
-
其中比较方法默认为t.test
2-1
关于Top
列
- 首先要知道:t.test是基于所有的clust两两之间比较的,而不是一个clust与其它所有clust的均值作比较。
- 而Top列,以1为例,表示clust1分别与另外三个clust相比差异最显著top1的基因(ranked by p-value)
chosen <- "1" #查看clust1与其它三个clust的比较
interesting <- markers.fluidigm[[chosen]]
colnames(interesting)
interesting[1:9,1:4]

- 热图可视化
best.set <- interesting[interesting$Top <= 3,]
#选取与其它三个clust差异排名前三,共9个基因的矩阵
getMarkerEffects <- function(x, prefix="logFC", strip=TRUE, remove.na.col=FALSE) {
#prefix="logFC"
#x=best.set
regex <- paste0("^", prefix, "\\.")
i <- grep(regex, colnames(x))
out <- as.matrix(x[,i])
if (strip) {
colnames(out) <- sub(regex, "", colnames(out))
}
if (remove.na.col) {
out <- out[,!colAnyNAs(out),drop=FALSE]
}
out
}#不知为啥下载scran包没有这个函数,找到了函数源代码,同样效果
logFCs <- getMarkerEffects(best.set)
#得到行名为基因,列名为其它三类clust与clust1de1 logFC
library(pheatmap)
pheatmap(logFCs, breaks=seq(-5, 5, length.out=101))

(2)细节筛选
基于logFC
#仅挑选上调基因
markers.fluidigm.up <- findMarkers(fluidigm.clust, direction="up",
groups=fluidigm.clust$label)
interesting.up <- markers.fluidigm.up[[chosen]]
interesting.up[1:10,1:4]
#logfc>1
markers.fluidigm.up2 <- findMarkers(fluidigm.clust, direction="up", lfc=1,
groups=fluidigm.clust$label)
interesting.up2 <- markers.fluidigm.up2[[chosen]]
interesting.up2[1:10,1:4]
These two settings yield a more focused set of candidate marker genes that are upregulated in cluster 1
best.set <- interesting.up2[interesting.up2$Top <= 5,]
logFCs <- getMarkerEffects(best.set)
library(pheatmap)
pheatmap(logFCs, breaks=seq(-5, 5, length.out=101))

基于比较方法
- 如前所述,基础用法是所有clust两两比较;
- 可以通过设置参数
pval.type=
修改p.value的比较关系
(1)pval.type="all"
- A more stringent approach would only consider genes that are differentially expressed in all pairwise comparisons involving the cluster of interest.
- 即clust1的某gene必须与其它所有clust表达都有差异才认定为DE
markers.fluidigm.up3 <- findMarkers(fluidigm.clust, pval.type="all",
direction="up",
groups=fluidigm.clust$label)
interesting.up3 <- markers.fluidigm.up3[[chosen]]
head(interesting.up3)
这种方法看起来很合理,但是也有缺点
- 此时的p.value为the maximum of the p-values from all pairwise comparisons.
- A gene will only achieve a low combined p-value if it is strongly DE in all comparisons to other clusters.
- However, any gene that is expressed at the same level in two or more clusters will simply not be detected.
- This is likely to discard many interesting genes, especially if the clusters are finely resolved with weak separation.
(2)pval.type="any"
- will be DE between at least one pair of subpopulations.
- too generous.
(3)pval.type="some"
- at least 50% of the individual pairwise comparisons exhibit no DE
- take the middle-most value as the combined p-value.
- reasonable
(3)其它参数
block=
- handle blocking factor
m.out <- findMarkers(fluidigm.clust,
block=fluidigm.clust$Coverage_Type,
direction="up",
groups=fluidigm.clust$label)
demo <- m.out[["1"]]
demo[demo$Top <= 5,1:4]
test=
- 默认为t.tests;
- 可以设定、采用其它方法,例如
wilcox
,binomial
等;具体可参考原文,不再记录。
save(markers.fluidigm, file = "markergene.Rdata")
以上是第十一章Clustering部分的简单流程笔记,主要学习了基于t.test的marker gene detection,其它如wilcox方法见原文Chapter 10 Marker gene detection
本系列笔记基于OSCA全流程的大致流程梳理,详细原理可参考原文。如有错误,恳请指正!
此外还有刘小泽老师整理的全文翻译笔记,详见目录。
网友评论