0.背景知识
Seurat里的FindClusters函数设置的resolution数值越大,分群的数量就越多,但是当单细胞数量太多的时候,会遇到resolution再变大,分群的数量也不再增加的情况。一次分群分不开时就会需要二次分群。
1.示例数据
这里的示例数据seu.obj.Rdata是GSE218208降维聚类分群注释的结果,在生信星球聊天框回复“955gsva”获取(和昨天的数据一样,里面的gmt今天是用不到的)。
rm(list = ls())
library(Seurat)
library(dplyr)
load("../24.5.23scGSVA/seu.obj.Rdata")
p1 = DimPlot(seu.obj, reduction = "umap",label=T)+NoLegend()
p1
2.二次分群
这里以树突细胞(DC)为例进行二次分群
核心就是提取感兴趣的亚群的细胞,后面就是标准流程和可视化了,没有区别
sub.cells <- subset(seu.obj, idents = "DC")
f = "obj.Rdata"
if(!file.exists(f)){
sub.cells = sub.cells %>%
NormalizeData() %>%
FindVariableFeatures() %>%
ScaleData(features = rownames(.)) %>%
RunPCA(features = VariableFeatures(.)) %>%
FindNeighbors(dims = 1:15) %>%
FindClusters(resolution = 0.5) %>%
RunUMAP(dims = 1:15)
save(sub.cells,file = f)
}
load(f)
DimPlot(sub.cells, reduction = 'umap',label = T)+NoLegend()
3.marker基因及其可视化
sub.cells.markers <- FindAllMarkers(sub.cells, only.pos = TRUE,
min.pct = 0.25, logfc.threshold = 0.25)
top10 <- sub.cells.markers %>%
group_by(cluster) %>%
top_n(n = 10, wt = avg_log2FC) %>%
pull(gene);top10
## [1] "JCHAIN" "IGKC" "MZB1" "PACSIN1" "WNT10A" "MAP1A" "VASH2"
## [8] "NIBAN3" "SMPD3" "TNFRSF4" "LYZ" "TIMP1" "GPAT3" "ITGAX"
## [15] "SAMSN1" "OLR1" "FPR3" "EREG" "FCN1" "AKAP12"
VlnPlot(sub.cells, features = top10)
RidgePlot(sub.cells, features = top10)
FeaturePlot(sub.cells, features = top10)
DotPlot(sub.cells,features = top10)+ RotatedAxis()
DoHeatmap(sub.cells, features = top10) + NoLegend()
4.放回原来的Seurat对象里面
上面的umap图是感兴趣的单独的展示,也可以把它放回原来的seurat对象里。
sub.cells@meta.data$celltype = paste0("M",sub.cells$seurat_clusters)
seu.obj$celltype = as.character(Idents(seu.obj))
seu.obj$celltype = ifelse(seu.obj$celltype=="DC",
sub.cells$celltype[match(colnames(seu.obj),colnames(sub.cells))],
seu.obj$celltype)
Idents(seu.obj) = seu.obj$celltype
p2 = DimPlot(seu.obj,label = T)+NoLegend()
p1+p2
网友评论