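scCATCH (single cell Cluster-based Annotation Toolkit for Cellular Heterogeneity) is a tool for annotating the clusters obtained from single-cell transcriptome data. Its core function is scCATCH, and findmarkergenes is a helper for finding marker genes. It belongs to the family of marker-gene-based cell type annotation tools. Its drawback is that it currently supports only human and mouse, because the backend has no reference database for other species.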
====Installation====
# install.packages("devtools")  # only needed if devtools is not installed yet
devtools::install_github("ZJUFanLab/scCATCH")
===Running: a first test on PBMC data===
Test data download link: http://cf.10xgenomics.com/samples/cell-exp/3.0.2/5k_pbmc_v3/5k_pbmc_v3_filtered_feature_bc_matrix.h5
library(Seurat)   # Read10X_h5() additionally requires the hdf5r package
library(scCATCH)
h5_file <- "5k_pbmc_v3_filtered_feature_bc_matrix.h5"
# Load the PBMC dataset
#pbmc.data <- Read10X(data.dir = "../data/pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc.data <- Read10X_h5(h5_file)  # this part is identical to a normal Seurat run
# Initialize the Seurat object with the raw (non-normalized data).
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
pbmc <- subset(pbmc, subset = nFeature_RNA > 500 & nFeature_RNA < 5000 & percent.mt < 20)
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
ElbowPlot(pbmc)
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.2)
pbmc <- RunUMAP(pbmc, dims = 1:10)
DimPlot(pbmc, label = TRUE)
# In fact, everything above is just the standard Seurat workflow.
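Next, use findmarkergenes to find the differentially expressed genes of each cluster. This step takes quite a long time, because every cluster has to be compared one by one with each of the other clusters to determine the genes specific to that cluster. (Personally, I think this is much like how Seurat and MAST identify cluster DE genes.)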
clu_markers <- findmarkergenes(pbmc,species = "Human",cluster = 'All', match_CellMatch = FALSE,cancer = NULL,tissue = NULL,cell_min_pct = 0.25,logfc = 0.25,pvalue = 0.05)
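To make the one-by-one comparison described above concrete, here is a minimal base-R sketch. It is a simplified stand-in for the idea, not scCATCH's own code: `find_pairwise_markers` is a hypothetical helper, the Wilcoxon test is an assumed choice of test, and the `min_pct`, `logfc` and `pvalue` thresholds only loosely mirror the `cell_min_pct`, `logfc` and `pvalue` arguments passed to findmarkergenes. A gene is kept as a marker of cluster k only if it passes the thresholds against every other cluster, checked pair by pair. That requirement is why the cost grows with the number of cluster pairs.

```r
# Hypothetical helper (NOT scCATCH's implementation): a gene counts as a marker
# of cluster k only if it passes the thresholds against EVERY other cluster,
# compared one pair at a time. `expr` is assumed to be log-normalised genes x cells.
find_pairwise_markers <- function(expr, clusters, min_pct = 0.25,
                                  logfc = 0.25, pvalue = 0.05) {
  clusters <- as.character(clusters)
  ids <- sort(unique(clusters))
  markers <- list()
  for (k in ids) {
    in_k <- clusters == k
    keep <- rep(TRUE, nrow(expr))
    for (other in setdiff(ids, k)) {
      in_other <- clusters == other
      # only genes that survived the previous pairs are tested again
      for (g in which(keep)) {
        x <- expr[g, in_k]
        y <- expr[g, in_other]
        # on log-scale data, a difference of means approximates a log fold change
        passes <- mean(x > 0) >= min_pct && mean(x) - mean(y) >= logfc
        if (passes) {
          p <- suppressWarnings(wilcox.test(x, y)$p.value)
          passes <- !is.na(p) && p <= pvalue
        }
        if (!passes) keep[g] <- FALSE
      }
    }
    markers[[k]] <- rownames(expr)[keep]
  }
  markers
}

# Toy data: 3 genes x 18 cells in three clusters of six cells each
expr <- rbind(
  GeneA = c(3, 3.2, 2.9, 3.1, 3.3, 2.8, rep(0, 12)),
  GeneB = c(rep(0, 6), 2, 2.4, 2.2, 2.6, 2.1, 2.5, rep(0, 6)),
  GeneC = rep(1, 18)
)
clusters <- rep(c("1", "2", "3"), each = 6)
res <- find_pairwise_markers(expr, clusters)
print(res)
```

On this toy input, GeneA comes out as the only marker of cluster 1 and GeneB as the only marker of cluster 2. GeneC is flat everywhere and cluster 3 gets no markers.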
clu_ann <- scCATCH(clu_markers$clu_markers,species = "Human",cancer = NULL,tissue = "Blood")
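As you can see, it essentially picks out each cluster's marker genes, compares them with the cell-type annotations in the database, and gives each candidate cell type a score.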
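The matching step can be illustrated with a deliberately simple scoring rule. This is a sketch under stated assumptions, not scCATCH's scoring, which this example does not reproduce. The hypothetical `annotate_clusters` helper scores each reference cell type by the fraction of its known markers that appear among a cluster's marker genes and keeps the best one. The toy `reference` list stands in for the real database and is made up for illustration. The output columns `cluster` and `cell_type` are chosen to resemble the `clu_ann` columns used in the labelling step.

```r
# Hypothetical helper (NOT scCATCH's scoring): for each cluster, score every
# reference cell type by the fraction of its known markers found among the
# cluster's marker genes, then keep the best-scoring type.
annotate_clusters <- function(cluster_markers, reference) {
  rows <- lapply(names(cluster_markers), function(cl) {
    genes <- cluster_markers[[cl]]
    scores <- vapply(reference,
                     function(ref) length(intersect(genes, ref)) / length(ref),
                     numeric(1))
    # clusters that match nothing are left unclassified
    best <- if (max(scores) > 0) names(which.max(scores)) else "Unclassified"
    data.frame(cluster = cl, cell_type = best, score = max(scores),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy reference and toy cluster markers, for illustration only
reference <- list(
  "T cell"   = c("CD3D", "CD3E", "IL7R"),
  "B cell"   = c("CD79A", "MS4A1"),
  "Monocyte" = c("CD14", "LYZ")
)
cluster_markers <- list(
  "0" = c("CD3D", "IL7R", "LTB"),
  "1" = c("CD14", "LYZ", "S100A8"),
  "2" = c("PPBP")
)
ann <- annotate_clusters(cluster_markers, reference)
print(ann)
```

Here cluster 0 is called "T cell" (2 of 3 reference markers matched), cluster 1 "Monocyte", and cluster 2 stays "Unclassified". A real tool also has to weigh how well supported each reference marker is, which a plain overlap fraction ignores.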
Then simply add the corresponding cell-type labels:
new.cluster.ids <- clu_ann$cell_type
names(new.cluster.ids) <- clu_ann$cluster
pbmc <- RenameIdents(pbmc, new.cluster.ids)
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
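The relabelling relies on a named vector: `names(new.cluster.ids) <- clu_ann$cluster` turns cluster IDs into lookup keys for the cell-type labels. The base-R sketch below uses a hypothetical `relabel_clusters` helper to show that lookup on a plain vector of per-cell cluster IDs. It is not Seurat's code. As I understand it, RenameIdents likewise leaves identities that are not in the mapping unchanged, and the helper mimics that by keeping the original cluster ID for unmatched clusters.

```r
# Hypothetical helper illustrating the named-vector lookup behind the relabelling:
# clusters present in the mapping get their cell-type label, all others keep
# their original cluster ID.
relabel_clusters <- function(clusters, mapping) {
  clusters <- as.character(clusters)
  out <- clusters
  hit <- clusters %in% names(mapping)
  out[hit] <- unname(mapping[clusters[hit]])
  out
}

new.cluster.ids <- c("T cell", "Monocyte")
names(new.cluster.ids) <- c("0", "1")
cells <- factor(c("0", "1", "0", "2", "1"))  # toy per-cell cluster IDs
labels <- relabel_clusters(cells, new.cluster.ids)
print(labels)
```

Because the lookup is by label, two clusters that are assigned the same cell type end up with the same identity and appear as one group in the DimPlot.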