美文网首页scRNAseq重点关注
Seurat常规聚类和SCTransform脚本汇总2022-0

Seurat常规聚类和SCTransform脚本汇总2022-0

作者: 黄甫一 | 来源:发表于2022-05-26 20:22 被阅读0次

    前沿

    单细胞常规分析用得最多的软件非R语言的Seurat包莫属,当然还有Python语言的Scanpy,但是对于入门者来说,R语言更容易上手,而且很多单细胞分析的扩展包也是基于Seurat的,或者很多R包的输入就是Seurat参数。
    然而,对于数据量较大的情况,R语言就显得很鸡肋,不仅耗运行内存而且很慢,因此基于Python的软件包更为适合分析大数据,而且R语言能实现的Python基本上都能实现。随着单细胞组学的迅速发展,未来数据也将迎来超大量级的产出,基于Python的软件包应该更有前景。
    言归正传,因为Seurat的常规聚类基本不变,本文主要把此流程封装成函数,这样之后调用起来都比较方便和简洁。

    • 常规流程版

    下面的函数经过NormalizeData和ScaleData常规聚类步骤,还可以根据需求设置参数。

    • obj,Seurat对象
    • mt.pattern或mt.list,mt.pattern模糊匹配线粒体基因名,mt.list给定线粒体基因向量
    • dim.use,PCA主成分数目
    • mt.cutoff,线粒体百分比阈值
    • nf.low或nf.high,基因数下限或上下
    • nfeatures,选取高变基因数目
    • res,亚群分辨率
    seob_cluster <- function(obj,
                             mt.pattern="^MT-",mt.list=NULL,dim.use=30,mt.cutoff=10,
                             nf.low=200,nf.high=6000,nfeatures=3000,
                             res=1) {
    all <- obj
    if (is.null(mt.list)) {
    all[["percent.mt"]] <- PercentageFeatureSet(all, pattern = mt.pattern)
    }else{
    mt.list <- mt.list[which(mt.list %in% rownames(all))]
    all[["percent.mt"]] <- PercentageFeatureSet(all, features = mt.list)
    }
    all <- subset(all, subset = nFeature_RNA > nf.low & percent.mt < mt.cutoff & nFeature_RNA < nf.high)
    all <- NormalizeData(all, normalization.method = "LogNormalize", scale.factor = 10000)
    all <- FindVariableFeatures(all, selection.method = "vst", nfeatures = nfeatures)
    all.genes <- rownames(all)
    all <- ScaleData(all, features = all.genes)
    
    all <- RunPCA(all, features = VariableFeatures(all), npcs = 40, verbose = F)
    all <- FindNeighbors(all, dims = 1:dim.use)
    all <- FindClusters(all, resolution = res)
    all <- RunUMAP(all, dims = 1:dim.use)
    }
    
    • SCTransform版

    根据官网介绍,SCTransform处理更为合理,因此也可以使用以下流程,参数与上面的函数一致。

    sct_seob_cluster <- function(obj,
                             mt.pattern="^MT-",mt.list=NULL,dim.use=30,mt.cutoff=10,
                             nf.low=200,nf.high=6000,nfeatures=3000,
                             res=1) {
    pbmc <- obj
    if (is.null(mt.list)) {
    pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = mt.pattern)
    }else{
    mt.list <- mt.list[which(mt.list %in% rownames(pbmc))]
    pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, features = mt.list)
    }
    pbmc <- subset(pbmc, subset = nFeature_RNA > nf.low & percent.mt < mt.cutoff & nFeature_RNA < nf.high)
    pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = nfeatures)
    if (!requireNamespace("glmGamPoi", quietly = TRUE)) {
    pbmc <- SCTransform(pbmc, vars.to.regress = "percent.mt", verbose = FALSE)
    }else{
    pbmc <- SCTransform(pbmc, method = "glmGamPoi", vars.to.regress = "percent.mt", verbose = FALSE)
    }
    
    pbmc <- RunPCA(pbmc, features = VariableFeatures(pbmc), npcs = 40, verbose = F)
    pbmc <- RunUMAP(pbmc, dims = 1:dim.use, verbose = FALSE)
    pbmc <- FindNeighbors(pbmc, dims = 1:dim.use, verbose = FALSE)
    pbmc <- FindClusters(pbmc, verbose = FALSE, resolution = res)
    }
    

    相关文章

      网友评论

        本文标题:Seurat常规聚类和SCTransform脚本汇总2022-0

        本文链接:https://www.haomeiwen.com/subject/gcrpprtx.html