美文网首页单细胞转录组
CNS图表复现---人人都能学会的单细胞聚类分群注释

CNS图表复现---人人都能学会的单细胞聚类分群注释

作者: PhageNanoenzyme | 来源:发表于2021-03-09 00:14 被阅读0次

    这个数据集GSE129516,就是拿到了如下所示的数据文件:

    图片

    GEO下载的</figcaption>

    我首先读取了一个文件,看了看,就是表达矩阵,所以直接CreateSeuratObject即可,都省去了3个文件的组合命令。

    图片

    表达矩阵例子</figcaption>

    首先批量读取每个10x样品的表达矩阵

    保证当前工作目录下面有后缀是matrices.csv.gz的文件,就是前面下载的6个文件:

    rm(list=ls())
    options(stringsAsFactors = F)
    library(Seurat)
    
    fs=list.files(pattern = 'matrices.csv.gz')
    fs
    
    sceList <-  lapply(fs, function(x){
      a=read.csv( x )
      a[1:4,1:4]
      raw.data=a[,-1]
      rownames(raw.data)=a[,1]
      library(stringr)
      p=str_split(x,'_',simplify = T)[,2]
      sce <- CreateSeuratObject(counts = raw.data,project = p )
    })` </pre>
    

    每个matrices.csv.gz文件都读取后,提供CreateSeuratObject构建成为对象。如果是读取10x数据需要三个文件:barcodes.tsv, genes.tsv, matrix.mtx,那个更简单哦!

    然后使用seurat最出名的多个10x合并算法

    多个单细胞对象的整合,这里直接使用标准代码即可:

    pro='integrated' 
    
    for (i in 1:length(sceList)) {
      sceList[[i]] <- NormalizeData(sceList[[i]], verbose = FALSE)
      sceList[[i]] <- FindVariableFeatures(sceList[[i]], selection.method = "vst", 
                                                 nfeatures = 2000, verbose = FALSE)
    }
    sceList
    sce.anchors <- FindIntegrationAnchors(object.list = sceList, dims = 1:30)
    sce.integrated <- IntegrateData(anchorset = sce.anchors, dims = 1:30)
    

    因为是6个10X样品,所以这个步骤会略微有点耗费时间哦!

    接着走标准的降维聚类分群

    因为是构建好的对象,所以后续分析都是常规代码:

    library(ggplot2)
    library(cowplot)
    # switch to integrated assay. The variable features of this assay are automatically
    # set during IntegrateData
    DefaultAssay(sce.integrated) <- "integrated"
    
    # Run the standard workflow for visualization and clustering
    sce.integrated <- ScaleData(sce.integrated, verbose = FALSE)
    sce.integrated <- RunPCA(sce.integrated, npcs = 30, verbose = FALSE)
    sce.integrated <- RunUMAP(sce.integrated, reduction = "pca", dims = 1:30)
    p1 <- DimPlot(sce.integrated, reduction = "umap", group.by = "orig.ident")
    p2 <- DimPlot(sce.integrated, reduction = "umap", group.by = "orig.ident", label = TRUE, 
                  repel = TRUE) + NoLegend()
    plot_grid(p1, p2)
    p2
    ggsave(filename = paste0(pro,'_umap.pdf') )
    
    sce=sce.integrated
    DimHeatmap(sce, dims = 1:12, cells = 100, balanced = TRUE)
    ElbowPlot(sce)
    sce <- FindNeighbors(sce, dims = 1:15)
    sce <- FindClusters(sce, resolution = 0.2)
    table(sce@meta.data$integrated_snn_res.0.2) 
    sce <- FindClusters(sce, resolution = 0.8)
    table(sce@meta.data$integrated_snn_res.0.8) 
    
    DimPlot(sce, reduction = "umap")
    ggsave(filename = paste0(pro,'_umap_seurat_res.0.8.pdf') )
    DimPlot(sce, reduction = "umap",split.by = 'orig.ident')
    ggsave(filename = paste0(pro,'_umap_seurat_res.0.8_split.pdf') )
    
    save(sce,file = 'integrated_after_seurat.Rdata')` </pre>
    

    最后对聚类的不同细胞亚群进行注释

    前面呢是标准的聚类分群,每个细胞亚群仅仅是一个编号,实际上还需要给予它们一定的生物学意义,我们这里采用SingleR的标准代码:

    rm(list=ls())
    options(stringsAsFactors = F)
    library(Seurat)
    
    load(file = 'integrated_after_seurat.Rdata')
    DefaultAssay(sce) <- "RNA"
    # for B cells :  cluster, 1,21
    VlnPlot(object = sce, features ='Cd19',log =T )  
    VlnPlot(object = sce, features ='Cd79a',log =T )  
    
    library(SingleR)
    sce_for_SingleR <- GetAssayData(sce, slot="data")
    clusters=sce@meta.data$seurat_clusters
    mouseImmu <- ImmGenData()
    pred.mouseImmu <- SingleR(test = sce_for_SingleR, ref = mouseImmu, labels = mouseImmu$label.main,
                              method = "cluster", clusters = clusters, 
                              assay.type.test = "logcounts", assay.type.ref = "logcounts")
    
    mouseRNA <- MouseRNAseqData()
    pred.mouseRNA <- SingleR(test = sce_for_SingleR, ref = mouseRNA, labels = mouseRNA$label.fine ,
                             method = "cluster", clusters = clusters, 
                             assay.type.test = "logcounts", assay.type.ref = "logcounts")
    
    cellType=data.frame(ClusterID=levels(sce@meta.data$seurat_clusters),
                        mouseImmu=pred.mouseImmu$labels,
                        mouseRNA=pred.mouseRNA$labels )
    head(cellType)
    sce@meta.data$singleR=cellType[match(clusters,cellType$ClusterID),'mouseRNA']
    
    pro='first_anno'
    DimPlot(sce,reduction = "umap",label=T, group.by = 'singleR')
    ggplot2::ggsave(filename = paste0(pro,'_tSNE_singleR.pdf'))
    DimPlot(sce,reduction = "umap",label=T,split.by ='orig.ident',group.by = 'singleR')
    ggplot2::ggsave(filename = paste0(pro,'_tSNE_singleR_by_orig.ident.pdf'))
    
    save(sce,file = 'last_sce.Rdata')` </pre>
    

    出图如下:

    图片

    降维聚类分群注释</figcaption>

    可以看到效果还是杠杆的,而且我全程都是标准代码,就是follow群主的教程即可,我的R也是半吊子水平,只有你敢动手,这个图你也可以自己亲手做出来哦。

    原文链接
    人人都能学会的单细胞聚类分群注释

    相关文章

      网友评论

        本文标题:CNS图表复现---人人都能学会的单细胞聚类分群注释

        本文链接:https://www.haomeiwen.com/subject/llijqltx.html