美文网首页
【单细胞组学】细胞表面蛋白辅助细胞分群

【单细胞组学】细胞表面蛋白辅助细胞分群

作者: SciNote | 来源:发表于2022-08-01 17:44 被阅读0次

    导言

    我们之前介绍了CITE-seq的测序原理和使用CITE-seq-Count工具处理reads数据得到蛋白表达矩阵。

    CITE-seq-Count输出文件结构如下:

    OUTFOLDER/
    -- umi_count/
    -- -- matrix.mtx.gz
    -- -- features.tsv.gz
    -- -- barcodes.tsv.gz
    -- read_count/
    -- -- matrix.mtx.gz
    -- -- features.tsv.gz
    -- -- barcodes.tsv.gz
    -- unmapped.csv
    -- run_report.yaml
    

    跟处理scRNA-seq数据的 cell ranger工具的输出结果有些类似。最后需要用到的是其中umi_count文件夹下的matrix.mtx.gz,features.tsv.gz,barcodes.tsv.gz这三个文件。

    因为mRNA 表达和细胞表面蛋白表达是配套的数据,也就是每个细胞都得到了其mRNA全转录组和部分细胞表面蛋白的表达水平,那么分析这两个数据就有两种思路:

    • 仅基于scRNA-seq 数据对细胞进行聚类、分群、降维等标准流程,然后将细胞表面蛋白表达水平(ADT值)作为附加信息进行可视化等分析[1]。这种分析方法跟常规的单细胞转录组数据分析流程没有太大差别。
    • 同时基于scRNA-seq数据和细胞表面蛋白表达数据(ADT值)对细胞进行聚类、分群、降维等分析[2]。这种方法需要使用到weighted-nearest neighbor’ (WNN)等多模型数据整合算法来对数据进行整合。

    WNN算法是一个无监督的框架,用于了解每个细胞中每种数据类型的相对效用,从而能够对多种数据类型进行综合分析。

    Seurat官方示例教程中有提到,对30672个同时测了mRNA和25个细胞表面蛋白的来自骨髓的单细胞数据分析结果显示,细胞表面蛋白可以帮助一些免疫细胞状态的细分:

    We find that the RNA analysis is more informative than the ADT analysis in identifying progenitor states (the ADT panel contains markers for differentiated cells), while the converse is true of T cell states (where the ADT analysis outperforms RNA).

    对该骨髓来源来的免疫细胞数据,mRNA表达数据能更好的区分祖细胞的状态,细胞表面蛋白表达数据则能更好的区分T细胞的状态。当然这有可能是因为该数据检测的细胞表面蛋白抗体panel里,包括了针对分化细胞的marker而没有包括针对祖细胞状态的marker,毕竟它只包含了25个蛋白。

    鉴于Seurat的示例数据里细胞表面蛋白的检测panel都比较小,为了验证细胞表面蛋白表达数据在帮助各种细胞类型细分时的效果,我们用之前提到的那篇文献中的BRCA数据进行分析,该研究测了118个细胞表面蛋白panel,有来自4个样本共7363个单细胞的scCITE-seq数据。

    Seurat包分析CITE-seq数据

    library(Seurat)
    library(ggplot2)
    library(patchwork)
    
    ###
    adt=Read10X('./umi_count/',gene.column = 1)
    med<-read.csv('./metadata.csv',row.names = 1)
    counts <- Read10X(data.dir = path,gene.column=1)
    
    brca.cite.med=med[med$cell_id%in%colnames(adt),]
    table(brca.cite.med$celltype_major)
    # B-cells              CAFs Cancer Epithelial       Endothelial           Myeloid 
    # 631               686                 7               549              1000 
    # Plasmablasts               PVL           T-cells 
    # 151               972              3374 
    
    #可以看到肿瘤上皮细胞非常少。为了方面后续分析,这里去掉这些肿瘤上皮细胞
    brca.cite.med=brca.cite.med[brca.cite.med$celltype_major!='Cancer Epithelial',]
    brca.cite.rna=counts[,rownames(brca.cite.med)]
    brca.cite.adt=adt[,rownames(brca.cite.med)]
    
    ##
    cols_major= RColorBrewer::brewer.pal(n=7,name='Paired')
    names(cols_major)=unique(brca.cite.med$celltype_major)
    
    ###########################################################
    ### Define cellular states based on only mRNA expression ##
    ############################################################
    
    ###Setup a Seurat object, add the RNA and protein data
    brca.cite <- CreateSeuratObject(counts = brca.cite.rna,
                                    meta.data =brca.cite.med )
    adt_assay <- CreateAssayObject(counts = brca.cite.adt)
    # add this assay to the previously created Seurat object
    brca.cite[["ADT"]] <- adt_assay
    
    ###Cluster cells on the basis of their scRNA-seq profiles
    DefaultAssay(brca.cite) <- "RNA"
    # perform visualization and clustering steps
    brca.cite <- NormalizeData(brca.cite)
    brca.cite <- FindVariableFeatures(brca.cite)
    brca.cite <- ScaleData(brca.cite)
    brca.cite <- RunPCA(brca.cite, verbose = FALSE)
    
    # 因为我们已经有细胞的分群信息,所以不再进行聚类和分群
    # brca.cite <- FindNeighbors(brca.cite, dims = 1:30)
    # brca.cite <- FindClusters(brca.cite, resolution = 0.8, verbose = FALSE)
    brca.cite <- RunUMAP(brca.cite,reduction = "pca", dims = 1:30) 
    
    p1=DimPlot(brca.cite,group.by = 'orig.ident')+NoLegend()
    p2=DimPlot(brca.cite,group.by = 'celltype_major', cols = cols_major,label = T,label.size = 5,repel = T)
    p1|p2
    
    ###Normalize ADT data,
    DefaultAssay(brca.cite) <- "ADT"
    brca.cite <- NormalizeData(brca.cite, normalization.method = "CLR", margin = 2)
    DefaultAssay(brca.cite) <- "RNA"
    
    ###Identify cell surface markers for scRNA-seq clusters
    Idents(brca.cite)=brca.cite$celltype_major
    adt_markers <- FindAllMarkers(brca.cite, assay = "ADT")
    rna_markers <- FindAllMarkers(brca.cite, assay = "RNA")
    
    ###查看一些基质细胞和免疫细胞标志基因的蛋白和mRNA的表达:
    p1 <- FeaturePlot(brca.cite, c("adt_CD45",'adt_CD31','adt_CD19',"adt_MCAM"),slot='data', cols = viridis(15),ncol = 4)
    p2 <- FeaturePlot(brca.cite, c("rna_PTPRC",'rna_PECAM1','rna_CD19',"rna_MCAM"),slot='data', cols =inferno(15),ncol = 4)
    p1 /p2
    
    可以看到CD45和CD19这两个基因mRNA和蛋白表达存在较大差异。
    ################################################################
    ##### WNN: Define cellular states based on both data types #####
    ################################################################
    brca.cite_wnn <- CreateSeuratObject(counts = brca.cite.rna,meta.data =brca.cite.med )
    adt_assay <- CreateAssayObject(counts = brca.cite.adt)
    # add this assay to the previously created Seurat object
    brca.cite_wnn[["ADT"]] <- adt_assay
    
    #### Pre-processing and dimensional reduction on both assays independently.
    DefaultAssay(brca.cite_wnn) <- 'RNA'
    brca.cite_wnn <- NormalizeData(brca.cite_wnn) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA() 
    
    DefaultAssay(brca.cite_wnn) <- 'ADT'
    VariableFeatures(brca.cite_wnn) <- rownames(brca.cite_wnn[["ADT"]])
    brca.cite_wnn <- NormalizeData(brca.cite_wnn, normalization.method = 'CLR', margin = 2) %>% 
      ScaleData() %>% RunPCA(reduction.name = 'apca')
    
    ### Identify multimodal neighbors
    brca.cite_wnn <- FindMultiModalNeighbors(
      brca.cite_wnn, reduction.list = list("pca", "apca"),
      dims.list = list(1:30, 1:20), modality.weight.name = "RNA.weight"
    )
    ### Clustering cells
    brca.cite_wnn <- RunUMAP(brca.cite_wnn, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
    #brca.cite_wnn <- FindClusters(brca.cite_wnn, graph.name = "wsnn", algorithm = 3, resolution = 2, verbose = FALSE)
    p1 <- DimPlot(brca.cite_wnn, reduction = 'wnn.umap', group.by = 'orig.ident') + NoLegend()
    p2 <- DimPlot(brca.cite_wnn, reduction = 'wnn.umap', group.by = 'celltype_major', cols =cols_major,label = T,label.size = 5,repel = T) #+ NoLegend()
    p1 + p2
    
    p5 <- FeaturePlot(brca.cite_wnn, c("adt_CD45",'adt_CD31','adt_CD19',"adt_MCAM"),slot='data', cols = viridis(15),ncol = 4)
    p6 <- FeaturePlot(brca.cite_wnn, c("rna_PTPRC",'rna_PECAM1','rna_CD19',"rna_MCAM"),slot='data', cols =inferno(15),ncol = 4)
    p5 /p6
    
    ###Compareing the UMAP visualization based on only the RNA, protein data and both of them
    brca.cite_wnn <- RunUMAP(brca.cite_wnn, reduction = 'pca', dims = 1:30, assay = 'RNA', 
                             reduction.name = 'rna.umap', reduction.key = 'rnaUMAP_')
    brca.cite_wnn <- RunUMAP(brca.cite_wnn, reduction = 'apca', dims = 1:20, assay = 'ADT', 
                             reduction.name = 'adt.umap', reduction.key = 'adtUMAP_')
    
    p3 <- DimPlot(brca.cite_wnn, reduction = 'rna.umap', group.by = 'celltype_major', label = F, cols =cols_major)+ ggtitle("RNA") +theme(plot.title = element_text(hjust = 0.5)) + NoLegend()
    p4 <- DimPlot(brca.cite_wnn, reduction = 'adt.umap', group.by = 'celltype_major', label = F, cols =cols_major)+ ggtitle("ADT") +theme(plot.title = element_text(hjust = 0.5)) + NoLegend()
    p5<- DimPlot(brca.cite_wnn, reduction = 'wnn.umap', group.by = 'celltype_major', label = F, cols =cols_major)+ ggtitle("WNN") +theme(plot.title = element_text(hjust = 0.5),                                                                                                                                  legend.position = "bottom") #+ NoLegend()
    p5+p3+p4
    

    再来比较一下在细分时,两种方法的效果:

    用经过WNN整合后的数据比仅用mRNA数据识别细胞类型效果要好

    Reference:

    [1].https://satijalab.org/seurat/articles/multimodal_vignette.html

    [2].https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis.html

    [3].https://www.jianshu.com/p/ceee6aa17335

    欢迎关注同名公众号SciNote

    相关文章

      网友评论

          本文标题:【单细胞组学】细胞表面蛋白辅助细胞分群

          本文链接:https://www.haomeiwen.com/subject/vrvrwrtx.html