美文网首页
10X单细胞转录组多物种联合分析:以人和小鼠为例

10X单细胞转录组多物种联合分析:以人和小鼠为例

作者: 夕颜00 | 来源:发表于2021-05-11 18:14 被阅读0次

    10X单细胞转录组是可以联合分析多个物种的。也许,你会问为什么要分析多个物种,我一个物种的某个器官还整明白呢?其实在做:

    • 模型
    • 移植
    • 免疫
    • 病毒
    • 生殖隔离

    的时候,同一个组织的细胞并不是同一个来源的,或者同一个细胞的mRNA本身有不同的来源(如病毒侵染的细胞)。这就需要确定不同来源的比例及其演化关系。就拿最近的新冠肺炎来说吧,在单细胞水平上做的一个关键就是区分细胞内宿主和病毒基因的表达。

    今天我们以10X平台下人和小鼠为例介绍一下多物种联合分析。
    本次为人肿瘤移植到裸鼠上。

    参考基因组

    众所周知,在单一物种中我们是基于比对来给UMI定量的。那么,多个物种中的一个关键就是构建这样一个多物种的基因组,用于定量。也许我们要看一下基因组到底是什么了:

    • fa序列
    • 注释信息

    其实基因组主要的组成就这两部分,当然又很所指标来衡量,比如组装的水平啊,注释的是否详细啊。每个基因组都有这两个文件,质言之,我们按照某种规则把它们分别合并到一起就可以了。10X给出了规范程序,构建的基因组可以直接用于cellranger count

    cellranger mkref --genome=hg19 --fasta=hg19.fa --genes=hg19-filtered-ensembl.gtf \
                       --genome=mm10 --fasta=mm10.fa --genes=mm10-filtered-ensembl.gtf
    
    

    详细的描述在这了: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#multi

    这也说明基因组作为基础研究的重要性。

    cellranger 结果

    配置好基因组文件后面的分析就可以跑cellranger了,结果文件是这样:

    序列的基本信息:

    image image

    比对信息:

    image

    样本和GEM信息:

    image
    在seurat中分析

    读入数据

    library(Seurat)
    library(tidyverse)
    
    pbmc <-Read10X("filtered_feature_bc_matrix")
    pbmc<- CreateSeuratObject(pbmc)
    pbmc
    
    An object of class Seurat 
    112137 features across 1230 samples within 1 assay 
    Active assay: RNA (112137 features)
    
    

    查看数据

    > tail(rownames(pbmc),3)
    [1] "mm10-CAAA01064564.1" "mm10-Vmn2r122"       "mm10-CAAA01147332.1"
    > head(rownames(pbmc),3)
    [1] "hg38-MIR1302-2HG" "hg38-FAM138A"     "hg38-OR4F5" 
    > head(pbmc@meta.data)
                        orig.ident nCount_RNA nFeature_RNA
    AAACCCAGTTGAAGTA SeuratProject       5531         2160
    AAACCCATCGAGCTGC SeuratProject      42194         7249
    AAACGAAAGTTCCGGC SeuratProject      24685         4740
    AAACGAACACTTTATC SeuratProject      48421         7164
    AAACGAACATCAGTGT SeuratProject      60559         8136
    AAAGGATTCTTCGATT SeuratProject       6478         1269
    
    

    可以看出基因名前面区分了人和小鼠,但是细胞并没有区分出来,因为前面我们说过,有的细胞中可能是有不同物种来源的基因的。

    计算每个细胞人和小鼠基因的占比

    > pbmc[["prop_mouse"]] <- PercentageFeatureSet(pbmc, pattern = "^mm10-")
    > pbmc[["prop_human"]] <- PercentageFeatureSet(pbmc, pattern = "^hg38-")
    > head(pbmc@meta.data)
                          orig.ident nCount_RNA nFeature_RNA prop_mouse prop_human
    AAACCCAAGCAGATAT-1 SeuratProject       7148         2146 98.1533296  1.8466704
    AAACCCAAGCGAGTAC-1 SeuratProject      27556         4670  0.4463638 99.5536362
    AAACCCAAGCGATCGA-1 SeuratProject      25409         4969  0.4722736 99.5277264
    AAACCCAAGCTCGAAG-1 SeuratProject      20976         4371 99.6615179  0.3384821
    AAACCCAAGCTTAGTC-1 SeuratProject      17805         4020 99.5450716  0.4549284
    AAACCCAAGTCGAGGT-1 SeuratProject      36289         5699  0.3665022 99.6334978
    
    

    用小提琴可视化一下:

    VlnPlot(pbmc,features =c("prop_human","prop_mouse"))
    
    
    image.png

    在这例样本中人和小鼠的区分还是比较明显的prop_mouse和prop_human都要么在100%要么在0%,所以我们可以按照每个细胞中人和小鼠基因的表达情况来划分细胞。

    cell_species<- pbmc@meta.data %>% 
      mutate(species = case_when(
        prop_mouse > 0.7 ~ "mouse",
        prop_human > 0.7 ~ "human",
        TRUE ~ "mixed"
      ))
    
    pbmc <- AddMetaData(pbmc, metadata = cell_species$species, col.name = "species")
    head(pbmc@meta.data)
    
    head(pbmc@meta.data)
                         orig.ident nCount_RNA nFeature_RNA prop_mouse prop_human species
    AAACCCAAGCAGATAT-1 SeuratProject       7148         2146 98.1533296  1.8466704   mouse
    AAACCCAAGCGAGTAC-1 SeuratProject      27556         4670  0.4463638 99.5536362   human
    AAACCCAAGCGATCGA-1 SeuratProject      25409         4969  0.4722736 99.5277264   human
    AAACCCAAGCTCGAAG-1 SeuratProject      20976         4371 99.6615179  0.3384821   mouse
    AAACCCAAGCTTAGTC-1 SeuratProject      17805         4020 99.5450716  0.4549284   mouse
    AAACCCAAGTCGAGGT-1 SeuratProject      36289         5699  0.3665022 99.6334978   human
    
    

    seurat标准分析

    pbmc <- pbmc %>%
      NormalizeData() %>% 
      FindVariableFeatures() %>% 
      ScaleData() %>% 
      RunPCA(dims = 1:30) %>%
      RunTSNE(dims = 1:20, check_duplicates = FALSE)%>%
      RunUMAP(dims = 1:20, check_duplicates = FALSE)
    
    
    DimPlot(pbmc, reduction = "umap", pt.size = 0.5, group.by = "species")
    
    
    image.png
    差异分析
    hmark <-FindMarkers(pbmc,ident.1 ="mouse", group.by ="species" )
      |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=04m 56s
    head(hmark )
    
                   p_val avg_logFC pct.1 pct.2 p_val_adj
    mm10-Malat1      0  4.750361 0.792 0.333         0
    mm10-Fth1        0  4.537025 0.924 0.885         0
    mm10-mt-Atp6     0  4.478014 0.777 0.393         0
    mm10-mt-Co3      0  4.471524 0.772 0.406         0
    mm10-Apoe        0  4.355646 0.894 0.937         0
    mm10-mt-Co2      0  4.226476 0.751 0.224         0
    
    
    FeaturePlot(pbmc,features = c("hg38-NUMB","mm10-Snf8"))
    
    
    image.png
    分群
    pbmc <- FindNeighbors(pbmc, dims = 1:10)
    pbmc<-FindClusters(pbmc)
    DimPlot(pbmc, reduction = "umap", pt.size = 0.5)
    
    
    image.png
    整合
    > species.list <- SplitObject(pbmc, split.by = "species")
    > species.list
    $mouse
    An object of class Seurat 
    112137 features across 736 samples within 1 assay 
    Active assay: RNA (112137 features)
     3 dimensional reductions calculated: pca, tsne, umap
    
    $human
    An object of class Seurat 
    112137 features across 494 samples within 1 assay 
    Active assay: RNA (112137 features)
     3 dimensional reductions calculated: pca, tsne, umap
    
    
    for (i in names(species.list)) {
      species.list[[i]] <- SCTransform(species.list[[i]], verbose = FALSE)
    }
    species.features <- SelectIntegrationFeatures(object.list = species.list, nfeatures = 3000)
    species.list <- PrepSCTIntegration(object.list = species.list, anchor.features = species.features)
    species.anchors <- FindIntegrationAnchors(object.list = species.list, normalization.method = "SCT", 
                                           anchor.features = species.features)
    species.integrated <- IntegrateData(anchorset = species.anchors, normalization.method = "SCT")
    
    species.integrated <- RunPCA(object = species.integrated, verbose = FALSE)
    species.integrated <- RunUMAP(object = species.integrated, dims = 1:30)
    head(species.integrated@meta.data)
    DimPlot(species.integrated, group.by = c("species"), combine = FALSE)
    
    
    image
    下游分析

    其实得到这个表达谱之后,后面的受配体分析、亚群分析等都可以。

    mixing-mouse-and-human-10x-single-cell-rnaseq-data

    转载来自:
    作者:周运来就是我
    链接:https://www.jianshu.com/p/31039b6af429
    来源:简书
    著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。

    相关文章

      网友评论

          本文标题:10X单细胞转录组多物种联合分析:以人和小鼠为例

          本文链接:https://www.haomeiwen.com/subject/vndndltx.html