美文网首页单细胞免疫组库
免疫组库数据分析||immunarch教程:探索性数据分析

免疫组库数据分析||immunarch教程:探索性数据分析

作者: 周运来就是我 | 来源:发表于2020-08-17 21:09 被阅读0次

    immunarch — Fast and Seamless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires in R

    我们已经讲到,immunarch分析免疫主库数据是很有好的,那么有多友好呢?真的可以5行代码出博士论文吗?我们来看看这五行代码:

    install.packages("immunarch")           # Install the package
    library(immunarch); data(immdata)       # Load the package and the test dataset
    repOverlap(immdata$data) %>% vis()      # Compute and visualise the most important statistics:
    geneUsage(immdata$data[[1]]) %>% vis()  #     public clonotypes, gene usage, sample diversity
    repDiversity(immdata$data) %>% vis(.by = "Status", .meta = immdata$meta)      # Group samples
    

    为了系统地了解这个包,我们引入一个包:pacman。

    library(pacman)
    p_functions(immunarch)
    
    [1] "apply_asymm"                 
     [2] "apply_symm"                  
     [3] "bunch_translate"             
     [4] "coding"                      
     [5] "cross_entropy"               
     [6] "dbAnnotate"                  
     [7] "dbLoad"                      
     [8] "entropy"                     
     [9] "fixVis"                      
    [10] "gene_stats"                  
    [11] "geneUsage"                   
    [12] "geneUsageAnalysis"           
    [13] "get_genes"                   
    [14] "getKmers"                    
    [15] "immunr_dbscan"               
    [16] "immunr_hclust"               
    [17] "immunr_kmeans"               
    [18] "immunr_mds"                  
    [19] "immunr_pca"                  
    [20] "immunr_tsne"                 
    [21] "inc_overlap"                 
    [22] "inframes"                    
    [23] "js_div"                      
    [24] "kl_div"                      
    [25] "kmer_profile"                
    [26] "noncoding"                   
    [27] "outofframes"                 
    [28] "public_matrix"               
    [29] "publicRepertoire"            
    [30] "publicRepertoireApply"       
    [31] "publicRepertoireFilter"      
    [32] "pubRep"                      
    [33] "pubRepApply"                 
    [34] "pubRepFilter"                
    [35] "pubRepStatistics"            
    [36] "repClonality"                
    [37] "repDiversity"                
    [38] "repExplore"                  
    [39] "repLoad"                     
    [40] "repOverlap"                  
    [41] "repOverlapAnalysis"          
    [42] "repSample"                   
    [43] "repSave"                     
    [44] "select_barcodes"             
    [45] "select_clusters"             
    [46] "spectratype"                 
    [47] "split_to_kmers"              
    [48] "top"                         
    [49] "trackClonotypes"             
    [50] "vis"                         
    [51] "vis_bar"                     
    [52] "vis_box"                     
    [53] "vis_circos"                  
    [54] "vis_heatmap"                 
    [55] "vis_heatmap2"                
    [56] "vis_hist"                    
    [57] "vis_immunr_kmer_profile_main"
    [58] "vis_seqlogo"                 
    [59] "vis_textlogo"             
    

    也是就是这个R包有59个函数(功能),但是我们并不需要全部记住,常用的有:

    • repExplore- to compute basic statistics, such as number of clones or distributions of lengths and counts. To explore them you need to pass the statistics, e.g. count, to the .method.
    • repClonality - to compute the clonality of repertoires.
    • repOverlap - to compute the repertoire overlap.
    • repOverlapAnalysis - to analyse the repertoire overlap, including different clustering procedures and PCA.
    • geneUsage - to compute the distributions of V or J genes.
    • geneUsageAnalysis - to analyse the distributions of V or J genes, including clustering and PCA.
    • repDiversity - to estimate the diversity of repertoires.
    • trackClonotypes - to analyse the dynamics of repertoires across time points.
    • spectratype - to compute spectratype of clonotypes.
    • getKmers and kmer_profile - to compute distributions of kmers and sequence profiles

    在immunarch中统计只是一条命令,而可视化半条命令就够了。每个分析函数的输出可以直接传递到vis函数:用于可视化的一般函数。下面是用法示例。几乎所有可视化的分析结果都支持根据元数据表中的各自属性或使用用户提供的属性对数据进行分组。分组可以通过传递.by参数或同时传递.by.meta参数给vis函数来实现。

    library(immunarch)  # Load the package into R
    data(immdata)  # Load the test dataset
    immdata$meta
    exp_vol <- repExplore(immdata$data, .method = "volume")
    p1 <- vis(exp_vol, .by = c("Status"), .meta = immdata$meta)
    p2 <- vis(exp_vol, .by = c("Status", "Sex"), .meta = immdata$meta)
    p1 + p2
    

    同样,您可以将.by作为一个字符向量传递,它与数据中的样本数量精确匹配,每个值都应该对应于样本的属性。它将用于根据所提供的值对数据进行分组。注意,在这种情况下,您应该将NA传递给.meta。

    exp_vol <- repExplore(immdata$data, .method = "volume")
    by_vec <- c("C", "C", "C", "C", "C", "C", "MS", "MS", "MS", "MS", "MS", "MS")
    p <- vis(exp_vol, .by = by_vec)
    p
    

    你要说,哎,我想提出数据自己画图。

    exp_vol <- repExplore(immdata$data, .method = "volume")
    
    exp_vol
             Sample Volume
    A2-i129 A2-i129   6443
    A2-i131 A2-i131   6375
    A2-i133 A2-i133   6300
    A2-i132 A2-i132   6721
    A4-i191 A4-i191   5058
    A4-i192 A4-i192   5763
    MS1         MS1   5301
    MS2         MS2   7043
    MS3         MS3   6310
    MS4         MS4   7313
    MS5         MS5   5588
    MS6         MS6   7278
    

    如果数据是分组的,将执行比较组均值的统计检验,除非提供.test = F。在只有两组的情况下,采用Wilcoxon秩和检验(R函数wilcox)。exact = F)参数进行测试,以测试两组之间是否存在平均秩值的差异。当大于两组时,采用Kruskal-Wallis检验(R函数kruskar .test),相当于秩次方差分析(ANOVA),检验来自不同组的样本是否来自同一分布。显著的Kruskal-Wallis检验表明,至少一个样本随机地占优势于另一个样本。经过多次比较调整后的p值绘制在组的顶部。p值的调整使用Holm方法(也称为Holm- bonferroni校正)。您可以执行命令?p.adjust在R控制台看到更多。

    如果对默认出的图不满意,你可以用fixVis打开一个shiny窗口,一点一点调整直到可以发表。

    # 1. Analyse
    exp_len <- repExplore(immdata$data, .method = "len", .col = "aa")
    # 2. Visualise
    p1 <- vis(exp_len)
    # 3. Fix and make publication-ready results
    fixVis(p1)
    

    对于基本的探索性分析,比如比较每个指repertoire or distribution的reads/ UMIs数量,可以使用repExplore函数。

    exp_len <- repExplore(immdata$data, .method = "len", .col = "aa")
    exp_cnt <- repExplore(immdata$data, .method = "count")
    exp_vol <- repExplore(immdata$data, .method = "volume")
    
    p1 <- vis(exp_len)
    p2 <- vis(exp_cnt)
    p3 <- vis(exp_vol)
    
    p1
    
    p2 + p3
    

    加入分组信息即可得到统计结果。

    # You can group samples by their metainformation
    p4 <- vis(exp_len, .by = "Status", .meta = immdata$meta)
    p5 <- vis(exp_cnt, .by = "Sex", .meta = immdata$meta)
    p6 <- vis(exp_vol, .by = c("Status", "Sex"), .meta = immdata$meta)
    
    p4
    
    p5 + p6
    

    评价样本多样性的方法之一是评价克隆性(Clonality,主要区别于clonotypes)。repClonality 是指最常见或最不常见的克隆性的数量。有几种方法来评估克隆性,让我们来看看它们。clonal.prop方法计算细胞克隆池占用曲目的比例:

    imm_pr <- repClonality(immdata$data, .method = "clonal.prop")
    imm_pr
    
    
           Clones Percentage Clonal.count.prop
    A2-i129     18       10.0      0.0027556644
    A2-i131     28       10.0      0.0042728521
    A2-i133      9       10.3      0.0014077898
    A2-i132    113       10.0      0.0164987589
    A4-i191      4       11.5      0.0007773028
    A4-i192      8       10.4      0.0013738623
    MS1          2       10.1      0.0003700278
    MS2         66       10.0      0.0092372288
    MS3          2       10.6      0.0003095496
    MS4        176       10.0      0.0236336780
    MS5          3       13.2      0.0005303164
    MS6        150       10.0      0.0202456472
    attr(,"class")
    [1] "immunr_clonal_prop" "matrix"            
    

    第一种方法考虑的是最丰富的细胞克隆型:

    imm_top <- repClonality(immdata$data, .method = "top", .head = c(10, 100, 1000, 3000, 10000))
    imm_top
                    10        100      1000      3000 10000
    A2-i129 0.08011765 0.17282353 0.3491765 0.5844706     1
    A2-i131 0.06670588 0.15647059 0.3467059 0.5820000     1
    A2-i133 0.10505882 0.18294118 0.3655294 0.6008235     1
    A2-i132 0.02388235 0.09423529 0.3118824 0.5471765     1
    A4-i191 0.17176471 0.32047059 0.5122353 0.7475294     1
    A4-i192 0.11541176 0.22141176 0.4325882 0.6678824     1
    MS1     0.19164706 0.30894118 0.4817647 0.7170588     1
    MS2     0.04458824 0.11470588 0.2770588 0.5123529     1
    MS3     0.16482353 0.23011765 0.3575294 0.5928235     1
    MS4     0.02329412 0.07517647 0.2415294 0.4768235     1
    MS5     0.20611765 0.29188235 0.4521176 0.6874118     1
    MS6     0.02835294 0.08235294 0.2460000 0.4812941     1
    attr(,"class")
    [1] "immunr_top_prop" "matrix"       
    
    imm_top %>% vis()
    

    而稀有(rare )方法处理的是最不多产的克隆型:

    imm_rare <- repClonality(immdata$data, .method = "rare")
    imm_rare
    
    imm_rare
                    1         3        10        30       100 MAX
    A2-i129 0.6991765 0.8215294 0.8710588 0.9267059 0.9604706   1
    A2-i131 0.6969412 0.8243529 0.8995294 0.9436471 0.9869412   1
    A2-i133 0.6800000 0.8100000 0.8690588 0.8974118 0.9357647   1
    A2-i132 0.7088235 0.8831765 0.9643529 0.9951765 1.0000000   1
    A4-i191 0.5355294 0.6484706 0.7189412 0.7707059 0.8697647   1
    A4-i192 0.5976471 0.7487059 0.8342353 0.8675294 0.9156471   1
    MS1     0.5780000 0.6692941 0.7438824 0.7929412 0.8571765   1
    MS2     0.7788235 0.8891765 0.9322353 0.9718824 1.0000000   1
    MS3     0.7272941 0.7825882 0.8071765 0.8385882 0.8652941   1
    MS4     0.8109412 0.9343529 0.9725882 1.0000000 1.0000000   1
    MS5     0.6112941 0.7001176 0.7575294 0.7809412 0.8284706   1
    MS6     0.8107059 0.9208235 0.9703529 0.9897647 1.0000000   1
    

    最后,用homeo方法评估克隆空间的稳态,即。,给定大小的无性系所占曲目的比例

    imm_hom <- repClonality(immdata$data,
                            .method = "homeo",
                            .clone.types = c(Small = .0001, Medium = .001, Large = .01, Hyperexpanded = 1)
    )
    imm_hom
    
           Small (0 < X <= 1e-04) Medium (1e-04 < X <= 0.001)
    A2-i129                      0                   0.8634118
    A2-i131                      0                   0.8858824
    A2-i133                      0                   0.8597647
    A2-i132                      0                   0.9542353
    A4-i191                      0                   0.7135294
    A4-i192                      0                   0.8183529
    MS1                          0                   0.7248235
    MS2                          0                   0.9244706
    MS3                          0                   0.8061176
    MS4                          0                   0.9683529
    MS5                          0                   0.7520000
    MS6                          0                   0.9625882
            Large (0.001 < X <= 0.01)
    A2-i129                0.09705882
    A2-i131                0.09011765
    A2-i133                0.07600000
    A2-i132                0.04576471
    A4-i191                0.15623529
    A4-i192                0.08611765
    MS1                    0.12082353
    MS2                    0.06411765
    MS3                    0.05917647
    MS4                    0.03164706
    MS5                    0.07647059
    MS6                    0.03741176
            Hyperexpanded (0.01 < X <= 1)
    A2-i129                    0.03952941
    A2-i131                    0.02400000
    A2-i133                    0.06423529
    A2-i132                    0.00000000
    A4-i191                    0.13023529
    A4-i192                    0.09552941
    MS1                        0.15435294
    MS2                        0.01141176
    MS3                        0.13470588
    MS4                        0.00000000
    MS5                        0.17152941
    MS6                        0.00000000
    attr(,"class")
    [1] "immunr_homeo" "matrix"      
    
    vis(imm_top) + vis(imm_top, .by = "Status", .meta = immdata$meta)
    
    
    vis(imm_rare) + vis(imm_rare, .by = "Status", .meta = immdata$meta)
    
    
    vis(imm_hom) + vis(imm_hom, .by = c("Status", "Sex"), .meta = immdata$meta)
    
    

    到这里,我们已经写完一篇博士论文了。

    参考:

    https://immunarch.com/articles/web_only/v3_basic_analysis.html

    相关文章

      网友评论

        本文标题:免疫组库数据分析||immunarch教程:探索性数据分析

        本文链接:https://www.haomeiwen.com/subject/hbfcjktx.html