美文网首页单细胞测序专题集合
单细胞转录组数据分析|| pypairs教程:细胞周期推断

单细胞转录组数据分析|| pypairs教程:细胞周期推断

作者: 周运来就是我 | 来源:发表于2020-03-31 00:40 被阅读0次

    单细胞转录组技术是来解释细胞状态的有力武器,在分群得到不同细胞类型的划分之前,有时候需要消除细胞周期的影响。毕竟同样的细胞类型也有着不同的状态,细胞周期就是状态之一。

    在R中有scran以及seurat包均可推断以及回归掉细胞周期影响的函数。各方的引用几本来自这篇文献: Scialdone et al. (2015),感兴趣的同学可以学习一下。

    在scran中已经附带了人和小鼠细胞周期相关的基因:

    
    library(scater)
    
    # Getting pre-trained marker sets
    mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran"))
    head(mm.pairs$G1)
                  first             second
    1 ENSMUSG00000000001 ENSMUSG00000001785
    2 ENSMUSG00000000001 ENSMUSG00000005470
    3 ENSMUSG00000000001 ENSMUSG00000012443
    4 ENSMUSG00000000001 ENSMUSG00000015120
    5 ENSMUSG00000000001 ENSMUSG00000022033
    6 ENSMUSG00000000001 ENSMUSG00000023015
    
    
    hs.pairs <- readRDS(system.file("exdata", "human_cycle_markers.rds", package="scran"))
    head(hs.pairs$G2M)
    
               first          second
    1 ENSG00000100519 ENSG00000146457
    2 ENSG00000100519 ENSG00000132646
    3 ENSG00000100519 ENSG00000124171
    4 ENSG00000123975 ENSG00000100519
    5 ENSG00000126787 ENSG00000100519
    6 ENSG00000161057 ENSG00000100519
    

    推测细胞周期的状态(G1,S,G2M)是根据成对的基因表达变化来的。

    而 seurat中是根据细胞周期基因来推断的:

     Seurat::cc.genes.updated.2019
    $s.genes
     [1] "MCM5"     "PCNA"     "TYMS"     "FEN1"     "MCM7"     "MCM4"     "RRM1"     "UNG"      "GINS2"    "MCM6"     "CDCA7"   
    [12] "DTL"      "PRIM1"    "UHRF1"    "CENPU"    "HELLS"    "RFC2"     "POLR1B"   "NASP"     "RAD51AP1" "GMNN"     "WDR76"   
    [23] "SLBP"     "CCNE2"    "UBR7"     "POLD3"    "MSH2"     "ATAD2"    "RAD51"    "RRM2"     "CDC45"    "CDC6"     "EXO1"    
    [34] "TIPIN"    "DSCC1"    "BLM"      "CASP8AP2" "USP1"     "CLSPN"    "POLA1"    "CHAF1B"   "MRPL36"   "E2F8"    
    
    $g2m.genes
     [1] "HMGB2"   "CDK1"    "NUSAP1"  "UBE2C"   "BIRC5"   "TPX2"    "TOP2A"   "NDC80"   "CKS2"    "NUF2"    "CKS1B"   "MKI67"  
    [13] "TMPO"    "CENPF"   "TACC3"   "PIMREG"  "SMC4"    "CCNB2"   "CKAP2L"  "CKAP2"   "AURKB"   "BUB1"    "KIF11"   "ANP32E" 
    [25] "TUBB4B"  "GTSE1"   "KIF20B"  "HJURP"   "CDCA3"   "JPT1"    "CDC20"   "TTK"     "CDC25C"  "KIF2C"   "RANGAP1" "NCAPD2" 
    [37] "DLGAP5"  "CDCA2"   "CDCA8"   "ECT2"    "KIF23"   "HMMR"    "AURKA"   "PSRC1"   "ANLN"    "LBR"     "CKAP5"   "CENPE"  
    [49] "CTCF"    "NEK2"    "G2E3"    "GAS2L3"  "CBX5"    "CENPA"  
    

    那,今天和大家介绍一种类似scran的在python中实现的方法:pypairs

    PyPairs - A python scRNA-Seq classifier

    # -*- coding: utf-8 -*-
    """
    Created on Sat Mar 28 22:45:55 2020
    # https://pypairs.readthedocs.io/en/latest/welcome.html
    @author: Administrator
    """
    
    from pypairs import pairs, datasets
    
    # Load samples from the oscope scRNA-Seq dataset with known cell cycle
    training_data = datasets.leng15(mode='sorted')
    
    training_data.obs
    Out[155]: 
                category
    G2_Exp1.059      G2M
    G2_Exp1.069      G2M
    G2_Exp1.075      G2M
    G2_Exp1.063      G2M
    G2_Exp1.029      G2M
                 ...
    G1_Exp1.063       G1
    G1_Exp1.083       G1
    G1_Exp1.030       G1
    G1_Exp1.018       G1
    G1_Exp1.046       G1
    
     help(datasets.leng15)
    Help on function leng15 in module pypairs.datasets:
    
    leng15(mode: Union[str, NoneType] = 'all', gene_sub: Union[Iterable[int], NoneType] = None, sample_sub: Union[Iterable[int], NoneType] = None) -> anndata._core.anndata.AnnData
        Single cell RNA-seq data of human hESCs to evaluate Oscope [Leng15]_
        
        Total 213 H1 single cells and 247 H1-Fucci single cells were sequenced.
        The 213 H1 cells were used to evaluate Oscope in identifying oscillatory genes.
        The H1-Fucci cells were used to confirm the cell cycle gene cluster identified
        by Oscope in the H1 hESCs.
        Normalized expected counts are provided in GSE64016_H1andFUCCI_normalized_EC.csv.gz
        
        Reference
        ---------
            GEO-Dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE64016
        
        Parameters
        ----------
        mode
            sample selection mode:
                - 'all' for all samples, default
                - 'sorted' for all samples with known cell cycle (G2, S or G1)
                - 'unsorted' for all samples with unknown cell cycle (H1)
        gene_sub
            Index based array of subsetted genes
        sample_sub
            Index based array of subsetted samples
        
        Returns
        -------
        adata : :class:`~anndata.AnnData`
            Annotated data matrix containing the normalized gene counts
    

    获得细胞周期相关基因对,类似scran的sandbag:

    # Run sandbag() to identify marker pairs
    marker_pairs = pairs.sandbag(training_data, fraction=0.3)
    marker_pairs['G2M'][1:10]
    Out[166]: 
    [('KAT5', 'SYNJ2BP-COX16'),
     ('USF1', 'KIAA0040'),
     ('CAMK2N1', 'ST3GAL6'),
     ('CAMK2N1', 'TMEM108'),
     ('CAMK2N1', 'PAK3'),
     ('CAMK2N1', 'ZNF789'),
     ('CAMK2N1', 'NBPF11'),
     ('CAMK2N1', 'RHOBTB1'),
     ('CAMK2N1', 'KCNG3')]
    

    读入我们的数据

    import scanpy as sc 
    results_file = 'E:\learnscanpy\write\pbmc3k.h5ad'
    sdata = sc.read_h5ad(results_file)
    sdata
    
    AnnData object with n_obs × n_vars = 2223 × 2208 
        obs: 'n_genes', 'percent_mito', 'n_counts', 'percent_mito1', 'leiden'
        var: 'gene_ids', 'feature_types', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
        uns: 'leiden', 'leiden_colors', 'leiden_sizes', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'umap'
        obsm: 'X_pca', 'X_umap'
        varm: 'PCs'
    
    sdata.obs
    
                        n_genes  percent_mito  ...  percent_mito1          leiden
    AAACGCTGTGTGATGG-1     2348      0.099821  ...     998.205811     NewCluster3
    AAACGCTTCTAGGCAT-1     2470      0.065564  ...     655.640869       Dendritic
    AAAGAACCACCGTCTT-1      420      0.021944  ...     219.435730              NK
    AAAGAACTCACCTGGG-1     2479      0.074067  ...     740.672119               B
    AAAGAACTCGGAACTT-1     2318      0.079713  ...     797.133240       Dendritic
                        ...           ...  ...            ...             ...
    TTTGGTTTCCGGTAAT-1     2277      0.079324  ...     793.240723           CD8 T
    TTTGTTGCAGGTACGA-1     1986      0.071226  ...     712.257751  CD14 Monocytes
    TTTGTTGCAGTCTCTC-1     2485      0.055394  ...     553.935852           CD8 T
    TTTGTTGTCCTTGGAA-1     1998      0.081037  ...     810.366882  Megakaryocytes
    TTTGTTGTCGCACGAC-1     2468      0.070987  ...     709.868225           CD8 T
    
    [2223 rows x 5 columns]
    

    进行预测:

    sresult  = pairs.cyclone(sdata, marker_pairs)
    print(sresult)
    
                          G2M      S     G1 max_class cc_prediction
    AAACGCTGTGTGATGG-1  0.032  0.986  0.701         S            G1
    AAACGCTTCTAGGCAT-1  0.848  0.970  0.543         S           G2M
    AAAGAACCACCGTCTT-1  0.320  0.467  0.855        G1            G1
    AAAGAACTCACCTGGG-1  0.190  0.846  0.288         S             S
    AAAGAACTCGGAACTT-1  0.358  0.723  0.871        G1            G1
                      ...    ...    ...       ...           ...
    TTTGGTTTCCGGTAAT-1  0.451  0.689  0.392         S             S
    TTTGTTGCAGGTACGA-1  0.680  0.410  0.684        G1            G1
    TTTGTTGCAGTCTCTC-1  0.544  0.763  0.278         S           G2M
    TTTGTTGTCCTTGGAA-1  0.033  0.924  0.936        G1            G1
    TTTGTTGTCGCACGAC-1  0.908  0.748  0.745       G2M           G2M
    
    [2223 rows x 5 columns]
    

    在umap空间可视化一下:

    sc.tl.umap(sdata)
     sc.pl.umap(sdata,color = ['pypairs_cc_prediction','leiden'])
    

    在scanpy中消除每种细胞周期得分的影响:

    sc.pp.regress_out(sdata, ['n_counts', 'percent_mito','pypairs_G2M','pypairs_G1','pypairs_G1'])
    sc.pp.scale(sdata, max_value=10)
    sc.tl.pca(sdata, svd_solver='arpack')
    sc.pp.neighbors(sdata, n_neighbors=10, n_pcs=40)
    sc.tl.leiden(sdata)
    sc.tl.umap(sdata)
    sc.pl.umap(sdata,color = ['pypairs_cc_prediction','leiden'])
    

    PyPairs - A python scRNA-Seq classifier
    This is a python-reimplementation of the Pairs algorithm as described by A. Scialdone et. al. (2015). Original Paper available under: <https://doi.org/10.1016/j.ymeth.2015.06.021>

    相关文章

      网友评论

        本文标题:单细胞转录组数据分析|| pypairs教程:细胞周期推断

        本文链接:https://www.haomeiwen.com/subject/hsabuhtx.html