美文网首页单细胞单细胞测序试读
使用PHATE复现Science Immunology文章的结果

使用PHATE复现Science Immunology文章的结果

作者: Davey1220 | 来源:发表于2021-10-28 08:55 被阅读0次

    在上篇文章中,我们初步探索了PHATE的使用方法,发现它在揭示一些连续分化过程中不同细胞状态之间的微小局部差异具有很好的效果,同时也能保留细胞全局的整体结构。在本节教程中,我们将复现演示近期发表在Science Immunology期刊上的一篇文章的结果,进一步学习PHATE的相关使用方法。

    image.png

    原文链接:A reservoir of stem-like CD8+ T cells in the tumor-draining lymph node preserves the ongoing anti-tumor immune response. 2021, Science Immunology

    文章结果图:


    image.png
    image.png

    本文复现图:


    image.png image.png

    文章数据下载

    文章处理后的基因表达矩阵文件存放在GEO数据库中,检索号为 GSE182509,这里只提供了pkl格式的表达矩阵,需要用python的pickle包进行读取,我已将其转换为TSV文件存放在我的百度云盘中,有需要的可以下载使用。

    链接:https://pan.baidu.com/s/1IoSIYoEfTzZarLWXWvgvzg
    提取码:gkd9

    image.png

    加载示例数据

    # 导入所需的python包
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import phate
    import scprep
    import magic
    
    # 加载示例数据
    chronic = scprep.io.load_tsv("GSM5530565_Chronic_LCMV_processed.matrix.txt")
    acute = scprep.io.load_tsv("GSM5530564_Acute_LCMV_processed.matrix.txt")
    tumor8 = scprep.io.load_tsv("GSM5530560_Lung_week8_processed.matrix.txt")
    dln8 = scprep.io.load_tsv("GSM5530561_LN_week8_processed.matrix.txt")
    tumor17 = scprep.io.load_tsv("GSM5530562_Lung_week17_processed.matrix.txt")
    dln17 = scprep.io.load_tsv("GSM5530563_LN_week17_processed.matrix.txt")
    
    # 查看示例数据
    chronic.head()
    #                    Mrpl15 (ENSMUSG00000033845)  ...  CAAA01147332.1 (ENSMUSG00000095742)
    #0                                                ...
    #AAACCTGAGAGTAAGG-1                          0.0  ...                             0.000000
    #AAACCTGGTGTGAAAT-1                          0.0  ...                             0.000000
    #AAACGGGAGGCTCATT-1                          0.0  ...                             0.000000
    #AAACGGGGTAGCTTGT-1                          0.0  ...                             0.000000
    #AAACGGGGTCGAATCT-1                          0.0  ...                             0.552073
    #[5 rows x 11595 columns]
    chronic.shape
    #(1185, 11595)
    acute.shape
    #(10768, 12960)
    tumor8.shape
    #(806, 11749)
    dln8.shape
    #(1742, 12116)
    tumor17.shape
    #(731, 11150)
    dln17.shape
    #(876, 11595)
    

    数据质控预处理和数据归一化

    由于下载的数据是预先已经进行质控过滤和归一化处理的,这里将不再进行处理。详细用法见上期 使用PHATE进行单细胞高维数据的可视化

    使用PHATE进行低维嵌入降维可视化

    ### analysis for chronic sample ###
    #Embedding Data Using PHATE
    # Instantiating the PHATE estimator
    phate_operator = phate.PHATE(n_jobs=-2)
    
    Y_phate = phate_operator.fit_transform(chronic)
    Y_phate
    #array([[ -5.34563999,   6.11670333],
    #       [ 29.8771147 ,   8.33029219],
    #       [  9.44218856, -25.94568946],
    #       ...,
    #       [-10.86402299,  -1.15554   ],
    #       [-14.65615727,  -1.03794057],
    #       [-16.53150258,   4.50993521]])
    

    细胞聚类分群

    # cell clustering
    clusters = phate.cluster.kmeans(phate_operator, n_clusters=8)
    clusters
    #array([0, 2, 1, ..., 4, 4, 4], dtype=int32)
    
    # 聚类结果可视化
    scprep.plot.scatter2d(Y_phate, c=clusters, figsize=(8,7), cmap="Spectral",
                          ticks=False, label_prefix="PHATE", title= "Chronic LCMV")
    plt.savefig("plot_phate_chronic_2d_by_cluster.png")
    
    image.png

    使用MAGIC进行数据填充

    #Creating the MAGIC operator
    magic_op = magic.MAGIC()
    
    # Running MAGIC for all genes
    chronic_magic = magic_op.fit_transform(chronic, genes='all_genes')
    chronic_magic.head()
    #                    Mrpl15 (ENSMUSG00000033845)  ...  CAAA01147332.1 (ENSMUSG00000095742)
    #0                                                ...
    #AAACCTGAGAGTAAGG-1                     0.150924  ...                             0.076723
    #AAACCTGGTGTGAAAT-1                     0.153466  ...                             0.067584
    #AAACGGGAGGCTCATT-1                     0.146601  ...                             0.064838
    
    # rename columns names
    chronic_magic.columns = [i.split(" ")[0] for i in chronic_magic.columns.tolist()]
    
    # marker基因可视化
    markers = ["Sell", "Ccr7", "Tcf7", "Slamf6", "Xcl1", "Il7r", "Eomes", "Tbx21", "Gzmb", "Prf1", "Pdcd1", "Havcr2", "Cd101", "Cx3cr1", "Cxcr6"]
    
    for marker in markers:
            # 2d plot
            scprep.plot.scatter2d(Y_phate, c=chronic_magic[marker], figsize=(8,7), cmap="Reds",
                            ticks=False, label_prefix="PHATE", title=marker + " magic expression")
            plt.savefig("plot_chronic_magic_marker_2d_" + marker + ".png")
    
    image.png

    根据这些marker基因的表达情况,我们将不同的cluster进行细胞类型的注释,得到以下的细胞注释结果。

    Tnaive:Sell, Ccr7
    Tsl: Tcf7, Slamf6, Xcl1
    Ttrans: Cx3cr1, Cxcr6
    Tex: Pdcd1, Havcr2, Cd101

    image.png

    多样本合并分析

    ### analysis for combined five sample ###
    # Merge all datasets and create a vector representing the time point of each sample
    alldata = [chronic,tumor8,tumor17,dln8,dln17]
    
    EBT_counts, sample_labels = scprep.utils.combine_batches(
        alldata,
        ["Chronic","Early Tumor","Late Tumor","Early LN","Late LN"],
        append_to_cell_names=True
    )
    del alldata # removes objects from memory
    
    EBT_counts.head()
    #                            Mrpl15 (ENSMUSG00000033845)  ...  CAAA01147332.1 (ENSMUSG00000095742)
    #AAACCTGAGAGTAAGG-1_Chronic                          0.0  ...                             0.000000
    #AAACCTGGTGTGAAAT-1_Chronic                          0.0  ...                             0.000000
    #AAACGGGAGGCTCATT-1_Chronic                          0.0  ...                             0.000000
    #AAACGGGGTAGCTTGT-1_Chronic                          0.0  ...                             0.000000
    #AAACGGGGTCGAATCT-1_Chronic                          0.0  ...                             0.552073
    #[5 rows x 10246 columns]
    
    EBT_counts.shape
    #(5340, 10246)
    
    sample_labels
    #AAACCTGAGAGTAAGG-1_Chronic    Chronic
    #AAACCTGGTGTGAAAT-1_Chronic    Chronic
    #AAACGGGAGGCTCATT-1_Chronic    Chronic
    #AAACGGGGTAGCTTGT-1_Chronic    Chronic
    #AAACGGGGTCGAATCT-1_Chronic    Chronic
    #                               ...
    #Name: sample_labels, Length: 5340, dtype: object
    

    PHATE降维可视化

    #Embedding Data Using PHATE
    # Instantiating the PHATE estimator
    phate_operator = phate.PHATE(n_jobs=-2)
    
    Y_phate = phate_operator.fit_transform(EBT_counts)
    Y_phate
    #array([[ 79.05651647,  15.42592929],
    #       [ 20.72815444,  25.65566379],
    #       [ 90.57712893,  -4.77917562],
    #       ...,
    #       [-52.39011592,  39.20142516],
    #       [-28.51731009,  -8.66499775],
    #       [-46.00734805, -17.37265621]])
    
    scprep.plot.scatter2d(Y_phate, c=sample_labels, figsize=(10,8), cmap="Spectral",
                          ticks=False, label_prefix="PHATE")
    plt.savefig("plot_phate_2d_by_sample.png")
    
    test4.png
    # 3D visualization
    phate_operator.set_params(n_components=3)
    Y_phate_3d = phate_operator.transform()
    Y_phate_3d
    #array([[ 77.85894712,  13.21732854, -11.29708591],
    #       [ 17.23971363,  19.47621167, -27.54015408],
    #       [ 87.69788489,   4.0231593 ,  13.16082805],
    #       ...,
    #       [-46.46850791,  42.68065699,  -2.93836416],
    #       [-20.53357918,  -4.9821892 , -22.70265013],
    #       [-36.18351133,  -9.56494547, -29.67206442]])
    
    scprep.plot.scatter3d(Y_phate_3d, c=sample_labels, figsize=(8,6), cmap="Spectral",
                          ticks=False, label_prefix="PHATE")
    plt.savefig("plot_phate_3d_by_sample.png")
    
    test5.png

    细胞聚类分群

    # cell clustering
    clusters = phate.cluster.kmeans(phate_operator, n_clusters=12)
    clusters
    #array([8, 6, 1, ..., 2, 4, 4], dtype=int32)
    
    # save meta data
    meta = pd.merge(pd.DataFrame(sample_labels),pd.DataFrame(clusters,index=sample_labels.index,columns=["cluster"]),left_index=True,right_index=True)
    meta.head()
    #                           sample_labels  cluster
    #AAACCTGAGAGTAAGG-1_Chronic       Chronic        8
    #AAACCTGGTGTGAAAT-1_Chronic       Chronic        6
    #AAACGGGAGGCTCATT-1_Chronic       Chronic        1
    #AAACGGGGTAGCTTGT-1_Chronic       Chronic        8
    #AAACGGGGTCGAATCT-1_Chronic       Chronic        8
    
    meta.to_csv("metadata.csv")
    
    scprep.plot.scatter3d(Y_phate_3d, c=clusters, figsize=(8,6), cmap="Spectral",
                          ticks=False, label_prefix="PHATE")
    plt.savefig("plot_phate_3d_by_cluster.png")
    
    test6.png
    scprep.plot.scatter2d(Y_phate, c=clusters, figsize=(8,6), cmap="Spectral",
                          ticks=False, label_prefix="PHATE")
    plt.savefig("plot_phate_2d_by_cluster.png")
    
    test7.png
    # to save as a gif:
    scprep.plot.rotate_scatter3d(Y_phate_3d, c=sample_labels,
                                 figsize=(8,6), cmap="Spectral",
                                 ticks=False, label_prefix="PHATE", filename="phate.gif")
    
    
    phate.gif

    使用MAGIC进行数据填充

    # rename columns names
    EBT_counts.columns = [i.split(" ")[0] for i in EBT_counts.columns.tolist()]
    
    # MAGIC imputation
    #Creating the MAGIC operator
    magic_op = magic.MAGIC()
    
    #Running MAGIC with gene selection
    #bmmsc_magic = magic_op.fit_transform(bmmsc_data, genes=["Mpo", "Klf1", "Ifitm1"])
    #bmmsc_magic.head()
    
    #Visualizing gene-gene relationships
    #fig, (ax1, ax2) = plt.subplots(1,2, figsize=(16, 6))
    
    #scprep.plot.scatter(x=bmmsc_data['Mpo'], y=bmmsc_data['Klf1'], c=bmmsc_data['Ifitm1'],  ax=ax1,
    #                    xlabel='Mpo', ylabel='Klf1', legend_title="Ifitm1", title='Before MAGIC')
    
    #scprep.plot.scatter(x=bmmsc_magic['Mpo'], y=bmmsc_magic['Klf1'], c=bmmsc_magic['Ifitm1'], ax=ax2,
    #                    xlabel='Mpo', ylabel='Klf1', legend_title="Ifitm1", title='After MAGIC')
    #plt.tight_layout()
    #plt.show()
    
    # Running MAGIC for all genes
    EBT_counts_magic = magic_op.fit_transform(EBT_counts, genes='all_genes')
    EBT_counts_magic.head()
    
    markers = ["Sell", "Ccr7", "Tcf7", "Slamf6", "Xcl1", "Il7r", "Eomes", "Tbx21", "Gzmb", "Prf1", "Pdcd1", "Havcr2", "Cd101", "Cx3cr1", "Cxcr6"]
    
    for marker in markers:
            # 2d plot
            scprep.plot.scatter2d(Y_phate, c=EBT_counts_magic[marker], figsize=(8,6), cmap="Reds",
                            ticks=False, label_prefix="PHATE", title=marker + " magic expression")
            plt.savefig("plot_magic_marker_2d_" + marker + ".png")
            # 3d plot
            scprep.plot.scatter3d(Y_phate_3d, c=EBT_counts_magic[marker], figsize=(8,6), cmap="Reds",
                            ticks=False, label_prefix="PHATE", title=marker + " magic expression")
            plt.savefig("plot_magic_marker_3d_" + marker + ".png")
    
    image.png

    基因差异表达分析

    # Differential analysis
    # By samples
    de_samples = scprep.stats.differential_expression_by_cluster(EBT_counts_magic,clusters=sample_labels,direction="up")
    de_samples
    de_samples["Chronic"]
    
    for key,value in de_samples.items():
            print(value.head(n=10))
    
    # save DE data
    de_samples["Chronic"].to_csv("DEs_Chronic.csv")
    de_samples["Early Tumor"].to_csv("DEs_Early_Tumor.csv")
    de_samples["Early LN"].to_csv("DEs_Early_LN.csv")
    
    # By clusters
    de_clusters = scprep.stats.differential_expression_by_cluster(EBT_counts_magic,clusters=clusters,direction="up")
    de_clusters
    de_clusters[0]
    
    for key,value in de_clusters.items():
            print(value.head(n=10))
    

    相关文章

      网友评论

        本文标题:使用PHATE复现Science Immunology文章的结果

        本文链接:https://www.haomeiwen.com/subject/msspaltx.html