Development of a fixed module re


Author: GIANT_fish | Published 2022-12-08 19:20

    Authors: Matthew C. Altman 1,2,21✉, Darawan Rinchai 3,21✉, Nicole Baldwin 4, Mohammed Toufiq 3, Elizabeth Whalen 1, Mathieu Garand 3, Basirudeen Syed Ahamed Kabeer 3, Mohamed Alfaki 3, Scott R. Presnell 1, Prasong Khaenam 1, Aaron Ayllón-Benítez 5, Fleur Mougin 5, Patricia Thébault 6, Laurent Chiche 7, Noemie Jourde-Chiche 8, J. Theodore Phillips 4, Goran Klintmalm 4, Anne O’Garra 9,10, Matthew Berry 11, Chloe Bloom 10, Robert J. Wilkinson 12,13,14, Christine M. Graham 9, Marc Lipman 15, Ganjana Lertmemongkolchai 16, Davide Bedognetti 3, Rodolphe Thiebaut 5, Farrah Kheradmand 17, Asuncion Mejias 18, Octavio Ramilo 18, Karolina Palucka 4,19, Virginia Pascual 4,20, Jacques Banchereau 4,19 & Damien Chaussabel 1,3✉

    Affiliations:
    1Systems Immunology, Benaroya Research Institute, Seattle, WA, USA.
    2Division of Allergy and Infectious Diseases, University of Washington, Seattle, WA, USA.
    3Research Branch, Sidra Medicine, Doha, Qatar.
    4Baylor Institute for Immunology Research, Baylor Research Institute, Dallas, TX, USA.
    5Inserm U1219 Bordeaux Population Health Research Center, Bordeaux University, Bordeaux, France.
    6LaBRI, CNRS UMR5800, Bordeaux University, Bordeaux, France.
    7Department of Internal Medicine, Hôpital Européen, Marseille, France.
    8Aix-Marseille University, C2VN, INSERM 1263, INRA 1260, Marseille, France.
    9Laboratory of Immunoregulation and Infection, The Francis Crick Institute, London, UK.
    10National Heart and Lung Institute, Imperial College London, London, UK.
    11Royal Cornwall Hospitals NHS Trust, Truro, UK.
    12The Francis Crick Institute, London, UK.
    13Department of Infectious Disease, Imperial College, London, UK.
    14Wellcome Center for Infectious Diseases Research in Africa and Department of Medicine, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town Observatory, 7925 Cape Town, Republic of South Africa.
    15UCL Respiratory, Division of Medicine, University College London, London, UK.
    16Centre for Research and Development of Medical Diagnostic Laboratories, Faculty of Associated Medical Sciences, Khon Kaen University, Khon Kaen, Thailand.
    17Baylor College of Medicine & Center for Translational Research on Inflammatory Diseases, Michael E. DeBakey VAMC, Houston, TX, USA.
    18Abigail Wexner Research Institute at Nationwide Children’s Hospital and the Ohio State University School of Medicine, Columbus, OH, USA.
    19The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
    20Weill Cornell Medicine, New York, NY, USA.
    21These authors contributed equally: Matthew C. Altman, Darawan Rinchai.
    ✉email: maltman@benaroyaresearch.org; drinchai@sidra.org; dchaussabel@sidra.org
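       This study presents BloodGen3, a new fixed repertoire of transcriptional modules that provides a stable, reproducible framework for the analysis and interpretation of blood transcriptome data. The framework was built from the co-clustering patterns of 985 blood transcriptome profiles spanning a wide range of immunological and physiological states. It is supported by a set of customizable interpretation resources, including: module-level analysis workflows, fingerprint grid plot visualizations, interactive web applications and an extensive annotation framework comprising functional profiling reports and reference transcriptional profiles.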

    Results

    Generation of a collection of datasets covering a wide range of immune states.

       To build the framework and capture as broad a range of immune responses as possible, 985 samples from 16 datasets were included:

    • TB (23 condition, 11 control, 34 total);
    • Staph aureus (99 condition, 44 control, 143 total);
    • Sepsis (35 condition, 12 control, 47 total);
    • HIV (28 condition, 35 control, 63 total);
    • Flu (25 condition, 14 control, 39 total);
    • RSV (70 condition, 14 control, 84 total);
    • B-cell deficiency (20 condition, 13 control, 33 total);
    • Liver Transplant (94 condition, 30 control, 124 total);
    • Pregnancy (25 condition, 20 control, 45 total);
    • Melanoma Stage IV (22 condition, 5 control, 27 total);
    • Kawasaki (21 condition, 23 control, 44 total);
    • Juvenile Dermatomyositis (40 condition, 9 control, 49 total);
    • COPD (19 condition, 24 control, 43 total);
    • MS - untreated (34 condition, 22 control, 56 total);
    • Pediatric SLE (55 condition, 14 control, 69 total);
    • SoJIA (62 condition, 23 control, 85 total)
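As a quick arithmetic check, the dataset list can be tabulated in R, the language of the BloodGen3Module package. This confirms that the per-dataset counts add up to the 985 samples stated above. The counts are copied from the list; the variable names are illustrative only.

```r
# Per-dataset sample counts copied from the list above
datasets <- data.frame(
  state = c("TB", "Staph aureus", "Sepsis", "HIV", "Flu", "RSV",
            "B-cell deficiency", "Liver Transplant", "Pregnancy",
            "Melanoma Stage IV", "Kawasaki", "Juvenile Dermatomyositis",
            "COPD", "MS - untreated", "Pediatric SLE", "SoJIA"),
  condition = c(23, 99, 35, 28, 25, 70, 20, 94, 25, 22, 21, 40, 19, 34, 55, 62),
  control   = c(11, 44, 12, 35, 14, 14, 13, 30, 20,  5, 23,  9, 24, 22, 14, 23),
  stringsAsFactors = FALSE
)
datasets$total <- datasets$condition + datasets$control

nrow(datasets)            # 16 datasets
sum(datasets$condition)   # 672 condition samples
sum(datasets$control)     # 313 control samples
sum(datasets$total)       # 985 samples in total
```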

    Implementation of a stepwise approach to blood transcriptional module repertoire construction.

    The module repertoire construction process

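    a. Sixteen blood transcriptome datasets spanning a range of immune and physiological states were collected as the starting point for identifying gene co-expression patterns.
    b. Each dataset was clustered independently with k-means.
    c. For every pair of genes, the number of datasets in which the two genes fell into the same cluster was recorded. This count ranges from 0 to 16, i.e. from co-clustering in none of the datasets to co-clustering in all 16.
    d. These co-clustering counts were used to build a co-clustering graph. Nodes represent genes, edges represent co-clustering events (occurring at least once), and each edge is weighted by the number of datasets in which the pair co-clustered.
    e. Sub-networks were then selected from the whole network in successive rounds of decreasing weight stringency, and each was assigned a module ID.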
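Steps b-d can be sketched in base R. This is an illustration under stated assumptions, not the authors' code:

- Every dataset is a genes × samples matrix with the same row names.
- A single fixed k is used for all datasets. The original workflow chose k per dataset, as described below.

R's `kmeans()` uses the Hartigan-Wong algorithm and Euclidean distance by default. The function `co_cluster_counts` and the toy data are hypothetical.

```r
# Steps b-d: k-means per dataset, then count co-clustering events per gene pair.
# Assumptions: datasets share row names (genes); one fixed k for all datasets.
co_cluster_counts <- function(datasets, k, nstart = 10, seed = 1) {
  genes <- rownames(datasets[[1]])
  counts <- matrix(0L, nrow = length(genes), ncol = length(genes),
                   dimnames = list(genes, genes))
  for (ds in datasets) {
    set.seed(seed)
    # Cluster genes (rows) by their profiles across samples
    cl <- kmeans(ds[genes, , drop = FALSE], centers = k, nstart = nstart)$cluster
    # Add 1 for every pair of genes that share a cluster in this dataset
    counts <- counts + outer(cl, cl, "==")
  }
  diag(counts) <- 0L  # self-pairs are not edges
  counts              # entries range from 0 to length(datasets)
}

# Toy data: in each dataset, g1-g3 and g4-g6 follow two distinct patterns
toy_dataset <- function(seed, n_samples = 8) {
  set.seed(seed)
  m <- rbind(matrix(rnorm(3 * n_samples, mean = 0, sd = 0.1), nrow = 3),
             matrix(rnorm(3 * n_samples, mean = 10, sd = 0.1), nrow = 3))
  rownames(m) <- paste0("g", 1:6)
  m
}
counts <- co_cluster_counts(lapply(1:3, toy_dataset), k = 2)
counts["g1", "g2"]  # 3: co-clustered in all three datasets
counts["g1", "g4"]  # 0: never co-clustered
```

The resulting count matrix is exactly the edge-weight input of step d: a pair with count 0 has no edge.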

    Algorithm and pseudocode

    • Distance metric: Euclidean distance
    • Clustering: Hartigan’s k-means clustering algorithm
    • Optimization: If, at any k, the algorithm creates a cluster whose members’ average Pearson correlation to the mean cluster vector is <0.3, the cluster is deleted and the algorithm begins again at k-1. The ‘ideal’ number of clusters (k) for each dataset was determined within a range of k = 1-100 by means of the jump statistic.
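The jump statistic used to pick k can be sketched as follows. The sketch assumes Sugar and James's formulation:

- The distortion d_k is the average within-cluster squared Euclidean distance per dimension.
- The distortion is transformed as d_k^(-p/2), with p the number of samples.
- The k with the largest jump from k-1 is chosen.

The Pearson-correlation (<0.3) deletion rule is not included. `choose_k_jump` is a hypothetical helper, not part of BloodGen3Module.

```r
# Choose k with the jump statistic (Sugar & James); rows are clustered.
choose_k_jump <- function(x, k_max = 10, nstart = 25, seed = 1) {
  p <- ncol(x)
  distortion <- vapply(seq_len(k_max), function(k) {
    set.seed(seed)
    fit <- kmeans(x, centers = k, nstart = nstart)
    fit$tot.withinss / (nrow(x) * p)  # average distortion per dimension
  }, numeric(1))
  transformed <- distortion^(-p / 2)
  jumps <- diff(c(0, transformed))    # the transformed d_0 is defined as 0
  which.max(jumps)
}

# Three well-separated groups of 20 points in two dimensions
set.seed(42)
group_centers <- rbind(c(0, 0), c(20, 0), c(0, 20))
x <- group_centers[rep(1:3, each = 20), ] + matrix(rnorm(120), ncol = 2)
choose_k_jump(x, k_max = 6)  # should pick 3 for these well-separated groups
```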
      The pseudocode is as follows:
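    // Relaxation schedule: the paraclique edge-weight threshold may trail the clique threshold
    Integer nLastQuartile = 4;
    Integer nMaxRelaxation = m_nNumDatasets / 3;
    Integer nRelaxationIncrement = Math.max(1, nMaxRelaxation / 3);
    Integer nRelaxation = nMaxRelaxation;
    // Start at the most stringent threshold (co-clustering in every dataset) and relax by one each round
    for (int nCliqueThreshold = m_nNumDatasets; nCliqueThreshold >= 1; nCliqueThreshold--)
    {
        // Quartile (0-4) of the current threshold relative to the number of datasets
        Integer nQuartile = ((nCliqueThreshold * 100) / m_nNumDatasets) / 25;
        if (!nQuartile.equals(nLastQuartile))
        {
            // Below 75% of the datasets, shrink the relaxation each time a new quartile is entered
            if (nQuartile <= 2)
            {
                nRelaxation = Math.max(0, nRelaxation - nRelaxationIncrement);
            }
            nLastQuartile = nQuartile;
        }
        Integer nParacliqueThreshold = nCliqueThreshold - nRelaxation;
        do
        {
            maximumClique = find maximum clique whose edges all have co-clustering weight >= nCliqueThreshold
            if (size of maximumClique > 15)
            {
                // Assumption: the relaxed threshold governs which edges the paraclique may use
                paraclique = extend maximumClique into a paraclique using edges of weight >= nParacliqueThreshold
                remove maximumClique and paraclique from graph; assign them a module ID
            }
        } while (maximumClique is found and size of maximumClique > 15)  // stop once no large clique remains
    }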
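Only the threshold bookkeeping in the pseudocode is plain integer arithmetic. It can therefore be reproduced in R to show which clique and paraclique edge-weight thresholds it produces for 16 datasets. This sketch mirrors the Java-style integer division with `%/%`. The clique and paraclique searches themselves are not implemented, and `relaxation_schedule` is a hypothetical name.

```r
# Reproduce the clique / paraclique threshold schedule from the pseudocode
relaxation_schedule <- function(n_datasets) {
  max_relax <- n_datasets %/% 3
  increment <- max(1, max_relax %/% 3)
  relax <- max_relax
  last_quartile <- 4
  clique <- seq(n_datasets, 1)          # from most to least stringent
  paraclique <- numeric(length(clique))
  for (i in seq_along(clique)) {
    thr <- clique[i]
    # Quartile (0-4) of the threshold relative to the number of datasets
    quartile <- ((thr * 100) %/% n_datasets) %/% 25
    if (quartile != last_quartile) {
      # Below 75% of the datasets, shrink the relaxation at each new quartile
      if (quartile <= 2) relax <- max(0, relax - increment)
      last_quartile <- quartile
    }
    paraclique[i] <- thr - relax
  }
  data.frame(clique = clique, paraclique = paraclique)
}

schedule <- relaxation_schedule(16)
schedule[schedule$clique %in% c(16, 12, 11, 7, 3), ]
# clique 16 -> 11, 12 -> 7, 11 -> 7, 7 -> 4, 3 -> 1
```

For 16 datasets the relaxation starts at 5 and shrinks by 1 at each new quartile below 75%. The paraclique threshold therefore trails the clique threshold by 5, then 4, 3 and finally 2.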
    
    Construction of weighted co-clustering networks
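       Weighted co-clustering networks were used to build the BloodGen3 module repertoire. The weight of each edge reflects the number of distinct “states” in which a co-expression relationship is observed; for whole-blood samples, these states are different diseases or physiological phenotypes. In scenario A, a set of genes is co-expressed in all three disease states, so each edge receives a weight of 3. In scenarios B and C, co-expression occurs in only two states or in one state, giving weights of 2 and 1, respectively.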
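Below is a minimal sketch of how such weights become a weighted edge list. It assumes the co-clustering counts are stored as a symmetric gene × gene matrix. The three-gene matrix loosely mirrors the weights 3, 2 and 1 from scenarios A-C, using hypothetical genes A, B and C. `coclustering_edges` is an illustrative name.

```r
# Convert a symmetric co-clustering count matrix into a weighted edge list
coclustering_edges <- function(counts, min_weight = 1) {
  # Upper triangle only, so each gene pair is listed once
  idx <- which(upper.tri(counts) & counts >= min_weight, arr.ind = TRUE)
  data.frame(from = rownames(counts)[idx[, "row"]],
             to = colnames(counts)[idx[, "col"]],
             weight = counts[idx],
             stringsAsFactors = FALSE)
}

genes <- c("A", "B", "C")
w <- matrix(c(0, 3, 2,
              3, 0, 1,
              2, 1, 0), nrow = 3, dimnames = list(genes, genes))
coclustering_edges(w)                  # A-B (3), A-C (2), B-C (1)
coclustering_edges(w, min_weight = 2)  # only A-B and A-C remain
```

Raising `min_weight` is the same operation the repertoire construction performs when it selects sub-networks at decreasing weight stringency.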

    Development of module-level analysis workflows and visualizations.

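       The approach above yields sets of genes whose transcript abundance co-varies across a range of physiological states; these sets serve as candidate modules. Using these modules as a framework makes it possible to: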

    1. identify functional convergences among the genes that comprise each set
    2. summarize changes in overall transcript abundance related to pathological processes or therapeutic interventions.
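         The final BloodGen3 repertoire comprises 382 modules, with a mean of 37.1 genes per module (median 26.5; range 12-169).
        Tools used for functional annotation and enrichment of the modules include: GSAn, Literature Lab, IPA, DAVID, KEGG, BioCarta, OMIM, and GOTERM.
         The authors also assessed the overlap between the BloodGen3 repertoire and the earlier Gen1 and Gen2 repertoires [1].
         Module-level analysis determines the proportion of each module's constitutive transcripts whose abundance differs between groups (e.g. cases vs. controls; pre-treatment vs. post-treatment). This yields two values per module: the percentage of transcripts with increased abundance and the percentage with decreased abundance. Cut-offs are set according to the user's preferences (based on statistics, fold changes and/or differences, with or without multiple-testing correction for group comparisons). Module-level differential expression is then visualized with a “fingerprint” grid plot.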
      The development of the BloodGen3 module fingerprint grids
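         A second round of clustering, based on the similarity of module-level abundance patterns across the 16 datasets, grouped the 382 modules into 38 “aggregates”. This yields two levels of granularity: the module level and the module-aggregate level. The coarser aggregate level keeps the number of variables manageable.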
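The source does not state which clustering method produced the 38 aggregates. The sketch below assumes average-linkage hierarchical clustering on Euclidean distances between module response profiles (one row per module, one column per dataset), cut into a fixed number of groups. The function name, module names and values are hypothetical.

```r
# Group modules into aggregates by the similarity of their response profiles
# (assumed method: average-linkage hierarchical clustering)
group_modules <- function(module_profiles, n_aggregates) {
  # Rows: modules; columns: module-level responses in each reference dataset
  tree <- hclust(dist(module_profiles), method = "average")
  cutree(tree, k = n_aggregates)  # named vector: module -> aggregate index
}

profiles <- rbind(
  mod1 = c(50, 40, 0, -10),
  mod2 = c(55, 35, 5, -15),
  mod3 = c(45, 45, -5, -5),
  mod4 = c(-30, 0, 60, 20),
  mod5 = c(-35, 5, 55, 25),
  mod6 = c(-25, -5, 65, 15)
)
aggregates <- group_modules(profiles, n_aggregates = 2)
aggregates  # mod1-mod3 share one aggregate, mod4-mod6 the other
```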
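      a. Each row of the heatmap corresponds to changes in transcript abundance for a given dataset in a given direction (increase or decrease), giving 32 rows in total (16 datasets × 2 directions). With healthy controls as the baseline, increases are shown in red and decreases in blue. The columns cover all modules of the BloodGen3 repertoire (N = 382). The color bar at the bottom indicates module aggregate IDs and serves only to illustrate how modules are organized on the fingerprint grid plot. As a result of this organization, changes in abundance within each row of the fingerprint grid are correlated, which was not the case in the initial iterations of the grid.
         After this aggregation step, some functional convergence can be observed within rows of the grid. For example, row A1 contains several lymphocyte-related modules, row A28 contains six distinct “interferon modules”, and rows A33 and A35 contain many inflammation-related modules.
      b. Modules are arranged as follows: the 382 modules were grouped into 38 clusters (aggregates) based on their similarity across the 16 datasets. The subset of 27 aggregates containing at least two modules forms the rows of the grid. In panel b, the length of each arrowed line represents the number of modules in each cluster.
      c. When the BloodGen3Module R package is used for downstream analysis of blood transcriptome data, module-level results are mapped onto this grid, with colors representing changes in abundance.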
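The fixed layout of the grid can be sketched as a matrix with one row per aggregate and one column per module position within that aggregate. Cells stay empty where an aggregate has fewer modules. The position table, module IDs and `grid_matrix` helper below are hypothetical; the actual layout is defined by the BloodGen3Module package.

```r
# Place module-level values on a fixed grid: one row per aggregate
grid_matrix <- function(values, module_info) {
  # module_info: one row per module with columns module, aggregate, position
  module <- as.character(module_info$module)
  agg <- as.character(module_info$aggregate)
  aggs <- unique(agg)
  grid <- matrix(NA_real_, nrow = length(aggs), ncol = max(module_info$position),
                 dimnames = list(aggs, NULL))
  # Fixed positions: each module always lands in the same cell
  grid[cbind(match(agg, aggs), module_info$position)] <- values[module]
  grid
}

info <- data.frame(module = c("modA1", "modA2", "modB1"),
                   aggregate = c("A1", "A1", "A2"),
                   position = c(1, 2, 1),
                   stringsAsFactors = FALSE)
responses <- c(modA1 = 40, modA2 = -20, modB1 = 15)  # % response per module
grid_matrix(responses, info)  # A2 has a single module, so its second cell is NA
```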

    Illustrative case of fingerprint grid plot representation

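       Reducing the dimensionality of the data makes it easier to interpret. Read vertically, the fingerprint grid shows changes across module aggregates; read horizontally, it shows variation within each aggregate, across all of its constituent modules. All analysis workflows and interpretations are available on the BloodGen3 website.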

    Fingerprint grid plots

    In-depth functional annotation of fixed transcriptional module repertoires

    Functional annotation:

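    • Approach
      1. Concurrent ontology, pathway and literature-term profiling analyses
      2. Determination of the expression patterns of each module's constitutive genes in selected reference datasets
    • Steps
      • Step 1: Functional profiling

        • Gene Ontology (GO) analysis of the 382 modules was performed with DAVID, GOTERM and GSAn.
        • Pathway analysis was performed with KEGG, BioCarta and Ingenuity Pathway Analysis (IPA).
        • Literature-term enrichment was performed with Literature Lab.
        • The RcisTarget R package was used to identify transcription factor binding motifs over-represented in each module.
          The annotations obtained in this step were then compiled and merged to derive a functional annotation title for each module.
      • Step 2: Expression patterns in reference transcriptome datasets. Three transcriptome datasets were used as references to improve the characterization and functional interpretability of the BloodGen3 modules:

        • Novershtern [2]
        • Speake [3]
        • Monaco [4]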

    Measuring inter-individual variability for the molecular stratification of patient cohorts

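       This step characterizes inter-individual variability. Fixed cut-offs are applied to each individual's transcript abundance, for example absolute fold change in expression and absolute difference in expression vs. the average of control samples. The percentage of each module's transcripts that are differentially expressed is then computed for every subject. These percentages are equivalent to the values obtained from group comparisons, except that they are derived for each individual sample.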
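Per-subject percentages can be sketched as follows. The reading of the cut-offs is an assumption modelled on the `FC` and `DIFF` arguments of `Individualcomparison` used later, not the package's documented behavior:

- A gene counts as increased in a sample only if it is at least `fc`-fold above the control mean and at least `diff` units above it.
- Decreases are defined symmetrically.

Values are assumed to be on a linear (non-log) scale. The function name and data are hypothetical.

```r
# Per-sample % of each module's genes that are increased / decreased vs. controls
individual_module_percent <- function(expr, modules, is_control, fc = 1.5, diff = 10) {
  ref <- rowMeans(expr[, is_control, drop = FALSE])  # per-gene mean of control samples
  ratio <- expr / ref   # ref has one value per gene and is recycled down each column
  delta <- expr - ref
  up <- (ratio >= fc & delta >= diff) * 1
  down <- (ratio <= 1 / fc & delta <= -diff) * 1
  n_genes <- as.vector(rowsum(rep(1, nrow(expr)), modules))  # genes per module
  list(up = rowsum(up, modules) / n_genes * 100,     # % increased, module x sample
       down = rowsum(down, modules) / n_genes * 100) # % decreased, module x sample
}

expr <- matrix(c(100, 100, 200, 100,
                 100, 100, 105, 160,
                 100, 100,  50, 100,
                 100, 100, 100,  60),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("g", 1:4), c("ctrl1", "ctrl2", "case1", "case2")))
res <- individual_module_percent(expr, modules = c("modX", "modX", "modY", "modY"),
                                 is_control = c(TRUE, TRUE, FALSE, FALSE))
res$up    # modX: 50 in case1 and case2
res$down  # modY: 50 in case1 and case2
```

Requiring both a ratio and an absolute difference keeps low-abundance genes with large but noisy ratios from dominating the per-sample percentages.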


    Individual-level module heatmap

    Profiling the abundance of A28 interferon-inducible genes at the aggregate level across reference patient cohorts

       The authors illustrate applications of BloodGen3 in the figures below:

    Module aggregate abundance patterns across the 16 disease or physiological states
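    a. Heatmap of transcript abundance patterns across the 27 module aggregates that contain at least two modules, for the 16 disease or physiological states. At the first branch point, acute HIV infection and MS cluster together, apart from the other 14 states. This dichotomy is driven by decreased abundance of modules associated with inflammation and/or myeloid cells (A34–A38), accompanied by increased abundance of modules associated with lymphocyte responses (A1–A8). This suggests that the dominant signature reflects shifts in the relative proportions of myeloid (granulocyte) and lymphoid cells. Notably, although such shifts in cell composition likely affect overall transcript abundance, the IFN signaling modules showed similar increases in states belonging to both clusters (e.g. acute HIV infection in one cluster; SLE or influenza in the other).
    b. Gene composition of the six modules in aggregate A28.
    c. Differences in the expression of A28 genes between patients and controls across different infectious diseases.
    d. Differences in the expression of A28 genes in hepatitis C patients treated with IFNα and in MS patients treated with IFNβ.
      The response to type I IFN consists mainly of a disproportionate increase in the abundance of transcripts in M8.3 and M10.1. In contrast, M15.86 changed very little after type I IFN treatment but increased markedly in acute HIV infection and influenza infection, which suggests that M15.86 may be associated with IFNγ. RSV infection elicited a weaker IFN response than the other infections, whereas TB elicited a strong IFN response.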

    Profiling the abundance of A28 interferon-inducible genes at the module level across reference patient cohorts

    Literature profiles and patterns of changes in abundance across reference datasets for the modules comprising aggregate A28

    Profiling the abundance of A28 interferon-inducible genes at the module level across individual subjects

    Development and availability of ancillary resources



    Applying BloodGen3Module

      The workflow of the BloodGen3Module R package is shown below:

    Workflow diagram

    Applying BloodGen3Module involves three steps:

    Annotation of the expression matrix

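       The first and second columns of the expression matrix hold the gene symbol and the module to which the gene belongs; the remaining columns hold the expression values of each sample.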
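A minimal sketch of this annotation step in base R follows. It assumes a gene-level matrix with gene symbols as row names and a lookup table mapping genes to modules. The lookup table, its column names (`Gene`, `Module`) and the gene and module IDs are hypothetical. Genes without a module assignment are dropped.

```r
# Prepend Gene and Module columns to a gene-level expression matrix
annotate_expression <- function(expr, module_map) {
  keep <- rownames(expr) %in% module_map$Gene
  genes <- rownames(expr)[keep]
  out <- data.frame(Gene = genes,
                    Module = as.character(module_map$Module[match(genes, module_map$Gene)]),
                    expr[keep, , drop = FALSE],
                    check.names = FALSE, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

expr <- matrix(c(5, 7, 1, 2, 9, 8), nrow = 3,
               dimnames = list(c("gA", "gB", "gC"), c("s1", "s2")))
module_map <- data.frame(Gene = c("gA", "gB"), Module = c("modX", "modY"),
                         stringsAsFactors = FALSE)
annotate_expression(expr, module_map)  # gC has no module and is dropped
```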

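    Determination of differential expression
         As in the annotated matrix, the first column is the gene symbol and the second the corresponding module. The criteria for differential expression are user-defined: a comparison between two groups can combine p-values and fold change (FC), while individual-level comparisons use the difference in expression and FC.
    Calculation of the percentage of the response
         In the example of a group-level comparison, the first column is the module and the second, Total gene, is the number of genes in the module. The third and fourth columns give the numbers of up-regulated and down-regulated genes from the two-group comparison. The last column, % Responses, is the net response of the module's genes, calculated as (number of up-regulated genes - number of down-regulated genes) / Total gene and expressed as a percentage.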
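The two steps above can be sketched end to end for a two-group comparison. This is not the package's implementation. It assumes the following:

- Expression values are on a linear scale.
- A gene is called up-regulated when the case/control ratio of group means is ≥ `fc` and its Welch t-test p-value is below `pval`. The p-value is BH-adjusted when `fdr = TRUE`.
- Module responses follow the % Response formula above, scaled to a percentage.

All names and data are hypothetical.

```r
# Group-level differential expression calls and module-level % response
module_response <- function(expr, modules, is_case, fc = 1.5, pval = 0.1, fdr = TRUE) {
  case <- expr[, is_case, drop = FALSE]
  ctrl <- expr[, !is_case, drop = FALSE]
  # Differential expression: fold change of group means plus a Welch t-test per gene
  ratio <- rowMeans(case) / rowMeans(ctrl)
  p <- vapply(seq_len(nrow(expr)),
              function(i) t.test(case[i, ], ctrl[i, ])$p.value, numeric(1))
  if (fdr) p <- p.adjust(p, method = "BH")
  up <- ratio >= fc & p < pval
  down <- ratio <= 1 / fc & p < pval
  # Module-level table: gene counts and net % response
  total <- tapply(up, modules, length)
  n_up <- tapply(up, modules, sum)
  n_down <- tapply(down, modules, sum)
  data.frame(Module = names(total),
             Total = as.vector(total),
             Up = as.vector(n_up),
             Down = as.vector(n_down),
             Response = as.vector(100 * (n_up - n_down) / total),
             stringsAsFactors = FALSE)
}

expr <- rbind(
  g1 = c(100, 102,  98, 200, 205, 195),  # increased
  g2 = c(100, 110,  90, 101, 111,  91),  # unchanged
  g3 = c(100, 102,  98,  40,  42,  38),  # decreased
  g4 = c(100, 102,  98, 210, 190, 200)   # increased
)
is_case <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
module_response(expr, modules = c("modX", "modX", "modY", "modY"), is_case = is_case)
# modX: 1 up, 0 down -> 50; modY: 1 up, 1 down -> 0
```

Because the response is a net value, a module with equal numbers of increased and decreased genes scores 0, as modY does here.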

    Code

    Group comparison analysis

       Group differences can be tested with a t-test (package function Groupcomparison) or with limma (package function Groupcomparisonlimma).
    Example code:
    t-test:

    # t-test:
    Group_df <- Groupcomparison(data.matrix,
                sample_info = sample_ann,
                FC = 1.5,
                pval = 0.1,
                FDR = TRUE,
                Group_column = "Group_test",
                Test_group = "Sepsis",
                Ref_group = "Control")
    



    limma:

    # limma
    Group_limma <- Groupcomparisonlimma(data.matrix,
                                        sample_info =  sample_ann,
                                        FC = 1.5,
                                        pval = 0.1,
                                        FDR = TRUE,
                                        Group_column = "Group_test",
                                        Test_group = "Sepsis",
                                        Ref_group = "Control")
    

    Notes on the variables:

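    • data.matrix: the gene-level expression matrix, with gene symbols as its row.names. It must be pre-processed (e.g. normalized) before calling Groupcomparison or Groupcomparisonlimma. Note: the normalized data must not be log2-transformed.
    • sample_ann: the sample annotation table, whose row.names are the sample names matching the column names of data.matrix. Group assignments (the condition of each sample) are stored in a named column, for example Group_test.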

    Fingerprint grid visualization

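       This step visualizes module-level changes in transcript abundance. For each module, the percentage of its constituent transcripts that are differentially expressed between the two groups is displayed as the module response. Each dot represents a module within its aggregate, at a fixed position. Red indicates the percentage of transcripts with increased abundance and blue the percentage with decreased abundance.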

    gridplot(Group_df,
             cutoff = 15,
             Ref_group = "Control",
             filename = "Group_comparison_")
    

    Individual sample analysis

    Individual_df <- Individualcomparison(data.matrix,
                                          sample_info = sample_ann,
                                          FC = 1.5,
                                          DIFF = 10,
                                          Group_column = "Group_test",
                                          Ref_group =  "Control")
    

    Individual fingerprint visualization

    fingerprintplot(Individual_df,
                    sample_info = sample_ann,
                    cutoff = 15,
                    rowSplit = TRUE,
                    Group_column = "Group_test",
                    show_ref_group = FALSE,
                    Ref_group = "Control",
                    Aggregate = NULL,
                    filename = "Gen3_Individual_plot",
                    height = NULL,
                    width = NULL)
    

    References

    1. Li, S. et al. Molecular signatures of antibody responses derived from a systems biology study of five human vaccines. Nat. Immunol. 15, 195–204 (2014).
    2. Novershtern, N. et al. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296–309 (2011).
    3. Linsley, P. S., Speake, C., Whalen, E. & Chaussabel, D. Copy number loss of the interferon gene cluster in melanomas is linked to reduced T cell infiltrate and poor patient prognosis. PLoS ONE 9, e109760 (2014).
    4. Monaco, G. et al. RNA-Seq signatures normalized by mRNA abundance allow absolute deconvolution of human immune cell types. Cell Rep. 26, 1627–1640.e7 (2019).


          Article link: https://www.haomeiwen.com/subject/hsopfdtx.html