Authors: Matthew C. Altman 1,2,21✉, Darawan Rinchai 3,21✉, Nicole Baldwin 4, Mohammed Toufiq 3, Elizabeth Whalen 1, Mathieu Garand 3, Basirudeen Syed Ahamed Kabeer 3, Mohamed Alfaki 3, Scott R. Presnell 1, Prasong Khaenam 1, Aaron Ayllón-Benítez 5, Fleur Mougin 5, Patricia Thébault 6, Laurent Chiche 7, Noemie Jourde-Chiche 8, J. Theodore Phillips 4, Goran Klintmalm 4, Anne O’Garra 9,10, Matthew Berry 11, Chloe Bloom 10, Robert J. Wilkinson 12,13,14, Christine M. Graham 9, Marc Lipman 15, Ganjana Lertmemongkolchai 16, Davide Bedognetti 3, Rodolphe Thiebaut 5, Farrah Kheradmand 17, Asuncion Mejias 18, Octavio Ramilo 18, Karolina Palucka 4,19, Virginia Pascual 4,20, Jacques Banchereau 4,19 & Damien Chaussabel 1,3✉
1Systems Immunology, Benaroya Research Institute, Seattle, WA, USA.
2Division of Allergy and Infectious Diseases, University of Washington, Seattle, WA, USA.
3Research Branch, Sidra Medicine, Doha, Qatar.
4Baylor Institute for Immunology Research, Baylor Research Institute, Dallas, TX, USA.
5Inserm U1219 Bordeaux Population Health Research Center, Bordeaux University, Bordeaux, France.
6LaBRI, CNRS UMR5800, Bordeaux University, Bordeaux, France.
7Department of Internal Medicine, Hôpital Européen, Marseille, France.
8Aix-Marseille University, C2VN, INSERM 1263, INRA 1260 Marseille, France.
9Laboratory of Immunoregulation and Infection, The Francis Crick Institute, London, UK.
10National Heart and Lung Institute, Imperial College London, London, UK.
11Royal Cornwall Hospitals NHS Trust, Truro, UK.
12The Francis Crick Institute, London, UK.
13Department of Infectious Disease, Imperial College, London, UK.
14Wellcome Center for Infectious Diseases Research in Africa and Department of Medicine, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town Observatory, 7925 Cape Town, Republic of South Africa.
15UCL Respiratory, Division of Medicine, University College London, London, UK.
16Centre for Research and Development of Medical Diagnostic Laboratories, Faculty of Associated Medical Sciences, Khon Kaen University, Khon Kaen, Thailand.
17Baylor College of Medicine & Center for Translational Research on Inflammatory Diseases, Michael E. DeBakey VAMC, Houston, TX, USA.
18Abigail Wexner Research Institute at Nationwide Children’s Hospital and the Ohio State University School of Medicine, Columbus, OH, USA.
19The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
20Weill Cornell Medicine, New York, NY, USA.
21These authors contributed equally: Matthew C. Altman, Darawan Rinchai.
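This study presents BloodGen3, a new transcriptional module repertoire intended as a stable, reusable framework for analyzing and interpreting blood transcriptome data. The framework was built from co-clustering patterns observed across 985 blood transcriptome profiles covering 16 different immunological and physiological states. It comes with several resources that support customized interpretation, including: module-level analysis workflows, fingerprint grid plot visualizations, interactive web applications and an extensive annotation framework comprising functional profiling reports and reference transcriptional profiles.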
Generation of a collection of datasets covering a wide range of immune states.
- TB (23 condition, 11 control, 34 total);
- Staph aureus (99 condition, 44 control, 143 total);
- Sepsis (35 condition, 12 control, 47 total);
- HIV (28 condition, 35 control, 63 total);
- Flu (25 condition, 14 control, 39 total);
- RSV (70 condition, 14 control, 84 total);
- B-cell deficiency (20 condition, 13 control, 33 total);
- Liver Transplant (94 condition, 30 control, 124 total);
- Pregnancy (25 condition, 20 control, 45 total);
- Melanoma Stage IV (22 condition, 5 control, 27 total);
- Kawasaki (21 condition, 23 control, 44 total);
- Juvenile Dermatomyositis (40 condition, 9 control, 49 total);
- COPD (19 condition, 24 control, 43 total);
- MS - untreated (34 condition, 22 control, 56 total);
- Pediatric SLE (55 condition, 14 control, 69 total);
- SoJIA (62 condition, 23 control, 85 total)
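As a quick consistency check, the per-dataset counts listed above can be summed to confirm that they add up to the 985 profiles across 16 datasets. The sketch below simply copies the numbers from the list; the dictionary and function names are illustrative.

```python
# (condition, control) sample counts per dataset, copied from the list above.
DATASETS = {
    "TB": (23, 11),
    "Staph aureus": (99, 44),
    "Sepsis": (35, 12),
    "HIV": (28, 35),
    "Flu": (25, 14),
    "RSV": (70, 14),
    "B-cell deficiency": (20, 13),
    "Liver Transplant": (94, 30),
    "Pregnancy": (25, 20),
    "Melanoma Stage IV": (22, 5),
    "Kawasaki": (21, 23),
    "Juvenile Dermatomyositis": (40, 9),
    "COPD": (19, 24),
    "MS - untreated": (34, 22),
    "Pediatric SLE": (55, 14),
    "SoJIA": (62, 23),
}


def cohort_totals(datasets):
    """Return (n_condition, n_control, n_total) summed over all datasets."""
    n_condition = sum(cond for cond, _ in datasets.values())
    n_control = sum(ctrl for _, ctrl in datasets.values())
    return n_condition, n_control, n_condition + n_control


print(len(DATASETS), cohort_totals(DATASETS))  # 16 (672, 313, 985)
```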
Implementation of a stepwise approach to blood transcriptional module repertoire construction.

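a. Sixteen blood transcriptome datasets spanning a wide range of immunological and physiological states were collected as the starting point for identifying gene co-expression patterns.
b. Each dataset was clustered independently using k-means.
c. The number of datasets in which two genes fell in the same cluster was recorded; this count ranges from 0 to 16 (i.e. co-clustering in none or in all 16 datasets).
d. These co-clustering counts were used as input to build a co-clustering graph in which nodes represent genes and edges represent co-clustering events (occurring at least once), with each edge weighted by the number of datasets in which the two genes co-clustered.
e. Sub-networks were then selected from the overall network in decreasing order of weight, and each was assigned a module ID.
- Based on Euclidean distance
- Hartigan’s K-Means clustering algorithm (hierarchical + k-means clustering)
- Optimization settings: If at any k the algorithm creates a cluster whose members’ average Pearson correlation to the mean cluster vector is <0.3, the cluster is deleted and the algorithm begins again at k-1. The ‘ideal’ number of clusters (k) for each dataset was determined within a range of k = 1-100 by means of the jump statistic.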
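Steps c and d can be illustrated with a minimal sketch that counts, for every gene pair, the number of datasets in which both genes share a cluster. This is not the authors' implementation: the function name, the input layout (one gene-to-cluster dictionary per dataset) and the toy cluster assignments are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def co_clustering_weights(clusterings):
    """Count, per gene pair, the datasets in which both genes share a cluster.

    clusterings: list with one dict per dataset, mapping gene -> cluster label.
    Returns a Counter keyed by (gene_a, gene_b) with gene_a < gene_b; pairs
    that never co-cluster are absent (weight 0, i.e. no edge in the graph).
    """
    weights = Counter()
    for assignment in clusterings:
        # Group the genes of this dataset by cluster label.
        members = {}
        for gene, label in assignment.items():
            members.setdefault(label, []).append(gene)
        # Every pair inside a cluster is one co-clustering event.
        for genes in members.values():
            for pair in combinations(sorted(genes), 2):
                weights[pair] += 1
    return weights


# Toy input: three datasets, three genes (cluster labels are made up).
toy = [
    {"IFI44": 0, "IFI44L": 0, "CD3E": 1},
    {"IFI44": 2, "IFI44L": 2, "CD3E": 2},
    {"IFI44": 0, "IFI44L": 1, "CD3E": 1},
]
weights = co_clustering_weights(toy)
```

With 16 datasets the resulting weights fall between 1 and 16 for connected pairs, which matches the edge weights described in step d.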
// Pseudocode: stepwise extraction of modules from the co-clustering graph
Integer nLastQuartile = 4;
Integer nMaxRelaxtion = m_nNumDatasets / 3;
Integer nRelaxtionIncrement = Math.max(1, (nMaxRelaxtion / 3));
Integer nRelaxtion = nMaxRelaxtion;
for (int nCliqueThreshold = m_nNumDatasets; nCliqueThreshold >= 1; nCliqueThreshold--) {
    // Quartile (0-4) of the current threshold relative to the number of datasets
    Integer nQuartile = ((nCliqueThreshold * 100) / m_nNumDatasets) / 25;
    if (nQuartile.equals(nLastQuartile) == false) {
        // On entering a lower quartile, reduce the relaxation (never below 0)
        if (nQuartile <= 2) {
            nRelaxtion = Math.max(0, nRelaxtion - nRelaxtionIncrement);
        }
        nLastQuartile = nQuartile;
    }
    Integer nParacliqueThreshold = nCliqueThreshold - nRelaxtion;
    do {
        maximumClique = find maximum clique w co-clustering weight >= nCliqueThreshold
        if (size of maximumClique > 15) {
            paraclique = find paraclique in graph
            remove maximumClique and paraclique from graph
        }
    } while (maximumClique is found)
}
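To make the threshold arithmetic in the pseudocode above concrete, the following sketch reproduces only the bookkeeping part (clique threshold, relaxation, paraclique threshold) and leaves out the clique search itself. It assumes that the quartile check sits inside the threshold loop and that integer division is intended, as in Java; the function name is illustrative.

```python
def relaxation_schedule(num_datasets):
    """Return (clique_threshold, relaxation, paraclique_threshold) per iteration."""
    last_quartile = 4
    max_relaxation = num_datasets // 3
    increment = max(1, max_relaxation // 3)
    relaxation = max_relaxation
    schedule = []
    for clique_threshold in range(num_datasets, 0, -1):
        # Quartile (0-4) of the threshold relative to the number of datasets.
        quartile = ((clique_threshold * 100) // num_datasets) // 25
        if quartile != last_quartile:
            # Relaxation shrinks each time a quartile <= 2 is entered.
            if quartile <= 2:
                relaxation = max(0, relaxation - increment)
            last_quartile = quartile
        schedule.append((clique_threshold, relaxation, clique_threshold - relaxation))
    return schedule


for row in relaxation_schedule(16):
    print(row)
```

For 16 datasets the first iteration is (16, 5, 11), i.e. cliques require co-clustering in all 16 datasets while the paraclique threshold is 11. Under this reading the relaxation drops to 4 once the clique threshold reaches 11.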

Development of module-level analysis workflows and visualizations.
- identify functional convergences among the genes that comprise each set
- summarize changes in overall transcript abundance related to pathological processes or therapeutic interventions.
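Tools and resources used for functional annotation and enrichment of the modules include: GSAn, Literature Lab, IPA, DAVID, KEGG, BioCarta, OMIM, and GOTERM.
The authors also characterized the overlap between the BloodGen3 repertoire and the earlier repertoires (Gen1, Gen2) [1].
Module-level analysis determines, for each module, the proportion of its constitutive transcripts whose abundance differs between groups (e.g. cases vs. controls; pre-treatment vs. post-treatment). Two values are derived per module: the percentage of transcripts that are increased and the percentage that are decreased. The cut-off is left to the user's preference (based on statistics, fold changes and/or differences, with or without multiple-testing correction for group comparisons). Module-level differential expression is then visualized as a "fingerprint".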
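The two per-module values can be sketched as follows, assuming gene-level calls have already been made with whatever cut-off the user prefers. The function name, the +1/-1/0 encoding of the calls and the toy gene and module labels are assumptions for illustration, not part of the BloodGen3Module package.

```python
from collections import defaultdict


def module_percentages(gene_to_module, gene_calls):
    """Return {module: (pct_increased, pct_decreased)}.

    gene_to_module: dict gene -> module ID.
    gene_calls: dict gene -> +1 (increased), -1 (decreased) or 0 (unchanged);
                genes missing from gene_calls count as unchanged.
    """
    totals = defaultdict(int)
    up = defaultdict(int)
    down = defaultdict(int)
    for gene, module in gene_to_module.items():
        totals[module] += 1
        call = gene_calls.get(gene, 0)
        if call > 0:
            up[module] += 1
        elif call < 0:
            down[module] += 1
    return {m: (100.0 * up[m] / n, 100.0 * down[m] / n) for m, n in totals.items()}


# Toy example with hypothetical gene and module labels.
membership = {"g1": "M1", "g2": "M1", "g3": "M1", "g4": "M1", "g5": "M2", "g6": "M2"}
calls = {"g1": 1, "g2": 1, "g3": -1, "g5": -1}
result = module_percentages(membership, calls)
```

Here module M1 has 2 of 4 transcripts increased and 1 of 4 decreased, so it is reported as 50% increased and 25% decreased.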
The development of the BloodGen3 module fingerprint grids
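a. Each row of the heatmap corresponds to the change in transcript abundance for a given dataset in a given direction (increase or decrease). With healthy controls as the baseline, increases are shown in red and decreases in blue, giving 32 rows in total (16 datasets × 2 directions). The columns cover the entire BloodGen3 repertoire (N = 382 modules). The colors shown at the bottom correspond to module aggregate IDs and serve only to illustrate the strategy used to organize modules on the fingerprint grid plot. The result is that changes in abundance are correlated among the modules within each row of the fingerprint grid, which was not the case in earlier iterations of the grid.
b. Modules are arranged as follows: the 382 modules are partitioned into 38 clusters (aggregates) according to the similarity of their patterns across the 16 datasets. The subset of 27 aggregates that contain two or more modules form the rows of the grid. The length of each arrowed line in panel b represents the number of modules in the corresponding cluster.
c. When the BloodGen3Module R package is used for downstream analysis of blood transcriptome data, module-level changes are mapped onto this grid, with color intensity indicating the magnitude of the change.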
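The row-selection rule in panel b (only aggregates with two or more modules become grid rows) amounts to a simple count-and-filter step. The sketch below uses a made-up module-to-aggregate mapping; the labels and the function name are hypothetical and do not reproduce the actual BloodGen3 assignments.

```python
from collections import Counter


def grid_rows(module_to_aggregate, min_modules=2):
    """Return {aggregate: n_modules} for aggregates that qualify as grid rows."""
    counts = Counter(module_to_aggregate.values())
    return {agg: n for agg, n in counts.items() if n >= min_modules}


# Hypothetical mapping: two aggregates with several modules, one singleton.
toy_mapping = {
    "mod_a": "A1", "mod_b": "A1", "mod_c": "A1",
    "mod_d": "A2", "mod_e": "A2",
    "mod_f": "A3",
}
rows = grid_rows(toy_mapping)
```

In this toy case the singleton aggregate A3 is left off the grid, while the row length for A1 and A2 equals their module counts (3 and 2).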
Illustrative case of fingerprint grid plot representation

In-depth functional annotation of fixed transcriptional module repertoires
Functional annotation:
- Approach:
- concurrent ontology, pathway or literature-term profiling analyses
- determination, for the constitutive genes of each module, of expression patterns in select reference datasets
- Steps:
Step 1: Functional profiling
- GO analysis of the 382 modules.
- Pathway analysis using, among others, Ingenuity Pathway Analysis (IPA).
- Literature-term enrichment using Literature Lab.
Step 2: Expression patterns in reference transcriptome datasets (three different transcriptome datasets were used as references to improve the characterization and functional interpretability of the module repertoire):
- Novershtern [2]
- Speake [3]
- Monaco [4]
Measuring inter-individual variability for the molecular stratification of patient cohorts
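This step characterizes inter-individual variability. A fixed cutoff is applied to each individual's transcript abundance (e.g. absolute fold change in expression and absolute difference in expression vs. the average of control samples). The percentage of differentially expressed transcripts is then computed for each individual; these percentages are equivalent to the values obtained from group comparisons, except that they are derived for each sample separately.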
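A minimal sketch of the per-sample calculation for a single module is given below. It assumes that a transcript counts as increased only when both the fold change and the absolute difference versus the control average pass their cutoffs (and symmetrically for decreases); this combination rule, the function name and the toy values are assumptions, and the defaults of 1.5 and 10 simply mirror the FC and DIFF values used in the R example later in these notes.

```python
def individual_percentages(sample_values, control_means, fc_cutoff=1.5, diff_cutoff=10.0):
    """Return (pct_increased, pct_decreased) for one sample and one module.

    sample_values: dict gene -> non-log expression in the sample.
    control_means: dict gene -> mean non-log expression over control samples.
    """
    n = len(sample_values)
    if n == 0:
        return (0.0, 0.0)
    up = down = 0
    for gene, value in sample_values.items():
        ref = control_means[gene]
        diff = value - ref
        # Avoid dividing by zero when the control average is 0.
        fold = value / ref if ref > 0 else float("inf")
        if fold >= fc_cutoff and diff >= diff_cutoff:
            up += 1
        elif ref > 0 and fold <= 1.0 / fc_cutoff and diff <= -diff_cutoff:
            down += 1
    return (100.0 * up / n, 100.0 * down / n)


# Toy module of four hypothetical genes.
controls = {"g1": 100.0, "g2": 100.0, "g3": 100.0, "g4": 5.0}
patient = {"g1": 200.0, "g2": 40.0, "g3": 110.0, "g4": 9.0}
pct_up, pct_down = individual_percentages(patient, controls)
```

In the toy data g4 has a fold change of 1.8 but a difference of only 4, so it is not counted. This shows why a difference cutoff is useful for transcripts with low baseline abundance.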

Profiling the abundance of A28 interferon-inducible genes at the aggregate level across reference patient cohorts

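a. Heatmap of transcript abundance patterns across the 27 module aggregates (those containing at least two modules) for the 16 health states. At the first level of the hierarchy, acute HIV infection clusters with MS, while the other 14 states cluster together. This dichotomy arises because modules associated with inflammation and/or myeloid cells (A34-A38) are suppressed, while modules associated with lymphocytic responses (A1-A8) are increased. This suggests that the dominant signature reflects an overall shift in the relative proportions of myeloid (granulocytic) and lymphoid cells. Notably, although such shifts may well exist and affect overall transcript abundance, the IFN-signaling modules show similar enrichment in HIV and in states belonging to the other cluster (e.g. acute HIV infection falls in one cluster, whereas SLE and influenza fall in the other).
b. Gene composition of the six modules in aggregate A28.
c. Differences in A28 gene expression between patients and controls across different infectious diseases.
d. Differences in A28-related gene expression in hepatitis C patients treated with IFNα and in MS patients treated with IFNβ.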
Profiling the abundance of A28 interferon-inducible genes at the module level across reference patient cohorts

Profiling the abundance of A28 interferon-inducible genes at the module level across individual subjects
Development and availability of ancillary resources


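The gene symbol and the module to which each gene belongs are added as the first and second columns of the expression matrix; the remaining columns hold the expression values for each sample.
- Determination of differential expression: the first column is the gene symbol. The criteria for differential expression are user-defined: for a comparison between two groups, p value and fold change (FC) can be used; for individual-level analysis, difference and FC.
- Calculation of the percentage of the response: the second column, Total gene, is the number of genes in the module. The third and fourth columns, up-regulated modules and down-regulated modules, hold the results of the two-group comparison, broken down by direction. The last column, % Responses, is the overall response rate of the genes within a module. The response rate of each module is calculated as: (up-regulated gene number - down-regulated gene number) / Total gene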
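As a worked example of the response formula above, the sketch below computes the net response of one module. Multiplying by 100 to express the result as a percentage is an assumption (the text gives only the ratio), and the function name and counts are illustrative.

```python
def percent_response(n_up, n_down, n_total):
    """Net module response: (up - down) / total, expressed as a percentage."""
    if n_total <= 0:
        raise ValueError("a module must contain at least one gene")
    return 100.0 * (n_up - n_down) / n_total


# A hypothetical module of 20 genes with 8 up-regulated and 3 down-regulated.
response = percent_response(8, 3, 20)
print(response)  # 25.0
```

A negative value indicates that down-regulated genes outnumber up-regulated ones, so the sign carries the direction of the module-level change.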
Group comparison analysis
library(BloodGen3Module)

# t-test:
Group_df <- Groupcomparison(data.matrix,
                            sample_info = sample_ann,
                            FC = 1.5,
                            pval = 0.1,
                            Group_column = "Group_test",
                            Test_group = "Sepsis",
                            Ref_group = "Control")

# limma:
Group_limma <- Groupcomparisonlimma(data.matrix,
                                    sample_info = sample_ann,
                                    FC = 1.5,
                                    pval = 0.1,
                                    Group_column = "Group_test",
                                    Test_group = "Sepsis",
                                    Ref_group = "Control")
data.matrix: a gene-level expression matrix, indexed by gene symbol. Note: the normalized expression values must not be log2-transformed.
Fingerprint grid visualization
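This step visualizes module-level changes in transcript abundance. The percentage of a module's constitutive transcripts that are differentially expressed between the two groups is taken as the module response.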
cutoff = 15,
Ref_group = "Control",
filename = "Group_comparison_")
Individual sample analysis
Individual_df <- Individualcomparison(data.matrix,
                                      sample_info = sample_ann,
                                      FC = 1.5,
                                      DIFF = 10,
                                      Group_column = "Group_test",
                                      Ref_group = "Control")
Individual fingerprint visualization
sample_info = sample_ann,
cutoff = 15,
rowSplit = TRUE,
Group_column = "Group_test",
show_ref_group = FALSE,
Ref_group = "Control",
Aggregate = NULL,
filename = "Gen3_Individual_plot",
height = NULL,
width = NULL)
