-by 杀杀
生信分析中,我们在全基因组测序后会获得分子突变谱信息,如何直观地将各种疾病分子亚型下的基因突变特征,以及分子共突变/互斥突变的特征展示出来呢?
我们可以使用R语言中一个非常方便的包来实现大样本数据中基因突变/表达总体特征的可视化,也就是大概这个样子-> 胶质瘤基因突变和临床信息瀑布图展示好了让我们来学习它
首先安装和加载包
library(ComplexHeatmap)
library(circlize)
library(RColorBrewer) #这是一个颜色搭配包
我们需要导入两部分数据,一部分是分子突变信息数据,另一部分是临床特征数据
这个表格的制作比较基础,就不多介绍,从TCGA数据库(胶质瘤的CGGA数据库直接是整理好的表格)中下下来的原始数据中都有突变类型注释,直接转换成表格即可。
制作需要展现的基因列表,有些基因基本上就没突变,或不关心,就不需要展示,也可以直接从excel表格导入
genelist <- c("IDH1","IDH2","TP53","ATRX","PTEN","EGFR","NF1","PIK3CA")
interestgene_matrix <- TCGA_mut_matrix_2016[match(genelist,rownames(TCGA_mut_matrix_2016)),]
将我们感兴趣的基因的突变谱取出准备画图
接下来要为热图准备一些注释信息,首先是每个突变的颜色,所有的突变类型都可以设置,我比较习惯取自己喜欢的颜色,也可以用颜色包
col <- c( "missense_variant" = "#0072b4","splice_acceptor_variant" = "#e29e3e",
"stop_gained" = "#008f8c", "frameshift_variant" = "#e29e3e",
"inframe_deletion" = "#e8c31c","synonymous_variant" = "#9ec66d")
接下来这一步很重要,给背景以及每个突变类型设置形状和颜色,如果不运行这一步,接下来的主函数将不会自动给你分配颜色和大小,会报错
另外,这边的h*0.8会在图中每行的基因突变条间留一些空隙,大家可以乘不同的数值试试看
alter_fun <- list(
background = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h*0.8,
gp = gpar(fill = "#e8e7e3", col = NA))
},
missense_variant = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h*0.8,
gp = gpar(fill = col["missense_variant"], col = NA))
},
splice_acceptor_variant = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h*0.8,
gp = gpar(fill = col["splice_acceptor_variant"], col = NA))
},
stop_gained = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h*0.8,
gp = gpar(fill = col["stop_gained"], col = NA))
},
frameshift_variant = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h*0.8,
gp = gpar(fill = col["frameshift_variant"], col = NA))
},
inframe_deletion = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h*0.8,
gp = gpar(fill = col["inframe_deletion"], col = NA))
},
synonymous_variant = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h*0.8,
gp = gpar(fill = col["synonymous_variant"], col = NA))
}
)
接下来设置图例
heatmap_legend <- list(title = "Alternations",
at = c("missense_variant", "splice_acceptor_variant",
"stop_gained", "frameshift_variant", "inframe_deletion", "synonymous_variant",
"start_lost", "splice_region_variant", "splice_donor_variant",
"intron_variant", "stop_lost", "coding_sequence_variant","Mutant","Methylated","Codel"),
labels = c("missense_variant", "splice_acceptor_variant",
"stop_gained", "frameshift_variant", "inframe_deletion", "synonymous_variant",
"start_lost", "splice_region_variant", "splice_donor_variant",
"intron_variant", "stop_lost", "coding_sequence_variant","Mutant","Methylated","Codel"))
接下来就可以画突变图啦:使用的就是下面这个oncoPrint函数
oncoPrint(interestgene_matrix , #数据
alter_fun = alter_fun, col = col,
column_title = column_title,
#这两行是设置要不要去掉空行和空列 remove_empty_columns = FALSE,
#去掉空行 remove_empty_rows = FALSE,
row_names_side = "left", #我们可以把基因信息放左边
pct_side = "right",
#这个设不设置好像影响不大 alter_fun_is_vectorized = FALSE,
heatmap_legend_param = heatmap_legend, #设置图例
)
接下来我们可以导入对应的临床信息,比如肿瘤等级,分型,生存,性别信息等等。
接下来简单设置一下临床信息函数:$后面加上想要展示的列名
clini <- HeatmapAnnotation(Age=TCGA_clin_2016$AGE,
Gender=TCGA_clin_2016$SEX,
GRADE = TCGA_clin_2016$GRADE ,
HISTOLOGICAL_DIAGNOSIS = TCGA_clin_2016$HISTOLOGICAL_DIAGNOSIS ,
OS_MONTHS = TCGA_clin_2016$OS_MONTHS,
show_annotation_name = TRUE,
annotation_name_gp = gpar(fontsize = 7))
然后在刚刚的主函数中加上你的临床信息即可
oncoplot_tcga2016 <- oncoPrint(interestgene_matrix ,
alter_fun = alter_fun, col = col,
column_title = column_title,
row_names_side = "left",
pct_side = "right",
alter_fun_is_vectorized = FALSE,
heatmap_legend_param = heatmap_legend,
bottom_annotation = clini
)
oncoplot_tcga2016
需要注意的是,如果你不为临床信息设置颜色的话,每一次运行出来的颜色可能是不一样的,所以你可以设置一下颜色,以防随机的颜色影响美观:
col_os = colorRamp2(c(0, 4000), c("white", "blue"))
clini <- HeatmapAnnotation(TCGA_clin_2016$AGE,
Gender=TCGA_clin_2016$SEX,
GRADE = TCGA_clin_2016$GRADE ,
HISTOLOGICAL_DIAGNOSIS = TCGA_clin_2016$HISTOLOGICAL_DIAGNOSIS ,
OS_MONTHS = TCGA_clin_2016$OS_MONTHS,
#设置颜色
col = list(GRADE = c("II" = "orange","III" = "green","IV" = "skyblue" ),os = col_os),
show_annotation_name = TRUE,
annotation_name_gp = gpar(fontsize = 7))
另外,有时我们需要根据需求优先对一些信息进行排序,这可以直接通过对数据排序实现。
当画图完成之后,可以将oncoPrint的结果赋值,并且使用draw函数,其中annotation_legend_side 参数可以将图例放在下面,让图片整体美观一些。
pdf("tcga2016result1.pdf") ##导出pdf文件
draw(oncoplot_tcga2016,annotation_legend_side = "bottom")
dev.off()
CGGA的这个网站里提供了一些他们的数据和代码,以及生成的图片,大家可以取数据下来尝试一下
http://www.cgga.org.cn/analyse/WEseq-data-oncoprint-result.jsp
参考网址:
https://jokergoo.github.io/ComplexHeatmap-reference/book/oncoprint.html
https://mp.weixin.qq.com/s/8kz2oKvUQrCR2_HWYXQT4g
网友评论