美文网首页ggplot2绘图基因组数据绘图
高分文章复现系列1-瀑布图(有实操代码)

高分文章复现系列1-瀑布图(有实操代码)

作者: 沉迷工作的我 | 来源:发表于2024-03-05 16:36 被阅读0次

    今天看到一篇高分文章(IF=11.4,PMID :33546774)里的一副图很漂亮,忍不住想要复现一下,下面给小伙伴们分享下完整的复现过程。先看下原图和复现后的图对比吧!

    文章原图

    image.png

    复现后的图

    image.png
    1. 原文数据下载及整理

      把文章附件中的突变数据表下载下来,这里我已经下载好了,你们拿来用就行。


      image.png

      然后需要对数据格式进行整理,这一步我是通过R语言代码来实现的,具体代码如下:

    # 程序包安装---
    if(!"openxlsx" %in% installed.packages()){install.packages('openxlsx')}
    if(!"dplyr" %in% installed.packages()){install.packages('dplyr')}
    if(!"tidyverse" %in% installed.packages()){install.packages('tidyverse')}
    if(!"aplot" %in% installed.packages()){install.packages('aplot')}
    

    安装好程序包后,加载程序包(加载的时候注意每个程序包是否都安装成功,可以正常加载)。注意程序包版本!!!

    # 加载程序包,注意程序包版本!!!---
    library(openxlsx)
    library(dplyr)
    library(tidyverse)
    library(aplot) # 版本 0.2.2
    library(ggplot2) # 版本 3.2.0
    

    读取数据提取画图用到的基因,并计算基因突变比例。

    # 数据读取
    df <- read.xlsx("./40164_2021_200_MOESM2_ESM.xlsx",check.names = F)
    

    读取的数据格式如下:


    image.png
    # 整理画热图部分的数据
    gene_list <- c("CHD3","APC","TP53","PALB2",
                   "FANCA","TET2","DNMT3A","IDH2",
                   "ARID1A","ARID1B","MLL3","TYK2",
                   "STAT3","LRRK2","MAP2K1","PCLO",
                   "PIEZO1","FAT3","CSMD1","NSD1",
                   "MKI67","WDR90","MGA","CPS1",
                   "SPEN","ATP10B","ANKRD11","RELN",
                   "PLCG1","ALK","FLT4","RHOA",
                   "NOTCH1","NOTCH4")
    
    # 基因突变比例
    mut_per <- df %>% 
      dplyr::filter(Gene.Symbol %in% gene_list) %>% 
      group_by(PatientID,Gene.Symbol) %>% 
      summarise(n=n()) %>%
      group_by(Gene.Symbol,n) %>% 
      summarise(nsub=n(),per = nsub/53)
    
    mut_per <- aggregate(mut_per$per, by=list(type = mut_per$Gene.Symbol),sum) %>%
      dplyr::rename(per = 'x') %>%
      dplyr::rename(Gene.Symbol = 'type')
    
    heat_df <- df %>%
      # 挑选画图基因
      dplyr::filter(Gene.Symbol %in% gene_list) %>%
      dplyr::select(Gene.Symbol,PatientID,Function) %>%
      dplyr::rename(Alterations = Function) %>%
      merge(mut_per, ., by = 'Gene.Symbol') %>%
      dplyr::mutate(mut_pct = round(per * 100,0))
    

    整理后的数据格式heat_df如下:


    image.png

    对基因进行因子化,固定排列顺序,同时修改突变类型注释信息。

    # 因子化,固定基因排列顺序
    heat_df$Gene.Symbol <- factor(heat_df$Gene.Symbol,levels = gene_list)
    
    # 突变类型修改
    heat_df[heat_df$Alterations == 'cds-del', 'Alterations'] <- 'CDs-Indel'
    heat_df[heat_df$Alterations == 'missense', 'Alterations'] <- 'Missense'
    heat_df[heat_df$Alterations == 'nonsense', 'Alterations'] <- 'Nonsense'
    heat_df[heat_df$Alterations == 'frameshift', 'Alterations'] <- 'FrameShift'
    heat_df[heat_df$Alterations == 'splice-5' | heat_df$Alterations == "splice-3", 'Alterations'] <- 'Splicing'
    

    2. 画热图

    先画主体的热图,利用ggplot2来画图。根据基因通路分类和样本分组设置分割线,代码如下:

    p1 <- ggplot(heat_df, aes(factor(PatientID), fct_rev(Gene.Symbol)))+
      geom_tile(aes(fill = Alterations), color = 'white') +
      labs(x=NULL,y=NULL) +
      theme(axis.text.x = element_blank(),
            axis.ticks = element_blank(),
            axis.text.y = element_text(size = 20, face = 'italic'),
            legend.title = element_text(size = 25, face = 'bold'),
            legend.text = element_text(size = 20, face = 'italic'),
            panel.border = element_blank()) +
      scale_fill_manual(values = c('Missense' = '#103b75',
                                   'CDs-Indel' = '#fc9408',
                                   'Nonsense' = '#f2030d',
                                   'FrameShift' = '#1ffffd',
                                   'Splicing' = '#8e02e7')) +
      geom_hline(yintercept=c(3.5, 5.5, 7.5, 19.5, 21.5, 23.5, 29.5, 32.5),
                 size = 3,
                 color = 'white') +
      geom_vline(xintercept=c(30.5, 36.5, 40.5),
                 size = 3,
                 color = 'white')
    
    image.png

    3. 画右侧的通路注释信息

    这一步需要注意的是通过annotate来指定通路注释的位置!!!

    # 画右侧通路注释
    ann_df = ggplot() +
      annotate(geom = 'text', x = 0, y = 1.7, label = 'T-cell activation', hjust = 0, size = 6, fontface='italic') +
      annotate(geom = 'text', x = 0, y = 4.5, label = 'RTK signaling pathway', hjust = 0, size = 6, fontface='italic') +
      annotate(geom = 'text', x = 0, y = 6.5, label = 'PI3K/AKT pathway', hjust = 0, size = 6, fontface='italic') +
      annotate(geom = 'text', x = 0, y = 13.5, label = 'Other signaling pathway', hjust = 0, size = 6, fontface='italic') +
      annotate(geom = 'text', x = 0, y = 20.5, label = 'MAPK pathway', hjust = 0, size = 6, fontface='italic') +
      annotate(geom = 'text', x = 0, y = 22.5, label = 'JAK-STAT pathway', hjust = 0, size = 6, fontface='italic') +
      annotate(geom = 'text', x = 0, y = 26.5, label = 'Epigenetic Remoldling', hjust = 0, size = 6, fontface='italic') +
      annotate(geom = 'text', x = 0, y = 31, label = 'DNA repair/TP53 pathway', hjust = 0, size = 6, fontface='italic') +
      annotate(geom = 'text', x = 0, y = 33.5, label = 'APC(Wnt) pathway', hjust = 0, size = 6, fontface='italic') +
      xlim(0,0.02) +
      theme_bw() +
      theme(axis.text = element_blank(),
            axis.title = element_blank(),
            axis.ticks = element_blank(),
            panel.background = element_blank(),
            panel.border = element_blank(),
            panel.grid = element_blank())
    

    4. 画上面的样本突变柱状图

    # 上部条形图部分
    bar_df <- heat_df %>% count(PatientID, Alterations)
    p2 <- ggplot(bar_df, aes(factor(PatientID),n))+
      geom_bar(stat = "identity", aes(fill = Alterations))+
      scale_y_continuous(name = NULL,expand = c(0,0))+
      scale_x_discrete(name = NULL)+
      theme_classic()+
      theme(axis.text.x = element_blank(),
            axis.ticks.x = element_blank(),
            axis.title = element_blank(),
            axis.line.x = element_blank(),
            axis.text.y = element_text(size = 20, face = 'italic'),
            axis.line.y = element_line(color = "black",size = 1.1),
            axis.ticks.y = element_line(color = "black",size = 1.1),
            legend.position = "none") +
      scale_y_continuous(expand = c(0.03,0.03)) +
      scale_fill_manual(values = c('Missense' = '#103b75',
                                   'CDs-Indel' = '#fc9408',
                                   'Nonsense' = '#f2030d',
                                   'FrameShift' = '#1ffffd',
                                   'Splicing' = '#8e02e7'))
    
    image.png

    5.画左侧基因突变频率柱状图

    # 突变频率注释
    bar_mut <- heat_df[!duplicated(heat_df$Gene.Symbol),]
    bar_mut$Gene.Symbol <- factor(bar_mut$Gene.Symbol, levels = gene_list)
    
    p3 <- ggplot(bar_mut, aes(fct_rev(Gene.Symbol),mut_pct))+
      geom_bar(stat = "identity")+
      coord_flip() +
      # scale_x_continuous(name = NULL,expand = c(0,0))+
      # scale_x_discrete(name = NULL)+
      theme_bw()+
      theme(axis.text.x = element_text(size = 20, face = 'italic'),
            axis.text.y = element_blank(),
            axis.ticks.y = element_blank(),
            panel.grid = element_blank(),
            axis.title = element_blank(),
            plot.title = element_text(size = 20, hjust = .5),
            panel.border = element_rect(size = 1.5),
            legend.position = "none") +
      labs(title = 'Mutations Percent', y = '', x = '')
    p3
    
    image.png

    6. 画底部的注释信息

    p4 <- ggplot() +
      annotate(geom = 'text', y = 0.01, x = 15, label = 'AITL', 
                size = 5, fontface ='bold', color = 'black') +
      annotate(geom = 'rect', xmin = 0, xmax = 30.2, ymin = 0,
               ymax = 0.02, alpha = .3, fill = '#23ff07', color = 'black') +
    
      annotate(geom = 'text', y = 0.01, x = 33, label = 'ALK- \nALCL', 
               size = 5, fontface ='bold', color = 'black') +
      annotate(geom = 'rect', xmin = 30.6, xmax = 36.2, ymin = 0, 
               ymax = 0.02, alpha = .3, fill = '#22fffe', color = 'black') +
    
      annotate(geom = 'text', y = 0.01, x = 38.5, label = 'ALK+ \nALCL', 
               size = 5, fontface = 'bold', color = 'black') +
      annotate(geom = 'rect', xmin = 36.6, xmax = 40.2, ymin = 0, 
               ymax = 0.02, alpha = .3, fill = '#23ff07', color = 'black') +
    
      annotate(geom = 'text', y = 0.01, x = 45, label = 'PTCL-NOS', 
               size = 5, fontface = 'bold', color = 'black') +
      annotate(geom = 'rect', xmin = 40.7, xmax = 50.5, ymin = 0, 
               ymax = 0.02, alpha = .3, fill = '#22fffe', color = 'black') +
      theme_bw() +
      theme(axis.text = element_blank(),
            axis.title = element_blank(),
            axis.ticks = element_blank(),
            panel.background = element_blank(),
            panel.border = element_blank(),
            panel.grid = element_blank())
    
    image.png

    7. 图片拼图

    最后就是把我们上述画的几张图拼接起来,我这里用的是aplot程序包进行拼图。然后保存图片即可。

    pic <- p1 %>% 
      insert_right(ann_df, width = 0.3) %>%
      insert_bottom(p4, height = 0.1) %>%
      insert_top(p2, height = 0.2) %>%
      insert_left(p3, width = 0.3)
    
      ggsave(plot = pic, device = 'png', dpi = 300,
           width = 19, height = 12, './heatmap2.png')
    

    完整的全部代码,文章数据我都整理放百度网盘了,私我-->获取百度网盘链接(永久有效)!

    相关文章

      网友评论

        本文标题:高分文章复现系列1-瀑布图(有实操代码)

        本文链接:https://www.haomeiwen.com/subject/kmsdzdtx.html