Visualization Series [6]: Following Nature Communications

Author: Bio_Infor | Published 2023-08-25 18:54


Without accumulating small steps, one cannot travel a thousand miles.

In this post we try to reproduce Fig. 3a from "Molecular profiling of aromatase inhibitor sensitive and resistant ER+HER2- postmenopausal breast cancers", published in Nature Communications on 7 July 2023.

Here is the original figure:

You can download the data yourself, or leave a comment and I will send it to you privately.

Code and analysis

In earlier posts I shared the code without explaining what it does, so this time I will try walking through it.

Load packages

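library(dplyr)     # data manipulation: mutate(), filter(), %>%
library(ggplot2)   # plotting; the linewidth argument used below needs ggplot2 >= 3.4.0
library(magrittr)  # the %>% pipe
library(tidyr)     # pivot_longer()
library(vroom)     # fast file reading
library(forcats)   # fct_relevel()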

Read the data

plotdata <- vroom(file = 'fig3a.csv', col_names = TRUE)

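Here vroom() from the vroom package reads the data. Compared with the traditional read.table(), it has two main advantages: (1) it reads faster; (2) it guesses the delimiter automatically.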
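To see the delimiter guessing in action, here is a small sketch that writes the same hypothetical two-column table to temporary files, once comma-separated and once tab-separated, and reads both with vroom() without a delim argument. read.table(), by contrast, needs sep and header spelled out.

```r
library(vroom)

# The same tiny table, written once comma-delimited and once tab-delimited
csv_file <- tempfile(fileext = '.csv')
tsv_file <- tempfile(fileext = '.tsv')
writeLines(c('Description,NES', 'HALLMARK_E2F_TARGETS,2.5', 'HALLMARK_HYPOXIA,-1.2'), csv_file)
writeLines(c('Description\tNES', 'HALLMARK_E2F_TARGETS\t2.5', 'HALLMARK_HYPOXIA\t-1.2'), tsv_file)

# No delim argument: vroom guesses it from the first lines of each file
from_csv <- vroom(csv_file, show_col_types = FALSE)
from_tsv <- vroom(tsv_file, show_col_types = FALSE)

# read.table() needs the separator and the header flag given explicitly
from_base <- read.table(csv_file, sep = ',', header = TRUE)
```

Both vroom() calls should return the same two columns, Description and NES.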

Take a look at the data:

head(plotdata)

# A tibble: 6 × 13
#  Description `GeneRatio PRs vs GRs` `NES PRs vs GRs` `p.adjust PRs vs GRs`
#  <chr>                        <dbl>            <dbl>                 <dbl>
#1 HALLMARK_A…                     70             2.78              8.33e-10
#2 HALLMARK_I…                     59             2.67              8.33e-10
#3 HALLMARK_E…                     45            -2.49              8.33e-10
#4 HALLMARK_E…                     50             2.16              8.33e-10
#5 HALLMARK_T…                     48             2.20              8.33e-10
#6 HALLMARK_I…                     53             2.18              8.33e-10

Data cleaning

Here I kept only the pathways shown in the paper and ordered them as in the paper, which I did the crude, brute-force way by typing them out:

levels <- c(
  'E2F TARGETS', 'G2M CHECKPOINT', 'MITOTIC SPINDLE', 'MYC TARGETS V1', 'MYC TARGETS V2', 'P53 PATHWAY',
  'ESTROGEN RESPONSE EARLY', 'ESTROGEN RESPONSE LATE', 'IL2 STAT5 SIGNALING', 'KRAS SIGNALING DN', 'KRAS SIGNALING UP', 'MTORC1 SIGNALING', 'TNFA SIGNALING VIA NFKB',
  'ALLOGRAFT REJECTION', 'COMPLEMENT', 'IL6 JAK STAT3 SIGNALING', 'INFLAMMATORY RESPONSE', 'INTERFERON ALPHA RESPONSE', 'INTERFERON GAMMA RESPONSE',
  'HYPOXIA',
  'EPITHELIAL MESENCHYMAL TRANSITION',
  'GLYCOLYSIS',
  'APICAL JUNCTION'
)
colors <- c(
  rep('#279D77', 6),
  rep('#CF6611', 7),
  rep('#7974A1', 6),
  '#E2348E',
  '#E5B63D',
  '#91793C',
  '#686868'
)

The colors were sampled from the original figure with FastStone.

Next, I reshaped plotdata into long format and filtered it to keep only these pathways.

plotdata <- plotdata %>% 
  pivot_longer(cols = starts_with('NES'), names_to = 'Compare', values_to = 'NES') %>% 
  pivot_longer(cols = starts_with('GeneRatio'), names_to = 'GeneRatio_Group', values_to = 'Ratio') %>% 
  pivot_longer(cols = starts_with('p.adjust'), names_to = 'p.adjust_Group', values_to = 'p.adjust') %>% 
  mutate(Description = gsub(pattern = 'HALLMARK_', replacement = '', x = Description)) %>% 
  mutate(Description = gsub(pattern = '_', replacement = ' ', x = Description)) %>%
  filter(Description %in% levels) %>% 
  mutate(Description = fct_relevel(Description, rev(levels))) %>% 
  mutate(Compare = gsub(pattern = 'NES ', replacement = '', x = Compare)) %>% 
  mutate(Compare = fct_relevel(Compare, c('PRs vs GRs', 'PRs ESR1 HIGH vs GRs', 'PRs ESR1 LOW vs GRs', 'PRs ESR1 LOW vs PRs ESR1 HIGH')))

A few points worth noting here:

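i. pivot_longer() reshapes a data frame from wide to long format, and its counterpart pivot_wider() goes the other way; both come from the tidyr package and are worth studying on your own;

ii. gsub() replaces every match of a pattern in a string (the pattern is a regular expression unless fixed = TRUE);

iii. fct_relevel() from the forcats package turns a column into a factor and sets the order of its levels, which ggplot2 then uses to order the axis; also worth studying on your own.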
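A small sketch of points ii and iii on three hypothetical pathway names: gsub() cleans the labels, and fct_relevel() fixes the level order. ggplot2 draws the first level of a discrete y axis at the bottom, which is why the post passes rev(levels): the first pathway in the list ends up at the top.

```r
library(forcats)

raw <- c('HALLMARK_HYPOXIA', 'HALLMARK_E2F_TARGETS', 'HALLMARK_APICAL_JUNCTION')

# Strip the prefix, then turn underscores into spaces (same two steps as the post)
clean <- gsub(pattern = '_', replacement = ' ', x = gsub(pattern = 'HALLMARK_', replacement = '', x = raw))

# Hypothetical desired order, top to bottom as it should appear in the figure
top_to_bottom <- c('E2F TARGETS', 'HYPOXIA', 'APICAL JUNCTION')

# Level 1 is drawn at the bottom of a discrete y axis, so pass the reversed order
f <- fct_relevel(clean, rev(top_to_bottom))
levels(f)
```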

The data now looks like this:

head(plotdata)

# A tibble: 6 × 7
#  Description   Compare   NES GeneRatio_Group Ratio p.adjust_Group p.adjust
#  <fct>         <fct>   <dbl> <chr>           <dbl> <chr>             <dbl>
#1 ALLOGRAFT RE… PRs vs…  2.78 GeneRatio PRs …    70 p.adjust PRs … 8.33e-10
#2 ALLOGRAFT RE… PRs vs…  2.78 GeneRatio PRs …    70 p.adjust PRs … 1.67e- 9
#3 ALLOGRAFT RE… PRs vs…  2.78 GeneRatio PRs …    70 p.adjust PRs … 1   e- 9
#4 ALLOGRAFT RE… PRs vs…  2.78 GeneRatio PRs …    70 p.adjust PRs … 1   e- 9
#5 ALLOGRAFT RE… PRs vs…  2.78 GeneRatio PRs …    60 p.adjust PRs … 8.33e-10
#6 ALLOGRAFT RE… PRs vs…  2.78 GeneRatio PRs …    60 p.adjust PRs … 1.67e- 9
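A caveat about this reshaping step: chaining three pivot_longer() calls pairs every NES column with every GeneRatio column and every p.adjust column, so each pathway ends up with 4 × 4 × 4 = 64 rows, most of which mix statistics from different comparisons. The output above shows it: NES stays at 2.78 while Ratio and p.adjust cycle through values belonging to other comparisons. A minimal sketch of an alternative uses the `.value` sentinel of pivot_longer() so that the three statistics of each comparison stay on one row. It runs on a small hypothetical table that assumes the columns follow the `<statistic> <comparison>` naming seen above; the comparison names come from the fct_relevel() call, the numbers are made up.

```r
library(dplyr)
library(tidyr)
library(tibble)

compares <- c('PRs vs GRs', 'PRs ESR1 HIGH vs GRs',
              'PRs ESR1 LOW vs GRs', 'PRs ESR1 LOW vs PRs ESR1 HIGH')

# Hypothetical wide table: one row per pathway, 3 statistics x 4 comparisons
wide <- tibble(Description = c('HALLMARK_E2F_TARGETS', 'HALLMARK_HYPOXIA'))
for (i in seq_along(compares)) {
  wide[[paste('GeneRatio', compares[i])]] <- c(40, 50) + i
  wide[[paste('NES', compares[i])]] <- c(2, -1) * i / 2
  wide[[paste('p.adjust', compares[i])]] <- c(1e-3, 1e-5) * i
}

# Three chained pivots: every combination survives (2 pathways x 4 x 4 x 4 = 128 rows)
buggy <- wide %>%
  pivot_longer(starts_with('NES'), names_to = 'Compare', values_to = 'NES') %>%
  pivot_longer(starts_with('GeneRatio'), names_to = 'GeneRatio_Group', values_to = 'Ratio') %>%
  pivot_longer(starts_with('p.adjust'), names_to = 'p.adjust_Group', values_to = 'p.adjust')

# One pivot: the first word of each column name becomes a column (.value),
# the rest becomes the comparison label (2 pathways x 4 comparisons = 8 rows)
long <- wide %>%
  pivot_longer(
    cols = -Description,
    names_to = c('.value', 'Compare'),
    names_pattern = '^(GeneRatio|NES|p\\.adjust) (.*)$'
  ) %>%
  rename(Ratio = GeneRatio)  # keep the column name the plotting code expects
```

With this layout the Compare column already lacks the "NES " prefix, so the gsub() on Compare becomes unnecessary, while the Description cleanup, filter() and fct_relevel() steps can stay as they are.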

Visualization

Looking at the figure, there are a few key points:

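i. The figure is clearly faceted, so a facet_* function is needed;

ii. GeneRatio is mapped to point size;

iii. The adjusted p-value (FDR, shown as -log10) is mapped to point color;

iv. Each point is joined to x = 0 by a line, which is what is usually called a lollipop chart.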
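The lollipop structure can be tried on toy data before touching the real dataset. This minimal sketch (hypothetical values, no facets or theme) shows how the pieces are assembled: geom_segment() draws the stick from x = 0 to each NES on the same row, and geom_point() draws the head, with -log10(p.adjust) mapped to color and GeneRatio to size.

```r
library(ggplot2)

# Hypothetical toy data, not the values from the paper
toy <- data.frame(
  Description = c('E2F TARGETS', 'HYPOXIA', 'GLYCOLYSIS', 'APICAL JUNCTION'),
  NES         = c(2.5, -1.2, 0.8, -2.1),
  p.adjust    = c(1e-8, 1e-3, 0.02, 1e-5),
  Ratio       = c(55, 35, 40, 48)
)

p <- ggplot(toy, aes(x = NES, y = Description)) +
  geom_vline(xintercept = 0, color = 'grey') +              # reference line at NES = 0
  geom_segment(aes(x = 0, xend = NES, yend = Description),  # the stick: from 0 to NES
               color = 'grey') +
  geom_point(aes(color = -log10(p.adjust), size = Ratio))   # the head: color + size

built <- ggplot_build(p)
```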

So let's start with a basic version:

p1 <- plotdata %>% 
  ggplot(aes(x = NES, y = Description)) +
  geom_vline(xintercept = 0, color = 'grey', linewidth = 1) +
  geom_segment(aes(x = 0, xend = NES, y = Description, yend = Description), color = 'grey', linewidth = 1) +
  geom_point(aes(color = -log10(p.adjust), size = Ratio)) +
  scale_color_gradient(low = 'blue', high = 'red', breaks = seq(3, 9, 2), limits = c(1, 9), name = 'Significance\n(-log10 FDR)') +
  scale_size_continuous(name = 'GeneRatio(%)', range = c(2, 5), limits = c(30, 60)) +
  scale_x_continuous(limits = c(-3, 3), breaks = seq(-3, 3, 1), expand = c(0, 0)) +
  labs(x = 'Normalized Enrichment Score (NES)', y = NULL) +
  theme_bw() +
  theme(
    panel.grid = element_blank(),
    panel.border = element_rect(linewidth = 1.5, color = 'black'),
    panel.spacing.x = unit(0.15, units = 'in'),
    strip.background = element_rect(linewidth = 1.5, color = 'black'),
    axis.ticks = element_line(color = 'black'),
    axis.text.x = element_text(family = 'sans', colour = 'black', size = 11),
    axis.text.y = element_text(family = 'sans', size = 11, face = 'bold', color = rev(colors)),
    axis.title.x = element_text(family = 'sans', color = 'black', size = 14),
    legend.title = element_text(family = 'sans', color = 'black', face = 'bold', size = 12),
    legend.text = element_text(family = 'sans', color = 'black', face = 'bold', size = 11)
  ) +
  facet_wrap(~Compare, ncol = 4)

The result at this step:

Some of the parameters explained:

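(1) geom_vline() adds a vertical line; its counterpart geom_hline() adds a horizontal one;

(2) geom_segment() draws the "stick" of each lollipop;

(3) scale_color_gradient() and scale_size_continuous() fine-tune the color and size mappings, such as the gradient colors, the size range, the breaks and the limits;

(4) scale_x_continuous() customizes the x axis, including its limits and breaks, and expand = c(0, 0) removes the default padding at both ends;

(5) The arguments in theme(): panel.* controls the plotting panel, where panel.grid = element_blank() removes the background grid lines, panel.border sets the panel border, and panel.spacing.x sets the horizontal gap between panels so the facets are not packed too closely together; strip.* controls the facet label boxes; axis.* and legend.* control the axis and legend text.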
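One detail worth checking is what `limits` does to values outside it. By default ggplot2's continuous scales censor out-of-range values to NA (scales::censor): a point with NA size is dropped with a "Removed ... rows containing missing values" warning, and an NA color is drawn in na.value (grey50 by default). The output shown earlier contains GeneRatio = 70, above the size limits c(30, 60), and p.adjust = 8.33e-10, whose -log10 (about 9.08) is above the color limit of 9. A short sketch of the two behaviors with the scales package:

```r
library(scales)

# GeneRatio values; 70 lies above limits = c(30, 60)
ratios <- c(35, 48, 60, 70)

# Default for continuous scales: out-of-range values become NA
censored <- censor(ratios, range = c(30, 60))

# Alternative: clamp out-of-range values to the nearest limit
squished <- squish(ratios, range = c(30, 60))

# The most significant FDR in the data exceeds the color limit of 9
top_sig <- -log10(8.33e-10)
```

If censoring is not what you want, one option (my assumption, not something the original code does) is to pass oob = scales::squish to scale_color_gradient(), which forwards extra arguments to the underlying continuous scale, and to clamp Ratio in the data before plotting, e.g. mutate(Ratio = scales::squish(Ratio, c(30, 60))).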

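So far one key issue remains unsolved: the original figure has background blocks in different colors, and ours does not yet. Adding them is the most important and also the most difficult part of this post.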

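For this I chose geom_rect(). There is a catch, though: the y axis in the plot above is discrete, so if the rectangles were positioned by category, gaps between adjacent rectangles would be unavoidable. The way around it is to give the rectangles continuous coordinates instead: on a discrete axis the i-th level (counted from the bottom) is drawn at y = i, so putting the boundaries halfway between levels (at i ± 0.5) lets adjacent rectangles join seamlessly.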

Here a new data frame serves as the input to geom_rect():

rect.data <- data.frame(
  ymin = c(0, 1.5, 2.5, 3.5, 4.5, 10.5, 17.5),
  ymax = c(1.5, 2.5, 3.5, 4.5, 10.5, 17.5, 23.5),
  colors = letters[1:7]
)
head(rect.data)

#  ymin ymax colors
#1  0.0  1.5      a
#2  1.5  2.5      b
#3  2.5  3.5      c
#4  3.5  4.5      d
#5  4.5 10.5      e
#6 10.5 17.5      f
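The boundaries in rect.data follow directly from the group sizes, so they can also be computed instead of typed. A minimal sketch, assuming the colors vector defined earlier (listed top to bottom, one entry per pathway): reversing it gives bottom-to-top order, rle() collapses consecutive equal colors into blocks, and the cumulative block sizes plus 0.5 give the upper edges. The names rect_data and fill are my own choices.

```r
# Colour of each pathway, top to bottom, as in the post's colors vector
colors <- c(rep('#279D77', 6), rep('#CF6611', 7), rep('#7974A1', 6),
            '#E2348E', '#E5B63D', '#91793C', '#686868')

# Level i sits at y = i counted from the bottom, so work bottom to top
runs <- rle(rev(colors))

rect_data <- data.frame(
  ymax = cumsum(runs$lengths) + 0.5,  # upper edge of each colour block
  fill = runs$values                  # the hex colour of the block itself
)
# Each block starts where the previous one ends; the lowest starts at 0.5
rect_data$ymin <- c(0.5, head(rect_data$ymax, -1))

rect_data
```

Because fill now holds the hex colors themselves, the layer could map aes(fill = fill) and use scale_fill_identity() in place of the letters plus scale_fill_manual() pairing. One small difference from the hand-written table: the lowest block starts at 0.5 rather than 0, so it has the same height as the other single-pathway blocks.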

Now the rectangles can be added:

p1 + 
  geom_rect(data = rect.data, aes(xmin = -Inf, xmax = Inf, ymin = ymin, ymax = ymax, fill = colors), inherit.aes = FALSE, alpha = 0.2, show.legend = FALSE) +
  scale_fill_manual(values = c('#686868', '#91793C', '#E5B63D', '#E2348E', '#7974A1', '#CF6611', '#279D77')) +
  guides(color = guide_colorbar(order = 1), size = guide_legend(order = 2))

The result:

Two points deserve emphasis:

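(1) geom_rect() must be given inherit.aes = FALSE. Otherwise it inherits the global mappings x = NES and y = Description, and since rect.data has no such columns, ggplot2 throws an error;

(2) guides() sets the order of the legends; without it the order differs from the original figure.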
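Both points can be checked on a toy plot. A sketch with hypothetical data (pts, bands and the column q are made up for illustration): building the plot without inherit.aes = FALSE fails because the rect layer looks for NES and Description inside bands, while the version with inherit.aes = FALSE builds fine. As in the post, the rectangles are added after the points.

```r
library(ggplot2)

# Hypothetical points and background bands
pts   <- data.frame(NES = c(-1, 2), Description = c('A', 'B'), Ratio = c(40, 55), q = c(3, 7))
bands <- data.frame(ymin = c(0.5, 1.5), ymax = c(1.5, 2.5), fill = c('#686868', '#279D77'))

base <- ggplot(pts, aes(x = NES, y = Description)) +
  geom_point(aes(color = q, size = Ratio))

# Inherits x = NES and y = Description, which do not exist in `bands`
bad <- base +
  geom_rect(data = bands, aes(xmin = -Inf, xmax = Inf, ymin = ymin, ymax = ymax, fill = fill),
            alpha = 0.2)
bad_result <- tryCatch(ggplot_build(bad), error = function(e) e)

good <- base +
  geom_rect(data = bands, aes(xmin = -Inf, xmax = Inf, ymin = ymin, ymax = ymax, fill = fill),
            inherit.aes = FALSE, alpha = 0.2) +
  scale_fill_identity() +
  # order = 1: colour bar first; order = 2: size legend second
  guides(color = guide_colorbar(order = 1), size = guide_legend(order = 2))
good_built <- ggplot_build(good)
```

The legend order itself only shows up once the plot is drawn, e.g. with print(good).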

Final notes

Any other differences from the original figure I would simply fix in Adobe Illustrator, which is more convenient, so I will not cover them in code.
