ggplot | Visualizing Data Distributions


Author: 生命数据科学 | Published 2023-01-01 19:05

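    In bioinformatics data analysis, knowing how each sample's data are distributed helps when choosing an analysis pipeline and methods. How to draw those distributions clearly and effectively is worth some thought.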

    1. Required packages

    library(ggdist)
    library(tidyquant)
    library(tidyverse)
    library(ggsci)
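
    If any of these packages is not installed yet, the library() calls above will fail. Here is a minimal sketch for finding out what is missing, assuming all four packages come from CRAN (the names pkgs, is_installed and missing_pkgs are illustrative only):

```r
# Packages loaded in this post
pkgs <- c("ggdist", "tidyquant", "tidyverse", "ggsci")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not available
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```

    The install line is left commented out so that running the check never changes the local library by accident.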
    

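    2. Conventional plots

    The most common ways to plot a data distribution are the box plot and the violin plot.

    2.1 Example data

    The example uses the mpg data set that ships with ggplot2, with cyl as the categorical variable and cty as the continuous variable.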

    > head(mpg)
    # A tibble: 6 × 11
      manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class  
      <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>  
    1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compact
    2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compact
    3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compact
    4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compact
    5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compact
    6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compact
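
    Before plotting, it can help to check how many observations fall into each group. The quick sketch below uses base R only; it shows that the 5-cylinder group is very small, which is presumably why the raincloud code later in this post keeps only 4, 6 and 8 cylinders (that motivation is an assumption, the post does not state it).

```r
library(ggplot2)  # provides the mpg data set

# Number of cars in each cylinder group
counts <- table(mpg$cyl)
counts

# Range and quartiles of the continuous variable
summary(mpg$cty)
```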
    

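    2.2 Box plot

    The box plot is the simplest option. It shows the outliers, the quartiles, and the interquartile range of the data.

    # Box plot
    library(ggplot2)
    library(ggsci)
    
    p <- ggplot(mpg, aes(x = factor(cyl), y = cty, fill = factor(cyl))) +
      geom_boxplot() +
      scale_fill_lancet()
    p
    # scale_fill_lancet() from the ggsci package sets the fill colors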
    
    [Figure: box plot of cty by cyl]
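
    The numbers behind the boxes can also be computed directly. The sketch below assumes the default geom_boxplot() behavior (quantile-based hinges, and points more than 1.5 times the IQR beyond a hinge drawn as outliers); by_cyl and box_stats are illustrative names.

```r
library(ggplot2)  # provides the mpg data set

# One numeric vector of cty values per cylinder group
by_cyl <- split(mpg$cty, mpg$cyl)

# Quartiles, IQR and outlier count for each group
box_stats <- t(sapply(by_cyl, function(v) {
  q <- quantile(v, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  c(q1 = q[1], median = q[2], q3 = q[3], iqr = iqr,
    # points beyond 1.5 * IQR from the box count as outliers
    n_outliers = sum(v < q[1] - 1.5 * iqr | v > q[3] + 1.5 * iqr))
}))
box_stats
```

    Each row is one cylinder group, so the table can be read side by side with the plot.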

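    2.3 Violin plot

    Compared with the box plot, the violin plot shows one more piece of information: the density of the data.

    # Same data
    # Violin plot
    library(ggplot2)
    library(ggsci)
    
    p <- ggplot(mpg, aes(x = factor(cyl), y = cty, fill = factor(cyl))) +
      geom_violin() +
      scale_fill_lancet()
    p
    # scale_fill_lancet() from the ggsci package sets the fill colors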
    
    [Figure: violin plot of cty by cyl]
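
    The outline of a violin is a kernel density estimate mirrored around the group's axis. As a rough illustration, the sketch below rebuilds that shape for the 4-cylinder group with base R's density(). It is not what geom_violin() runs internally: among other differences, geom_violin() trims the curve to the range of the data by default, while density() extends a little beyond it.

```r
library(ggplot2)  # provides the mpg data set

cty4 <- mpg$cty[mpg$cyl == 4]

# Gaussian kernel density estimate with the default bandwidth
dens <- density(cty4)

# Mirror the density around zero to get a violin-like outline
plot(c(dens$y, -rev(dens$y)), c(dens$x, rev(dens$x)),
     type = "l", xlab = "density (mirrored)", ylab = "cty")
```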

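    3. Raincloud plot

    Here is the final result first:


    [Figure: finished raincloud plot]

    The plot has three main components:

    1. A box plot
    2. A half-violin density (the "cloud")
    3. A dot plot of the raw observations (the "rain")

    3.1 Draw the cloud first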

    mpg %>%
      filter(cyl %in% c(4,6,8)) %>%
      ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
      # add half-violin from {ggdist} package
      ggdist::stat_halfeye(
        ## custom bandwidth
        adjust = 0.5,
        ## move geom to the right
        justification = -.2,
        ## remove slab interval
        .width = 0,
        point_colour = NA
      ) 
    
    [Figure: half-violin "cloud" for each cyl group]
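
    The adjust argument scales the kernel bandwidth: values below 1 give a bumpier cloud, values above 1 a smoother one. The sketch below shows the effect with base R's density(), which has an argument of the same name. That ggdist's density estimator responds in the same way is an assumption here, not something this post verifies.

```r
library(ggplot2)  # provides the mpg data set

cty8 <- mpg$cty[mpg$cyl == 8]

# Same data, two bandwidth multipliers
d_default <- density(cty8, adjust = 1)
d_narrow  <- density(cty8, adjust = 0.5)

# The bandwidth actually used is the base bandwidth times adjust
c(default = d_default$bw, narrow = d_narrow$bw)
```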

    3.2 Cloud + box plot

    mpg %>%
      filter(cyl %in% c(4,6,8)) %>%
      ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
      # add half-violin from {ggdist} package
      ggdist::stat_halfeye(
        ## custom bandwidth
        adjust = 0.5,
        ## move geom to the right
        justification = -.2,
        ## remove slab interval
        .width = 0,
        point_colour = NA
      ) +
      geom_boxplot(
        width = .15,
        ## remove outliers
        outlier.color = NA,
        alpha = 0.5
      ) 
    
    [Figure: cloud with box plot]
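
    Each step in this section repeats the full pipeline so that every block runs on its own. Since a ggplot object can be stored and extended with +, the same steps can also be written incrementally. A sketch of that style, using the same arguments as the code in sections 3.1 and 3.2 (the object names base, p_cloud and p_box are illustrative):

```r
library(dplyr)
library(ggplot2)

# Base plot: data and aesthetic mappings only
base <- mpg %>%
  filter(cyl %in% c(4, 6, 8)) %>%
  ggplot(aes(x = factor(cyl), y = cty,
             fill = factor(cyl), color = factor(cyl)))

# Step 1: the cloud (half-violin from the ggdist package)
p_cloud <- base +
  ggdist::stat_halfeye(adjust = 0.5, justification = -.2,
                       .width = 0, point_colour = NA)

# Step 2: add the box plot on top of the stored cloud
p_box <- p_cloud +
  geom_boxplot(width = .15, outlier.color = NA, alpha = 0.5)

p_box
```

    Later layers (the dots, the flip, the color scales) can be added to p_box in the same way.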

    3.3 Cloud + rain + box plot

    mpg %>%
      filter(cyl %in% c(4,6,8)) %>%
      ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
      # add half-violin from {ggdist} package
      ggdist::stat_halfeye(
        ## custom bandwidth
        adjust = 0.5,
        ## move geom to the right
        justification = -.2,
        ## remove slab interval
        .width = 0,
        point_colour = NA
      ) +
      geom_boxplot(
        width = .15,
        ## remove outliers
        outlier.color = NA,
        alpha = 0.5
      ) +
      # Add dot plots from {ggdist} package
      ggdist::stat_dots(
        ## orientation to the left
        side = "left",
        ## move geom to the left
        justification = 1.1,
        ## adjust grouping (binning) of observations
        binwidth = .25
      ) 
    
    [Figure: cloud, rain, and box plot]

    3.4 The orientation is off, so flip the axes

    mpg %>%
      filter(cyl %in% c(4,6,8)) %>%
      ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
      # add half-violin from {ggdist} package
      ggdist::stat_halfeye(
        ## custom bandwidth
        adjust = 0.5,
        ## move geom to the right
        justification = -.2,
        ## remove slab interval
        .width = 0,
        point_colour = NA
      ) +
      geom_boxplot(
        width = .15,
        ## remove outliers
        outlier.color = NA,
        alpha = 0.5
      ) +
      # Add dot plots from {ggdist} package
      ggdist::stat_dots(
        ## orientation to the left
        side = "left",
        ## move geom to the left
        justification = 1.1,
        ## adjust grouping (binning) of observations
        binwidth = .25
      ) +coord_flip()
    
    [Figure: raincloud plot after coord_flip()]

    3.5 Add some color

    mpg %>%
      filter(cyl %in% c(4,6,8)) %>%
      ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
      # add half-violin from {ggdist} package
      ggdist::stat_halfeye(
        ## custom bandwidth
        adjust = 0.5,
        ## move geom to the right
        justification = -.2,
        ## remove slab interval
        .width = 0,
        point_colour = NA
      ) +
      geom_boxplot(
        width = .15,
        ## remove outliers
        outlier.color = NA,
        alpha = 0.5
      ) +
      # Add dot plots from {ggdist} package
      ggdist::stat_dots(
        ## orientation to the left
        side = "left",
        ## move geom to the left
        justification = 1.1,
        ## adjust grouping (binning) of observations
        binwidth = .25
      ) +coord_flip()+
      # Adjust theme
      scale_fill_lancet() +
      scale_color_lancet()
    
    [Figure: raincloud plot with the Lancet color palette]

    3.6 Remove the background

    mpg %>%
      filter(cyl %in% c(4,6,8)) %>%
      ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
      # add half-violin from {ggdist} package
      ggdist::stat_halfeye(
        ## custom bandwidth
        adjust = 0.5,
        ## move geom to the right
        justification = -.2,
        ## remove slab interval
        .width = 0,
        point_colour = NA
      ) +
      geom_boxplot(
        width = .15,
        ## remove outliers
        outlier.color = NA,
        alpha = 0.5
      ) +
      # Add dot plots from {ggdist} package
      ggdist::stat_dots(
        ## orientation to the left
        side = "left",
        ## move geom to the left
        justification = 1.1,
        ## adjust grouping (binning) of observations
        binwidth = .25
      ) +coord_flip()+
      # Adjust theme
      scale_fill_lancet() +
      scale_color_lancet()+
      ## theme_classic() is a complete theme, so a preceding theme_bw() would have no effect
      theme_classic()
    
    [Figure: raincloud plot with the classic theme]

    3.7 Adjust the title and labels

     p <- mpg %>%
      filter(cyl %in% c(4,6,8)) %>%
      ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
      # add half-violin from {ggdist} package
      ggdist::stat_halfeye(
        ## custom bandwidth
        adjust = 0.5,
        ## move geom to the right
        justification = -.2,
        ## remove slab interval
        .width = 0,
        point_colour = NA
      ) +
      geom_boxplot(
        width = .15,
        ## remove outliers
        outlier.color = NA,
        alpha = 0.5
      ) +
      # Add dot plots from {ggdist} package
      ggdist::stat_dots(
        ## orientation to the left
        side = "left",
        ## move geom to the left
        justification = 1.1,
        ## adjust grouping (binning) of observations
        binwidth = .25
      ) +
      # Adjust theme
      scale_fill_lancet() +
      scale_color_lancet()+
      ## theme_classic() is a complete theme, so a preceding theme_bw() would have no effect
      theme_classic() +
      labs(title = "Raincloud_plot",
           x="cyl",
           fill="cyl_Type",color="cyl_Type")+
      coord_flip()
    p
    
    [Figure: final raincloud plot with title and legend labels]

    That is essentially it. Finally, save the plot:

    ggsave(p,filename = "raincloud_plot.jpg",height = 4,width = 5)
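
    For figures headed to a publication, it is often worth setting the units and resolution explicitly. A minimal sketch, using a simple stand-in plot and a temporary output path (demo_plot and out_file are illustrative names, not part of the code above):

```r
library(ggplot2)

# Stand-in plot; in practice pass the raincloud object p instead
demo_plot <- ggplot(mpg, aes(x = factor(cyl), y = cty)) +
  geom_boxplot()

# Write a 5 x 4 inch PNG at 300 dpi into the session's temp directory
out_file <- file.path(tempdir(), "raincloud_plot.png")
ggsave(filename = out_file, plot = demo_plot,
       width = 5, height = 4, units = "in", dpi = 300)

file.exists(out_file)
```

    ggsave() picks the output format from the file extension, so changing the extension to .pdf produces a vector file instead.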
    

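    4. Summary

    A sword's edge comes from honing; the plum blossom's fragrance comes from the bitter cold.

    Plotting works the same way. A simple figure takes three lines of code, but a polished, good-looking one always takes more effort.
    So why can I draw plots so quickly?

    Because I have a ggplot2 cheat sheet.

    [Figure: ggplot2 cheat sheet]
    It covers the syntax for nearly all common chart types, which makes most plotting requests easy to handle.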

    Thanks for reading. If this was useful, please like, follow, and share!


          Article link: https://www.haomeiwen.com/subject/kdrxcdtx.html