Plotting by Imitation: Descriptive Statistics (Part 1)

Author: 生信宝库 | Published 2022-03-03 11:08

    Preface

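    Put simply, the core of analyzing any experimental result is the data. A large volume of raw data may well contain values that are meaningless or ambiguous, which makes it hard to spot the useful information directly.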

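    To grasp the information the data contain quickly and effectively, we need to simplify them: reduce a complex dataset to a few representative values that give us a rough picture of the data as a whole. This is what descriptive statistics are for.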

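    There are many kinds of plots for showing descriptive statistics. Today we start with the first step of descriptive statistics, so let's look at some attractive plots.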


    Code

    The very first step is to find out which distribution the data we want to analyze follow.

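    rm(list=ls())   # Clear the workspace
    
    library("ggpubr")
    set.seed(1234)  # Make the simulated data reproducible
    # 200 simulated weights per sex, normally distributed (sd = 1),
    # with mean 55 for F and 58 for M
    wdata = data.frame(
      sex = factor(rep(c("F", "M"), each=200)),
      weight = c(rnorm(200, 55), rnorm(200, 58)))
    head(wdata, 4)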
    
    # Density curves by sex, with a line at each group mean and a rug of the raw values
    ggdensity(wdata, x = "weight",
              add = "mean", rug = TRUE,
              color = "sex", fill = "sex",
              palette = c("#00AFBB", "#E7B800"))
    
    # Histograms by sex, with the same mean lines and rug
    gghistogram(wdata, x = "weight",
                add = "mean", rug = TRUE,
                color = "sex", fill = "sex",
                palette = c("#00AFBB", "#E7B800"))
    

    With these functions, the data can be drawn as a density plot and a histogram.

    [Figures: density plot (ggdensity) and histogram (gghistogram) of weight by sex]

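    The plots show that both groups are roughly normally distributed. This is expected, since the data were simulated with rnorm().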
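    The density plot only gives a visual impression of normality. As a hedged complement (not part of the original workflow), the sketch below runs base R's Shapiro-Wilk test (stats::shapiro.test) on each sex separately and, if ggpubr is installed, draws a Q-Q plot per group with ggpubr::ggqqplot. It regenerates the same simulated wdata so it runs on its own.

```r
# Recreate the simulated data used above
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

# Shapiro-Wilk test for each sex separately;
# split() returns a list named by factor level ("F", "M")
shapiro_p <- sapply(split(wdata$weight, wdata$sex),
                    function(w) shapiro.test(w)$p.value)
print(round(shapiro_p, 3))

# Optional Q-Q plot per group if ggpubr is installed:
# points close to the reference line suggest normality
if (requireNamespace("ggpubr", quietly = TRUE)) {
  print(ggpubr::ggqqplot(wdata, x = "weight", color = "sex",
                         palette = c("#00AFBB", "#E7B800")))
}
```

    A p-value above 0.05 only means the test finds no evidence against normality, not that the data are proven normal; with large samples even tiny departures become significant, so it is best read alongside the plots.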

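    In real presentations, though, we often need richer displays, for example more groups, or showing which values are up or down relative to a reference (here, above or below the average). The code below covers these cases with bar charts and dot charts built on the mtcars dataset.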

    # Load data
    data("mtcars")
    dfm <- mtcars
    # Convert the cyl variable to a factor
    dfm$cyl <- as.factor(dfm$cyl)
    # Add the car names as a column
    dfm$name <- rownames(dfm)
    # Inspect the data
    head(dfm[, c("name", "wt", "mpg", "cyl")])
    
    ggbarplot(dfm, x = "name", y = "mpg",
              fill = "cyl",               # change fill color by cyl
              color = "white",            # Set bar border colors to white
              palette = "jco",            # jco journal color palette, see ?ggpar
              sort.val = "desc",          # Sort the values in descending order
              sort.by.groups = FALSE,     # Don't sort inside each group
              x.text.angle = 90           # Rotate x-axis text vertically
    )
    
    ggbarplot(dfm, x = "name", y = "mpg",
              fill = "cyl",               # change fill color by cyl
              color = "white",            # Set bar border colors to white
              palette = "jco",            # jco journal color palette, see ?ggpar
              sort.val = "asc",           # Sort the values in ascending order
              sort.by.groups = TRUE,      # Sort inside each group
              x.text.angle = 90           # Rotate x-axis text vertically
    )
    
    # Calculate the z-score of the mpg data
    dfm$mpg_z <- (dfm$mpg - mean(dfm$mpg)) / sd(dfm$mpg)
    dfm$mpg_grp <- factor(ifelse(dfm$mpg_z < 0, "low", "high"), 
                          levels = c("low", "high"))
    # Inspect the data
    head(dfm[, c("name", "wt", "mpg", "mpg_z", "mpg_grp", "cyl")])
    ggbarplot(dfm, x = "name", y = "mpg_z",
              fill = "mpg_grp",           # Change fill color by mpg_grp
              color = "white",            # Set bar border colors to white
              palette = "jco",            # jco journal color palette, see ?ggpar
              sort.val = "asc",           # Sort the values in ascending order
              sort.by.groups = FALSE,     # Don't sort inside each group
              x.text.angle = 90,          # Rotate x-axis text vertically
              ylab = "MPG z-score",
              xlab = FALSE,
              legend.title = "MPG Group"
    )
    
    ggbarplot(dfm, x = "name", y = "mpg_z",
              fill = "mpg_grp",           # Change fill color by mpg_grp
              color = "white",            # Set bar border colors to white
              palette = "jco",            # jco journal color palette, see ?ggpar
              sort.val = "desc",          # Sort the values in descending order
              sort.by.groups = FALSE,     # Don't sort inside each group
              x.text.angle = 90,          # Rotate x-axis text vertically
              ylab = "MPG z-score",
              legend.title = "MPG Group",
              rotate = TRUE,
              ggtheme = theme_minimal()
    )
    
    ggdotchart(dfm, x = "name", y = "mpg",
               color = "cyl",                                # Color by groups
               palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette
               sorting = "ascending",                        # Sort values in ascending order
               add = "segments",                             # Add segments from y = 0 to dots
               ggtheme = theme_pubr()                        # ggplot2 theme
    )
    
    ggdotchart(dfm, x = "name", y = "mpg",
               color = "cyl",                                # Color by groups
               palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette
               sorting = "descending",                       # Sort value in descending order
               add = "segments",                             # Add segments from y = 0 to dots
               rotate = TRUE,                                # Flip coordinates (horizontal chart)
               group = "cyl",                                # Order by groups
               dot.size = 6,                                 # Large dot size
               label = round(dfm$mpg),                       # Add rounded mpg values as dot labels
               font.label = list(color = "white", size = 9, 
                                 vjust = 0.5),               # Adjust label parameters
               ggtheme = theme_pubr()                        # ggplot2 theme
    )
    
    ggdotchart(dfm, x = "name", y = "mpg_z",
               color = "cyl",                                # Color by groups
               palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette
               sorting = "descending",                       # Sort value in descending order
               add = "segments",                             # Add segments from y = 0 to dots
               add.params = list(color = "lightgray", size = 2), # Change segment color and size
               group = "cyl",                                # Order by groups
               dot.size = 6,                                 # Large dot size
               label = round(dfm$mpg_z, 1),                  # Add z-scores (1 decimal) as dot labels
               font.label = list(color = "white", size = 9, 
                                 vjust = 0.5),               # Adjust label parameters
               ggtheme = theme_pubr()                        # ggplot2 theme
    )+
      geom_hline(yintercept = 0, linetype = 2, color = "lightgray")
    
    ggdotchart(dfm, x = "name", y = "mpg",
               color = "cyl",                                # Color by groups
               palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette
               sorting = "descending",                       # Sort value in descending order
               rotate = TRUE,                                # Flip coordinates (horizontal chart)
               dot.size = 2,                                 # Small dot size
               y.text.col = TRUE,                            # Color y text by groups
               ggtheme = theme_pubr()                        # ggplot2 theme
    )+
      theme_cleveland()                                      # Add dashed grids
    
    [Figures: sorted bar charts, z-score bar charts, and dot charts produced by the code above]
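    The "up/down" split in the z-score charts is simply the sign of each car's z-score, z = (mpg - mean(mpg)) / sd(mpg). As a quick sanity check (a sketch, not part of the original post), base R's scale() returns the same values as the manual formula, and table() shows how many cars fall into each group.

```r
data("mtcars")
dfm <- mtcars

# Manual z-score, as in the mtcars code
mpg_z <- (dfm$mpg - mean(dfm$mpg)) / sd(dfm$mpg)

# scale() centers by the mean and divides by the standard deviation;
# it returns a one-column matrix, so convert it to a plain vector
mpg_z_scaled <- as.numeric(scale(dfm$mpg))
stopifnot(isTRUE(all.equal(mpg_z, mpg_z_scaled)))

# Below-average cars are "low", the rest "high"
mpg_grp <- factor(ifelse(mpg_z < 0, "low", "high"), levels = c("low", "high"))
print(table(mpg_grp))
```

    Using scale() avoids retyping the formula, but the explicit version makes it obvious which mean and standard deviation the grouping depends on.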

    Summary

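    In this section, Immugent has only introduced some basic plots for descriptive statistics. In practice, we usually need more specific measures to characterize the data. Representative features of a dataset are generally described from two angles: central tendency (mean, median) and dispersion (range, variance). We will cover these in the next part, so stay tuned!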
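    The central tendency and dispersion measures named above can be previewed with a minimal base-R sketch. It is an illustration only; the next part of the series may use different functions. Here describe() is a hypothetical helper, applied to each sex of the simulated weight data via split() and sapply().

```r
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

# Hypothetical helper: central tendency and dispersion of a numeric vector
describe <- function(x) {
  c(mean   = mean(x),          # central tendency
    median = median(x),        # central tendency, robust to outliers
    range  = diff(range(x)),   # dispersion: max - min
    var    = var(x))           # dispersion: sample variance
}

# sapply() gives one column per sex; t() turns that into one row per sex
stats_by_sex <- t(sapply(split(wdata$weight, wdata$sex), describe))
print(round(stats_by_sex, 2))
```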


    Article link: https://www.haomeiwen.com/subject/gliyrrtx.html