ggstatsplot, a visualization powerhouse = plotting + statistics


Author: 程凉皮儿 | Published: 2020-02-12 21:57

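    ggstatsplot is an extension of the ggplot2 package. It produces publication-ready plots annotated with the results of statistical analyses, including the details of each test, which makes it very useful for researchers who run statistical analyses routinely.
    Strengths of ggstatsplot for statistical analysis: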

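    • It currently supports the most common types of statistical tests: t-test / ANOVA, nonparametric tests, correlation analysis, contingency table analysis, and regression analysis.
    • It also performs well on the plotting side:
      (1) violin plots (for comparing continuous data between groups);
      (2) pie charts (for testing the distribution of categorical data);
      (3) bar charts (for testing the distribution of categorical data);
      (4) scatter plots (for the correlation between two variables);
      (5) correlation matrices (for correlations among several variables);
      (6) histograms and dot plots/charts (for hypothesis tests about a distribution);
      (7) dot-and-whisker plots (for regression models).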

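    Here are some practical examples:

    The ggbetweenstats function

    It creates violin plots, box plots, or a combination of the two, and is mainly used to compare continuous data between groups or conditions. The simplest function call looks like this: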

    rm(list = ls())
    options(stringsAsFactors = FALSE)
    library(ggstatsplot)
    library(ggplot2)
    set.seed(123)
    
    ggstatsplot::ggbetweenstats(
      data = iris,
      x = Species,
      y = Sepal.Length,
      messages = FALSE
    ) + # further modification outside of ggstatsplot
      ggplot2::coord_cartesian(ylim = c(3, 8)) +
      ggplot2::scale_y_continuous(breaks = seq(3, 8, by = 1))
    

    The result is shown below:

    Figure 1
    If ggplot2 is not loaded together with ggstatsplot, the following error may appear:
    Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  : 
      polygon edge not found
    

    Figure 1 shows that Sepal.Length differs significantly among the iris species. The function's arguments can be adjusted to make the plot more informative:

    rm(list = ls())
    options(stringsAsFactors = FALSE)
    library(ggstatsplot)
    library(ggplot2)
    set.seed(123)
    # drop one species so that only two groups remain; a t-test is then run instead of an ANOVA
    iris2 <- dplyr::filter(.data = iris, Species != "setosa")
    
    iris2$Species <-
      base::factor(
        x = iris2$Species,
        levels = c("virginica", "versicolor")
      )
    # plot
    ggstatsplot::ggbetweenstats(
      data = iris2,
      x = Species,
      y = Sepal.Length,
      notch = TRUE, # show notched box plot
      mean.plotting = TRUE, # whether mean for each group is to be displayed
      mean.ci = TRUE, # whether to display confidence interval for means
      mean.label.size = 2.5, # size of the label for mean
      type = "p", # which type of test is to be run
      k = 3, # number of decimal places for statistical results
      outlier.tagging = TRUE, # whether outliers need to be tagged
      outlier.label = Sepal.Width, # variable to be used for the outlier tag
      outlier.label.color = "darkgreen", # changing the color for the text label
      xlab = "Type of Species", # label for the x-axis variable
      ylab = "Attribute: Sepal Length", # label for the y-axis variable
      title = "Dataset: Iris flower data set", # title text for the plot
      ggtheme = ggthemes::theme_fivethirtyeight(), # choosing a different theme
      ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
      package = "wesanderson", # package from which color palette is to be taken
      palette = "Darjeeling1", # choosing a different color palette
      messages = FALSE
    )
    
    Figure 2
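
    The statistics printed in the subtitles of Figures 1 and 2 can be cross-checked with base R. The sketch below assumes that the parametric default corresponds to Welch-type tests (no equal-variance assumption); that is an assumption about the package version in use, so treat the numbers as a sanity check rather than an exact reproduction of the subtitle.

    ```r
    # Base-R cross-check of the group comparisons (no ggstatsplot needed).

    # Three groups: Welch's one-way ANOVA, which does not assume equal variances.
    welch_anova <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = FALSE)
    print(welch_anova)

    # Two groups: drop setosa, then run Welch's two-sample t-test.
    iris2 <- droplevels(subset(iris, Species != "setosa"))
    welch_t <- t.test(Sepal.Length ~ Species, data = iris2, var.equal = FALSE)
    print(welch_t)

    # Group means that the t-test compares.
    group_means <- tapply(iris2$Sepal.Length, iris2$Species, mean)
    print(group_means)
    ```

    Both tests reject the null hypothesis of equal means at conventional thresholds, which agrees with the conclusion drawn from the plots.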

    The ggwithinstats function

    ggwithinstats works almost exactly like ggbetweenstats, but it is intended for repeated-measures (within-subject) designs.
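
    To see why the within-subject distinction matters, here is a minimal base-R sketch using the built-in `sleep` dataset (not part of the original article), in which the same 10 patients were measured under two drugs. It contrasts a paired test with an unpaired one on identical numbers.

    ```r
    # `sleep` (base R): extra hours of sleep for 10 patients, each measured under two drugs.
    drug1 <- sleep$extra[sleep$group == 1]
    drug2 <- sleep$extra[sleep$group == 2]

    # Within-subject comparison: observations are paired by patient.
    paired_res <- t.test(drug1, drug2, paired = TRUE)

    # Between-subject comparison on the same numbers, for contrast.
    unpaired_res <- t.test(drug1, drug2, paired = FALSE)

    print(paired_res$p.value)
    print(unpaired_res$p.value)
    ```

    The paired test gives the smaller p-value here because it removes patient-to-patient variability, which is the reason to choose a within-subject function when the data really are repeated measures.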

    rm(list = ls())
    options(stringsAsFactors = FALSE)
    library(ggstatsplot)
    library(ggplot2)
    set.seed(123)
    
    ggstatsplot::ggwithinstats(
      data = iris,
      x = Species,
      y = Sepal.Length,
      messages = FALSE
    )
    
    Figure 3 (default call)

    # plot
    ggstatsplot::ggwithinstats(
      data = iris,
      x = Species,
      y = Sepal.Length,
      sort = "descending", # ordering groups along the x-axis based on
      sort.fun = median, # values of `y` variable
      pairwise.comparisons = TRUE,
      pairwise.display = "s",
      pairwise.annotation = "p",
      title = "iris",
      caption = "Data from: iris",
      ggtheme = ggthemes::theme_fivethirtyeight(),
      ggstatsplot.layer = FALSE,
      messages = FALSE
    )
    
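    Figure 3 (customized call)

    The ggscatterstats function

    This function creates a scatter plot with marginal histograms, box plots, density plots, violin plots, or densigrams from ggExtra::ggMarginal, and displays the results of the statistical analysis in the subtitle: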

    rm(list = ls())
    options(stringsAsFactors = FALSE)
    library(ggstatsplot)
    library(ggplot2)
    set.seed(123)
    ggstatsplot::ggscatterstats(
      data = ggplot2::msleep,
      x = sleep_rem,
      y = awake,
      xlab = "REM sleep (in hours)",
      ylab = "Amount of time spent awake (in hours)",
      title = "Understanding mammalian sleep",
      messages = FALSE
    )
    
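    Figure 4
    Figure 4 shows that sleep_rem and awake are correlated, with sleep_rem on the x-axis and awake on the y-axis. The histograms along the top and right edges show how each variable is distributed: the more observations fall into a bin, the taller its bar.

    rm(list = ls())
    options(stringsAsFactors = FALSE)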
    library(ggstatsplot)
    library(ggplot2)
    set.seed(123)
    
    # plot
    ggstatsplot::ggscatterstats(
      data = dplyr::filter(.data = ggstatsplot::movies_long, genre == "Action"),
      x = budget,
      y = rating,
      type = "robust", # type of test that needs to be run
      conf.level = 0.99, # confidence level
      xlab = "Movie budget (in million US$)", # label for x axis
      ylab = "IMDB rating", # label for y axis
      label.var = "title", # variable for labeling data points
      label.expression = "rating < 5 & budget > 100", # expression that decides which points to label
      line.color = "yellow", # changing the regression line color
      title = "Movie budget and IMDB rating (action)", # title text for the plot
      caption = expression( # caption text for the plot
        paste(italic("Note"), ": IMDB stands for Internet Movie DataBase")
      ),
      ggtheme = theme_bw(), # choosing a different theme
      ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
      marginal.type = "density", # type of marginal distribution to be displayed
      xfill = "#0072B2", # color fill for x-axis marginal distribution
      yfill = "#009E73", # color fill for y-axis marginal distribution
      xalpha = 0.6, # transparency for x-axis marginal distribution
      yalpha = 0.6, # transparency for y-axis marginal distribution
      centrality.para = "median", # central tendency lines to be displayed
      messages = FALSE # turn off messages and notes
    )
    
    Figure 5
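
    For the first scatter plot (Figure 4), the underlying correlation can be cross-checked with base R. This is a minimal sketch that assumes a plain Pearson correlation; it requires only that ggplot2 is installed, since that is where the msleep data lives.

    ```r
    # Pearson correlation between REM sleep and time awake.
    # cor.test() drops incomplete pairs (rows where either value is NA) itself.
    sleep_cor <- with(
      ggplot2::msleep,
      cor.test(sleep_rem, awake, method = "pearson")
    )

    print(sleep_cor$estimate)  # correlation coefficient r
    print(sleep_cor$conf.int)  # 95% confidence interval for r
    print(sleep_cor$p.value)
    ```

    The coefficient is negative: animals with more REM sleep spend less time awake, which matches the downward slope of the regression line in the plot.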

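    ggbarstats bar charts

    The ggbarstats function is mainly used to show how categorical data are distributed across groups. For example: does the male-to-female ratio among patients in group A differ from the ratio among patients in group B?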
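
    The patient example can be sketched in base R as a chi-squared test of independence on a 2x2 contingency table. The counts below are hypothetical and invented purely for illustration; they do not come from any dataset in this article.

    ```r
    # Hypothetical counts, invented for illustration only.
    patients <- matrix(
      c(30, 20,
        18, 32),
      nrow = 2, byrow = TRUE,
      dimnames = list(group = c("A", "B"), sex = c("male", "female"))
    )

    # Pearson's chi-squared test of independence.
    # For 2x2 tables, chisq.test() applies Yates' continuity correction by default.
    chi_res <- chisq.test(patients)

    print(chi_res$expected)  # counts expected if sex were independent of group
    print(chi_res$p.value)
    ```

    A small p-value suggests that the sex ratio differs between the two groups, which is the kind of question the stacked bars answer visually.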

    rm(list = ls())
    options(stringsAsFactors = FALSE)
    library(ggstatsplot)
    library(ggplot2)
    library(hrbrthemes)
    set.seed(123)
    # plot
    ggstatsplot::ggbarstats(
      data = ggstatsplot::movies_long,
      main = mpaa,
      condition = genre,
      sampling.plan = "jointMulti",
      title = "MPAA Ratings by Genre",
      xlab = "movie genre",
      perc.k = 1,
      x.axis.orientation = "slant",
      ggtheme = hrbrthemes::import_roboto_condensed(),
      ggstatsplot.layer = FALSE,
      ggplot.component = ggplot2::theme(axis.text.x = ggplot2::element_text(face = "italic")),
      palette = "Set2",
      messages = FALSE
    )
    
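    Figure 6
    Figure 6 is a stacked bar chart: it compares whether the distribution of a categorical variable differs between groups. Here too, the arguments can be adjusted to make the plot richer and more polished.
    The original reference used ggtheme = hrbrthemes::theme_modern_rc() rather than the ggtheme = hrbrthemes::import_roboto_condensed() shown here, so the hrbrthemes package has to be loaded first. Note that import_roboto_condensed() is hrbrthemes' font-import helper rather than a theme function. This step easily triggers the following error, which points to a missing font: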
    Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  : 
      polygon edge not found
    In addition: Warning message:
    In grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  :
      no font could be found for family "Roboto Condensed"
    

    gghistostats

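    It shows the distribution of a single variable and uses a one-sample test to check whether it differs significantly from a specified value: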

    ggstatsplot::gghistostats(
      data = ToothGrowth, # dataframe from which variable is to be taken
      x = len, # numeric variable whose distribution is of interest
      title = "Distribution of tooth length", # title for the plot
      fill.gradient = TRUE, # use color gradient
      test.value = 10, # the comparison value for t-test
      test.value.line = TRUE, # display a vertical line at test value
      type = "bf", # bayes factor for one sample t-test
      bf.prior = 0.8, # prior width for calculating the bayes factor
      messages = FALSE # turn off the messages
    )
    
    Figure 7
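
    The call above reports a Bayes factor (type = "bf"). As a rough frequentist counterpart, the same one-sample comparison against 10 can be sketched in base R. This is a different test from the one in the plot, so the two outputs are not expected to show the same numbers.

    ```r
    # One-sample t-test: is the mean tooth length different from 10?
    one_sample <- t.test(ToothGrowth$len, mu = 10)

    print(one_sample$estimate)  # sample mean of len
    print(one_sample$conf.int)  # 95% confidence interval for the mean
    print(one_sample$p.value)
    ```

    The confidence interval lies well above 10, so the test value of 10 is not a plausible mean for these data.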

    ggdotplotstats

    This function is similar to gghistostats, and works better when each numeric value also has a label attached.

    set.seed(123)
    
    # plot
    ggstatsplot::ggdotplotstats(
      data = dplyr::filter(.data = gapminder::gapminder, continent == "Asia"),
      y = country,
      x = lifeExp,
      test.value = 55,
      test.value.line = TRUE,
      test.line.labeller = TRUE,
      test.value.color = "red",
      centrality.para = "median",
      centrality.k = 0,
      title = "Distribution of life expectancy in Asian continent",
      xlab = "Life expectancy",
      messages = FALSE,
      caption = substitute(
        paste(
          italic("Source"),
          ": Gapminder dataset from https://www.gapminder.org/"
        )
      )
    )
    
    Figure 8

    ggcorrmat

    This function is mainly used for correlation analysis among variables:

    set.seed(123)
    # by default this function outputs a correlogram plot
    ggstatsplot::ggcorrmat(
      data = ggplot2::msleep,
      corr.method = "robust", # correlation method
      sig.level = 0.001, # threshold of significance
      p.adjust.method = "holm", # p-value adjustment method for multiple comparisons
      cor.vars = c(sleep_rem, awake:bodywt), # a range of variables can be selected
      cor.vars.names = c(
        "REM sleep", # variable names
        "time awake",
        "brain weight",
        "body weight"
      ),
      matrix.type = "upper", # type of visualization matrix
      colors = c("#B2182B", "white", "#4D4D4D"),
      title = "Correlogram for mammals sleep dataset",
      subtitle = "sleep units: hours; weight units: kilograms"
    )
    
    Figure 9
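
    The p.adjust.method = "holm" argument refers to Holm's correction for multiple comparisons, which base R exposes through p.adjust(). A minimal sketch with three hypothetical raw p-values (invented for illustration):

    ```r
    # Hypothetical raw p-values from three pairwise correlation tests.
    raw_p <- c(0.01, 0.02, 0.03)

    # Holm's step-down adjustment.
    holm_p <- p.adjust(raw_p, method = "holm")

    print(holm_p)
    ```

    Holm's method multiplies the smallest p-value by 3, the next by 2, and the largest by 1, then forces the adjusted values to be non-decreasing, so the result here is 0.03, 0.04, 0.04.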

    ggcoefstats

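    For regression analysis, it draws a forest-style plot in which each point estimate is shown as a dot with its confidence interval: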

    set.seed(123)
    
    # model
    mod <- stats::lm(
      formula = mpg ~ am * cyl,
      data = mtcars
    )
    
    # plot
    ggstatsplot::ggcoefstats(x = mod)
    
    Figure 10
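
    The dots and whiskers in Figure 10 correspond to the model's coefficient estimates and their confidence intervals. A base-R sketch of the same numbers, assuming 95% intervals (the level actually drawn depends on the plotting function's settings):

    ```r
    # Same model as above.
    mod <- stats::lm(formula = mpg ~ am * cyl, data = mtcars)

    # Point estimates alongside their 95% confidence intervals.
    coef_table <- cbind(
      estimate = stats::coef(mod),
      stats::confint(mod, level = 0.95)
    )

    print(coef_table)
    ```

    Each row is one term of the model (intercept, am, cyl, and the am:cyl interaction); an interval that excludes zero corresponds to a whisker that does not cross the zero line in the plot.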

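    Besides the plot types above, all drawn from bundled example datasets, the package also lets you draw the plot with another package while using ggstatsplot to supply the statistical results: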

    set.seed(123)
    
    # loading the needed libraries
    #install.packages("yarrr")
    library(yarrr)
    library(ggstatsplot)
    
    # using `ggstatsplot` to get call with statistical results
    stats_results <-
      ggstatsplot::ggbetweenstats(
        data = ChickWeight,
        x = Time,
        y = weight,
        return = "subtitle",
        messages = FALSE
      )
    # using `yarrr` to create plot
    yarrr::pirateplot(
      formula = weight ~ Time,
      data = ChickWeight,
      theme = 1,
      main = stats_results
    )
    
    Figure 11

    References:
    https://cloud.tencent.com/developer/article/1450100
    https://github.com/IndrajeetPatil/ggstatsplot
