Chi-squared tests with the R package ggstatsplot


Author: Whuer_deng | Published: 2019-07-23 07:59
    library(ggstatsplot)
    library(ggplot2)
    library(dplyr)
    data("diamonds")
    
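    # Keep diamonds whose color is J, H, or F and whose clarity is SI2, VS1, or IF.
    # %in% tests set membership; `color == c('J', 'H', 'F')` would recycle the vector
    # position by position and silently drop most matching rows. The printed results
    # below appear to come from the original `==` filter, so rerunning gives larger counts.
    diamonds2 <- diamonds %>% 
      filter(color %in% c('J', 'H', 'F'), clarity %in% c('SI2', 'VS1', 'IF'))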
    
    ggbarstats(diamonds2, color, clarity, palette = 'Set2')
    # Statistical results follow
    Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
    Note: Results from one-sample proportion tests for each level of the variable
    clarity testing for equal proportions of the variable color.
    
    # A tibble: 3 x 9
      condition N          F      H      J      `Chi-squared`    df `p-value` significance
      <ord>     <chr>      <chr>  <chr>  <chr>          <dbl> <dbl>     <dbl> <chr>       
    1 SI2       (n = 1208) 45.20% 41.72% 13.08%         225.      2         0 ***         
    2 VS1       (n = 966)  46.38% 38.20% 15.42%         149.      2         0 ***         
    3 IF        (n = 251)  53.39% 39.44% 7.17%           84.6     2         0 ***   
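
The filtering step at the start compares `color` against a three-element vector. As a minimal sketch of why the choice of operator matters there, the toy vector below (hypothetical values, not taken from `diamonds`) shows how `==` and `%in%` behave differently in base R:

```r
# Toy vector standing in for diamonds$color (hypothetical values).
color  <- c("J", "H", "F", "F", "J", "H")
wanted <- c("J", "H", "F")

# `==` recycles `wanted` to J, H, F, J, H, F and compares position by position.
by_equals <- color == wanted

# `%in%` asks, for each element, whether it occurs anywhere in `wanted`.
by_in <- color %in% wanted

print(by_equals)
print(by_in)
```

Every element of `color` is one of the wanted values, yet only the elements that happen to line up with the recycled vector pass the `==` comparison.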
    
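[Figure: bar chart produced by ggbarstats, showing color proportions within each clarity level]

As the figure shows, the overall chi-squared statistic is 15.01 with p = 0.005, which is below the significance level of 0.05, so diamond color and clarity can be considered not independent: the color distribution differs across clarity levels. Within each clarity level, the counts of the three colors also differ significantly. Every bar is topped with three stars ("***"), and the chi-squared values of 225, 149, and 84.6 all exceed 5.99, the critical value of the chi-squared distribution with 2 degrees of freedom at alpha = 0.05, so p < 0.05 in each case.
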
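
The comparison against the critical value can be checked with base R. This is only a sketch: the statistics are copied from the output and figure described above, and the 4 degrees of freedom for the overall test are an assumption based on a 3 x 3 table, since the post does not print them.

```r
# Critical value of the chi-squared distribution for df = 2 at alpha = 0.05.
crit_df2 <- qchisq(0.95, df = 2)

# Within-group statistics as printed in the table above.
within_stats <- c(SI2 = 225, VS1 = 149, IF = 84.6)
within_p <- pchisq(within_stats, df = 2, lower.tail = FALSE)

# Overall test: assumes df = (3 - 1) * (3 - 1) = 4 for a 3 x 3 table.
overall_p <- pchisq(15.01, df = 4, lower.tail = FALSE)

print(round(crit_df2, 2))
print(within_stats > crit_df2)
print(signif(within_p, 3))
print(round(overall_p, 3))
```

The within-group p-values are far too small to display at three decimals, which is why the tibble shows them as 0.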
    ggpiestats(diamonds2, color, clarity, palette = 'Set3')
    # Statistical results follow
    Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
    
    Note: Results from one-sample proportion tests for each level of the variable
    clarity testing for equal proportions of the variable color.
    
    # A tibble: 3 x 9
      condition N          F      H      J      `Chi-squared`    df `p-value` significance
      <ord>     <chr>      <chr>  <chr>  <chr>          <dbl> <dbl>     <dbl> <chr>       
    1 SI2       (n = 1208) 45.20% 41.72% 13.08%         225.      2         0 ***         
    2 VS1       (n = 966)  46.38% 38.20% 15.42%         149.      2         0 ***         
    3 IF        (n = 251)  53.39% 39.44% 7.17%           84.6     2         0 ***         
    
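[Figure: pie charts produced by ggpiestats, one per clarity level]

The statistical results are identical to those of the bar chart above; only the bars are replaced by pies.

Plots like these make it quick and convenient to visualize the statistics. Besides the basic chi-squared statistic and p-value, they show the distribution within each group: for example, color J accounts for 13% of the SI2 group, where the most common color is F at 45%. In the VS1 and IF groups the most common color is also F, at 46% and 53% respectively.
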
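
The per-group rows of the table can be reproduced approximately with a one-sample goodness-of-fit test in base R. The counts below are an assumption: they are reconstructed by rounding n times each percentage in the SI2 row (n = 1208) and are not printed anywhere in the post.

```r
# Counts reconstructed from the SI2 row (n = 1208; 45.20% F, 41.72% H, 13.08% J).
# They are an assumption derived from n * percentage, not values printed in the post.
si2_counts <- c(F = 546, H = 504, J = 158)

# With no `p` argument, chisq.test() on a vector tests for equal proportions.
si2_test <- chisq.test(si2_counts)

print(round(100 * prop.table(si2_counts), 2))
print(si2_test$statistic)
print(si2_test$parameter)
```

The statistic comes out close to the 225 reported for SI2, with 2 degrees of freedom, which suggests the "one-sample proportion tests" in the note are tests of this kind.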
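    # Drop rows whose cut is Very Good. simulate.p.value = TRUE (as in stats::chisq.test) computes the
    # p-value by Monte Carlo simulation rather than adjusting it; the author uses it because colors J
    # and H have zero counts within cut Fair.
    grouped_ggpiestats(diamonds2[diamonds2$cut != 'Very Good', ], color, clarity,
                       grouping.var = cut, simulate.p.value = TRUE)
    # Statistical results follow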
    Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
    
    Note: Results from one-sample proportion tests for each level of the variable
    clarity testing for equal proportions of the variable color.
    
    # A tibble: 3 x 9
      condition N     F     H     J     `Chi-squared`    df `p-value` significance
      <ord>     <chr> <chr> <chr> <chr>         <dbl> <dbl>     <dbl> <chr>       
    1 SI2       (n =~ 47.7~ 41.7~ 10.4~          16.1     2     0     ***         
    2 VS1       (n =~ 42.8~ 35.7~ 21.4~           2       2     0.368 ns          
    3 IF        (n =~ 100.~ NA    NA              6       2     0.05  ns          
    Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
    
    Note: Results from one-sample proportion tests for each level of the variable
    clarity testing for equal proportions of the variable color.
    
    # A tibble: 3 x 9
      condition N     F     H     J     `Chi-squared`    df `p-value` significance
      <ord>     <chr> <chr> <chr> <chr>         <dbl> <dbl>     <dbl> <chr>       
    1 SI2       (n =~ 49.6~ 35.7~ 14.6~         25.6      2     0     ***         
    2 VS1       (n =~ 48.1~ 31.3~ 20.4~          9.71     2     0.008 **          
    3 IF        (n =~ 69.2~ 15.3~ 15.3~          7.54     2     0.023 *           
    Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
    
    Note: Results from one-sample proportion tests for each level of the variable
    clarity testing for equal proportions of the variable color.
    
    # A tibble: 3 x 9
      condition N     F     H     J     `Chi-squared`    df `p-value` significance
      <ord>     <chr> <chr> <chr> <chr>         <dbl> <dbl>     <dbl> <chr>       
    1 SI2       (n =~ 44.5~ 42.0~ 13.3~         71.7      2     0     ***         
    2 VS1       (n =~ 41.5~ 41.5~ 16.8~         29.6      2     0     ***         
    3 IF        (n =~ 40.0~ 48.0~ 12.0~          5.36     2     0.069 ns          
    Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
    
    Note: Results from one-sample proportion tests for each level of the variable
    clarity testing for equal proportions of the variable color.
    
    # A tibble: 3 x 9
      condition N     F     H     J     `Chi-squared`    df `p-value` significance
      <ord>     <chr> <chr> <chr> <chr>         <dbl> <dbl>     <dbl> <chr>       
    1 SI2       (n =~ 45.4~ 44.6~ 9.91%          84.7     2         0 ***         
    2 VS1       (n =~ 49.0~ 38.5~ 12.5~          84.7     2         0 ***         
    3 IF        (n =~ 52.5~ 42.3~ 5.08%          66.3     2         0 ***  
    
[Figure: pie charts produced by grouped_ggpiestats, one panel per cut level]
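
To illustrate what `simulate.p.value` does, here is a minimal base R sketch using `stats::chisq.test` directly. The table of counts is hypothetical and chosen only to contain empty cells; whether ggstatsplot forwards the argument to `chisq.test` in exactly this way is an assumption that the post does not confirm.

```r
set.seed(1)  # the simulated p-value depends on the random seed

# Hypothetical 3 x 3 table of counts with two empty cells; not data from the post.
counts <- matrix(
  c(12, 0, 0,
     9, 7, 4,
    11, 8, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(clarity = c("IF", "VS1", "SI2"), color = c("F", "H", "J"))
)

# B is the number of Monte Carlo replicates (2000 is the chisq.test default).
sim_test <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)

print(sim_test$method)
print(sim_test$p.value)
```

With simulation the p-value is estimated from randomly generated tables that share the observed margins, so it does not rely on the large-sample chi-squared approximation that sparse cells undermine.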

Article link: https://www.haomeiwen.com/subject/imenlctx.html