美文网首页石头剪刀布机器学习与计算机视觉R语言与统计分析
R-如何计算样本序列的置信区间并绘制带有置信区间的Barplot

R-如何计算样本序列的置信区间并绘制带有置信区间的Barplot

作者: TroyShen | 来源:发表于2019-12-27 14:27 被阅读0次

    目录

    • 0.问题引入
    • 1.示例数据
    • 2.计算样本数据序列的置信区间
    • 3.样本数据置信区间可视化数据框构建
    • 4.样本数据置信区间可视化(图1)
    • 5.在Barplot 中增加文字说明(图2)
    • 6.总结
    • 7.本篇所使用的软件包(没有的需要用install.packages('')进行安装)
    • 8.致谢

    0. 问题引入

    在我们处理地理/实验样本数据的时候,可能会需要对于典型点/典型样本的序列进行统计分析,体现在一定显著性水平下典型点/典型样本的统计特征,这时候我们面临以下两个技术细节:

    1. 如何计算样本序列的置信区间?
    2. 如何将样本的置信区间进行可视化?

    1. 示例数据

    本实验示例数据通过以下代码随机生成,可视化结果可能与下文示例图有出入。

    x = LETTERS[1:10]
    y = runif(1000, -10,10)
    colnames(y) = x
     head(y)
                 A          B         C          D
    [1,]  4.524860 -7.3445746 -6.439900  9.1539736
    [2,]  5.822885 -4.3644429 -7.866430 -0.3942861
    [3,] -2.204920 -7.4995754 -7.518962  4.4767627
    [4,] -4.915803 -0.4365128  7.768294  2.4638002
    [5,] -1.154929 -4.4947919 -8.646688  3.0954898
    [6,] -9.154364  5.4484339 -9.566622 -3.9255011
                  E           F           G
    [1,]  3.9060027  0.42257072 -2.48072582
    [2,] -3.0508460  0.08481233  3.92705319
    [3,] -9.7084660 -1.63433896  9.47799297
    [4,]  0.5258416 -5.27407052  0.03590268
    [5,] -0.9525900 -8.55815396  1.03945867
    [6,] -0.1960090 -1.94827378 -8.86757475
                 H          I         J
    [1,]  8.917407 -8.9758134 -1.422388
    [2,] -7.832482 -8.5410536  9.173685
    [3,] -8.502113  2.1070542 -5.903512
    [4,]  4.375888  0.5945806 -2.485896
    [5,] -9.202834 -8.4865285  6.049102
    [6,]  6.166935  0.3131027 -6.399058
    

    2. 计算样本数据序列的置信区间

    默认显著性水平p < 0.05

    y = matrix(y, ncol = 10)
    y_statistic = apply(y,2,CI)
    head(y_statistic)
                [,1]       [,2]       [,3]
    upper  1.0263200  0.8966174  1.1068900
    mean  -0.1453326 -0.3028532 -0.1041544
    lower -1.3169851 -1.5023239 -1.3151989
                [,4]       [,5]        [,6]
    upper  0.9076051  1.1043008  1.12495923
    mean  -0.2275783 -0.0576591  0.08110512
    lower -1.3627616 -1.2196190 -0.96274899
                [,7]       [,8]       [,9]
    upper  0.5812071  1.4890171  0.9575675
    mean  -0.5852506  0.3316813 -0.2092299
    lower -1.7517082 -0.8256545 -1.3760272
               [,10]
    upper  1.5280800
    mean   0.2473222
    lower -1.0334356
    

    3. 样本数据置信区间可视化数据框构建

    ymin = y_statistic[1,]
    ymean = y_statistic[2,]
    ymax = y_statistic[3,]
    
    pl_df = data.frame(x = x, y = ymean, ymin = ymin,ymax  =ymax)
    

    4. 样本数据置信区间可视化(图1)

    setwd('L:\\JianShu\\20191227')
    p = ggplot()+
      geom_bar(data = pl_df, aes(x = x, y = y, fill = x),
               position = 'dodge',  stat = 'identity', width = 0.5)+
      geom_errorbar(data = pl_df,aes(x = x,ymin = ymin, ymax = ymax),
                    width = 0.2,
                    linetype = "solid",
                    position = position_dodge(width = 0.5),
                    color="black", size=0.8)+
      theme_bw()+
      theme(
        axis.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
        axis.title = element_text(face = 'bold',color = 'black',size = 14,hjust = 0.5),
        legend.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
        legend.title = element_blank(),
        legend.position = 'bottom',
        legend.direction = 'horizontal',
        legend.key.height=unit(0.1,"inches"),
        legend.key.width=unit(0.5,"inches")
      )+
      xlab('Sample Index')+
      ylab('Test Value')
      
    
    png('plot1.png',
        height = 10,
        width = 20,
        units = 'cm',
        res = 800)
    print(p)
    dev.off()
    
    图1 样本数据置信区间可视化

    根据图1, 我们可以看到A-J 样本的置信区间, 但是然后呢?是不是有点像 “多喝热水,少熬夜,早点睡。。”之类的有用的废话一样,完全从图1中看不出任何定量的信息?!

    原因是啥呢,因为图中信息太少了,我们需要加文字进去进行补充说明如何做请看第5节**


    5. 在Barplot 中增加文字说明(图2)

    如图2,这样就好多了嘛

    p2 = ggplot()+
      geom_bar(data = pl_df, aes(x = x, y = y, fill = x),
               position = 'dodge',  stat = 'identity', width = 0.5)+
      geom_errorbar(data = pl_df,aes(x = x,ymin = ymin, ymax = ymax),
                    width = 0.2,
                    linetype = "solid",
                    position = position_dodge(width = 0.5),
                    color="black", size=0.8)+
      geom_text(data = pl_df, aes(x = x,y = y, label=round(y,2)),
              position = position_dodge(width = 1), vjust = 0.5,hjust = -0.3,size = 4)+
      geom_text(data = pl_df, aes(x = x,y = ymin, label=round(ymin,2)),
                position = position_dodge(width = 1), vjust = 0.5,hjust = -0.3,size = 4)+
      geom_text(data = pl_df, aes(x = x,y = ymax, label=round(ymax,2)),
                position = position_dodge(width = 1), vjust = 0.5,hjust = -0.3,size = 4)+
      theme_bw()+
      theme(
        axis.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
        axis.title = element_text(face = 'bold',color = 'black',size = 14,hjust = 0.5),
        legend.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
        legend.title = element_blank(),
        legend.position = 'bottom',
        legend.direction = 'horizontal',
        legend.key.height=unit(0.1,"inches"),
        legend.key.width=unit(0.5,"inches"),
        text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5)
      )+
      xlab('Sample Index')+
      ylab('Test Value')
      
    
    png('plot2.png',
        height = 10,
        width = 20,
        units = 'cm',
        res = 800)
    print(p2)
    dev.off()
    
    图2 在Barplot 中增加文字说明

    6. 总结

    本篇主要解决了以下两个问题:

    1. 如何计算样本序列的置信区间?
    2. 如何将样本的置信区间进行可视化?

    7. 本篇所使用的软件包(没有的需要用install.packages('')进行安装)

    library('ggplot2')
    library('Rmisc')
    

    8. 致谢

    感谢大家的持续关注,小编会继续努力,持续更新下去的!

    大家如果觉得有用,还麻烦大家转发点赞加关注哈,也可以扩散到朋友圈,多谢大家啦~

    大家如果在使用本文代码的过程有遇到问题的,可以留言评论,也可以私信我哈~~


    小编联系方式

    相关文章

      网友评论

        本文标题:R-如何计算样本序列的置信区间并绘制带有置信区间的Barplot

        本文链接:https://www.haomeiwen.com/subject/wwkdoctx.html