美文网首页基因组数据绘图ggplot2绘图
误差线(Error bars)和显著性标记

误差线(Error bars)和显著性标记

作者: 吴十三和小可爱的札记 | 来源:发表于2019-11-07 20:12 被阅读0次

    1. 简介

    误差线(Error bars)是数据变异性的图形显示,用于表明被测值的误差或不确定性。它指示了被测值的准确性、或被测值和真实值的差异性。可用来表示standard deviationstandard errorconfidence interval,由于它们的值不尽相同,所以需要在图中特别说明。

    标准差(sd)是描述性统计里用来表示数据本身均值范围的,sd越小,表明数据越集中在均值附近。标准差与均数结合估计参考值范围,计算变异系数,计算标准误。标准差(sd)不牵扯均值对比推测,仅仅是描述性的。

    标准误(se)表示样本平均数对总体平均数的变异程度,反映抽样误差的大小;标准误(se)用于预测样本数据准确性 ,标准误越小,样本均值和总体均值差距越小,样本数据越能代表总体数据。

    2. 作图

    基于ggplot2的扩展包,ggpubr
    ggsignif 可以很简便的做出相应的图,但代价是作图过程中很多内部数据变换过程被封装起来了,不利于学习。

    require(tidyverse)
    
    #以species为分组变量,求petal.wdith 的sd,se和ci
    
    conf.interval = 0.95
    data <- iris %>% 
      group_by(Species) %>% 
      summarise(mean = mean(Petal.Width, na.rm = T),
                sd = sd(Petal.Width, na.rm = T),
                N = length(Petal.Width),
                se = sd/sqrt(N),
                ciMult = qt(conf.interval/2 + 0.5, N - 1),
                ci = se * ciMult)
    
    #不做统计变换的柱状图
    col_plot <- ggplot() + 
      geom_col(data = data, aes(x = Species, 
                                y = mean, fill = Species), width = 0.5)
    
    #误差线
    p <- col_plot + geom_errorbar(data = data, aes(x = Species, 
                                            ymin = mean-se, 
                                            ymax = mean + se, 
                                            group = Species,
                                            width = 0.2), 
                           position = position_dodge(width = 0.8))
    
    col_plot.png

    例2

    # 计算均值和sd
    require(tidyverse)
    set.seed(13)
    set.seed(13)
    data <- diamonds %>% 
      sample_n(1000) 
    
    sum_data <- data %>% 
      group_by(cut) %>% 
      summarise(
        mean_price = mean(price ),
        sd_price   = sd(price )) 
    # 作图
    p <- ggplot(data = sum_data, aes(x = cut, y = mean_price)) +
      geom_col(aes(fill = cut), color = "black", width = 0.85) +
      geom_errorbar(aes(ymin = mean_price - sd_price,
                        ymax = mean_price + sd_price),
                    color = "#22292F",
                    width = .1) +
      labs(
        y = "Mean Price",
        title = "Mean price in Different cuts",
        caption = "Error bars indicate standard deviations"
      ) 
    
    Rplot.png

    方差分析

    用agricolae进行多重比较。

    require(agricolae)
    cut.aov <- aov(price ~ cut, data =data)
    LSD <- LSD.test(cut.aov, "cut", p.adj="bonferroni")$groups
    cut <- rownames(LSD)
    LSD <- cbind(cut, LSD)
    data_c <- merge(sum_data, LSD, by = "cut")
    
    p + geom_text(data = data_c , aes(x = cut, 
                                      y = 2*mean_price + sd_price, 
                                      label = groups), 
                  position = position_dodge(0.9), 
                  size = 5, fontface = "bold")+ 
      labs(
      caption = "Barchart with Significance Tests"
    ) 
    
    Rplot01.png

    两两比较

    # 跟据最高y设定连线坐标
    sign <- tibble(
      x = c("Fair", "Fair", "Very Good", "Very Good"),
      y = c(8200,8400,8400,8200))
            
    p + geom_line(data =sign, 
                  aes(x = x, y = y, group = 1)) +
      annotate("text", x = 2, y = 8600,  # 跟据x,y设定星号坐标
                                                          # x = 2, 来源于cut 转换为了levels
               label = "***",
               size = 8, color = "#22292F")
    
    Rplot02.png

    相关文章

      网友评论

        本文标题:误差线(Error bars)和显著性标记

        本文链接:https://www.haomeiwen.com/subject/nptbbctx.html