美文网首页IMP researchR for statisticsresearch
R进行两因素重复测量方差分析并可视化(双组折线图)

R进行两因素重复测量方差分析并可视化(双组折线图)

作者: 欧阳松 | 来源:发表于2022-11-19 23:05 被阅读0次

    在仙桃学术上的生信工具里面,有一个折线图的绘图工具,可以很快速便捷的得出结论并可视化结果,当然不是说这个功能有多强大,而是统计学方法非常专业。
    比如用它自带的数据https://bioinfomatics.xiantao.love/biotools/data/demo/free/linePlot/%E6%8A%98%E7%BA%BF%E5%9B%BE.xlsx
    通过无脑式的鼠标点击,可得到下面一系列的结果。

    折线图
    可以看到有很多结果,包括各种统计学描述、各种格式的图片,以及一个demo.R

    比如统计描述

    组别1 组别2 数目 最小值 最大值 中位数(Median) 四分位距(IQR) 下四分位 上四分位 均值(Mean) 标准差(SD) 标准误(SE)
    Time1 Control 12 2.833 3.306 2.967 0.254 2.885 3.139 3.020 0.154 0.044
    Time1 Intervene 12 2.624 3.418 3.074 0.297 2.876 3.173 3.032 0.258 0.074
    Time2 Control 12 2.733 3.240 2.930 0.187 2.861 3.047 2.957 0.148 0.043
    Time2 Intervene 12 3.419 4.318 4.082 0.388 3.824 4.212 4.005 0.285 0.082
    Time3 Control 12 2.611 3.414 2.938 0.392 2.805 3.197 2.979 0.246 0.071
    Time3 Intervene 12 5.799 6.285 6.065 0.185 5.916 6.101 6.030 0.139 0.040

    又比如异常值分析、正态性检验(Shapiro-Wilk normality test)、方差齐性检验(Levene's test)、协方差同质假设(Homogeneity of covariances assumption/Box’s M-test)、球形假设(Mauchly's Test for Sphericity)、两因素单向重复测量数据方差分析(Two-way mixed ANOVA)、单独效应分析(Simple effect)、多重成对比较(Pairwise Comparisons of Estimated Marginal Means)等等专业术语,并且还给了详细的统计学结果。


    image.png image.png image.png

    这种统计学结果让大家很欣慰,那么这些结果都是如何计算出来的呢???


    我们可以点开demo.R这个结果,读一读代码
    https://bioinfomatics.xiantao.love/biotools/code/open/lineplot.R

    现在我们复现一下:

    加载需要的包并导入数据

    library(ggplot2)
    library(reshape2)
    library(car)
    library(rstatix)
    ## 导入数据,比如我们把下载的折线图数据保存在桌面
    library(readxl)
    data <- read_excel("~/Desktop/折线图.xlsx") ## mac系统代码
    data # 显示结果
    
    trt Time1 Time2 Time3
    Control 3.185968 2.875802 3.414224
    Control 2.832632 2.862111 2.801333
    Control 3.123533 2.984142 2.909648
    Control 2.880927 3.146924 3.256431
    Control 3.090936 2.732620 2.966887
    Control 2.920949 2.865287 2.820161
    Control 3.306013 2.856390 2.806807
    Control 2.885842 3.002480 2.611145
    Control 2.955919 3.239596 3.182478
    Control 2.978572 2.818567 2.734901
    Control 3.190647 3.042065 3.001664
    Control 2.883867 3.063226 3.240626
    Intervene 2.623973 3.771982 6.041055
    Intervene 2.663926 3.841685 6.098856
    Intervene 3.045286 4.301342 6.108293
    Intervene 3.054712 4.065593 6.085687
    Intervene 3.127857 4.097633 6.055522
    Intervene 3.147686 4.134947 5.926182
    Intervene 3.248095 3.419161 5.855298
    Intervene 3.326086 4.182629 5.885426
    Intervene 3.093890 4.318014 6.151859
    Intervene 3.418347 4.303615 6.284906
    Intervene 2.935396 3.986879 5.798966
    Intervene 2.697790 3.640084 6.073529

    新增一列id,id即为数字,用于后续分析,必不可少

    data$id <- 1:nrow(data)
    
    trt Time1 Time2 Time3 id
    Control 3.185968 2.875802 3.414224 1
    Control 2.832632 2.862111 2.801333 2
    Control 3.123533 2.984142 2.909648 3
    Control 2.880927 3.146924 3.256431 4
    Control 3.090936 2.732620 2.966887 5
    Control 2.920949 2.865287 2.820161 6
    Control 3.306013 2.856390 2.806807 7
    Control 2.885842 3.002480 2.611145 8
    Control 2.955919 3.239596 3.182478 9
    Control 2.978572 2.818567 2.734901 10
    Control 3.190647 3.042065 3.001664 11
    Control 2.883867 3.063226 3.240626 12
    Intervene 2.623973 3.771982 6.041055 13
    Intervene 2.663926 3.841685 6.098856 14
    Intervene 3.045286 4.301342 6.108293 15
    Intervene 3.054712 4.065593 6.085687 16
    Intervene 3.127857 4.097633 6.055522 17
    Intervene 3.147686 4.134947 5.926182 18
    Intervene 3.248095 3.419161 5.855298 19
    Intervene 3.326086 4.182629 5.885426 20
    Intervene 3.093890 4.318014 6.151859 21
    Intervene 3.418347 4.303615 6.284906 22
    Intervene 2.935396 3.986879 5.798966 23
    Intervene 2.697790 3.640084 6.073529 24

    将短数据转换为长数据

    data2 <- gather(data, key = "x", value = "value", -trt, -id)
    

    对各组进行统计学描述

    data3 <- data2 %>% 
      group_by(trt, x) %>% 
      get_summary_stats(value)
    
    trt x variable n min max median q1 q3 iqr mad mean sd se ci
    Control Time1 value 12 2.833 3.306 2.967 2.885 3.139 0.254 0.156 3.020 0.154 0.044 0.098
    Control Time2 value 12 2.733 3.240 2.930 2.861 3.047 0.187 0.137 2.957 0.148 0.043 0.094
    Control Time3 value 12 2.611 3.414 2.938 2.805 3.197 0.392 0.252 2.979 0.246 0.071 0.156
    Intervene Time1 value 12 2.624 3.418 3.074 2.876 3.173 0.297 0.232 3.032 0.258 0.074 0.164
    Intervene Time2 value 12 3.419 4.318 4.082 3.824 4.212 0.388 0.327 4.005 0.285 0.082 0.181
    Intervene Time3 value 12 5.799 6.285 6.065 5.916 6.101 0.185 0.097 6.030 0.139 0.040 0.088

    批量运行正态性检验(Shapiro-Wilk normality test)

    for(i in unique(data[,1])){
      data1 <- data[data[,1] == i,]
      print(lapply(data1[,-1], function(x) shapiro.test(x)))
    }
    

    $Time1

    Shapiro-Wilk normality test

    data: x

    W = 0.91913, p-value = 0.2788

    $Time2

    Shapiro-Wilk normality test

    data: x

    W = 0.88011, p-value = 0.08792

    $Time3

    Shapiro-Wilk normality test

    data: x

    W = 0.73897, p-value = 0.002055

    $id

    Shapiro-Wilk normality test

    data: x

    W = 0.95933, p-value = 0.7742

    批量运行方差齐性检验(Levene's Test)

    for(i in unique(data2$x)){
      data1 <- data2[data2$x == i,]
      print(leveneTest(value~trt, data = data1))
    }
    

    Levene's Test for Homogeneity of Variance (center = median)
    Df F value Pr(>F)
    group 1 1.5617 0.2245
    22
    Levene's Test for Homogeneity of Variance (center = median)
    Df F value Pr(>F)
    group 1 2.5864 0.122
    22
    Levene's Test for Homogeneity of Variance (center = median)
    Df F value Pr(>F)
    group 1 3.8375 0.06291 .
    22


    Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

    两因素重复测量方差分析

    anova_test(data = data2, dv = value, wid = id, within = x, between = trt) 
    

    ANOVA Table (type II tests)

    $ANOVA
    Effect DFn DFd F p p<.05 ges
    1 trt 1 22 510.657 1.02e-16 * 0.918
    2 x 2 44 391.826 9.19e-29 * 0.902
    3 trt:x 2 44 407.706 4.01e-29 * 0.905

    $Mauchly's Test for Sphericity
    Effect W p p<.05
    1 x 0.966 0.699
    2 trt:x 0.966 0.699

    $Sphericity Corrections
    Effect GGe DF[GG] p[GG] p[GG]<.05 HFe DF[HF] p[HF] p[HF]<.05
    1 x 0.968 1.94, 42.57 6.65e-28 * 1.059 2.12, 46.61 9.19e-29 *
    2 trt:x 0.968 1.94, 42.57 2.98e-28 * 1.059 2.12, 46.61 4.01e-29 *

    可视化结果

    ggplot(data = data3, aes(x = x, y = mean, color = trt, group = trt)) +
      geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), color = "black", width = 0.2) +
      geom_line() +
      geom_point() +
      theme_bw()
    

    image.png

    当然这个图跟原图不一样,并且没有两两比较,我们可以使用ggpubrrstatix进行二次绘图。

    关于rstatix进行统计学分析,其实可以Repeated Measures ANOVA in R: The Ultimate Guide - Datanovia
    得到答案,具体的分析,我们暂不说了,只说如何进行统计学分析,并且自动两两比较

    res.aov <- anova_test(data = data2, dv = value, wid = id, within = x, between = trt) 
    # 组间两两比较
    pwc <- data2 %>%
        group_by(x) %>%
        pairwise_t_test(
            value ~ trt, paired = TRUE,
            p.adjust.method = "bonferroni" ## 校正方法,包括 "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".等
        )
    pwc
    
    x .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
    Time1 value Control Intervene 12 12 -0.1453235 11 8.87e-01 8.87e-01 ns
    Time2 value Control Intervene 12 12 -12.3908123 11 1.00e-07 1.00e-07 ****
    Time3 value Control Intervene 12 12 -40.9352857 11 0.00e+00 0.00e+00 ****

    可以看到已经自动进行了两两比较,我们使用rstatix包的add_xy_position()函数可以添加两两比较列表和x轴y轴位置。

     pwc <- pwc %>% add_xy_position(x = "x")# 按时间分列
    pwc
    
    x .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif y.position groups xmin xmax
    1 value Control Intervene 12 12 -0.1453235 11 8.87e-01 8.87e-01 ns 3.7834 Control , Intervene 0.8 1.2
    2 value Control Intervene 12 12 -12.3908123 11 1.00e-07 1.00e-07 **** 4.6834 Control , Intervene 1.8 2.2
    3 value Control Intervene 12 12 -40.9352857 11 0.00e+00 0.00e+00 **** 6.6504 Control , Intervene 2.8 3.2

    可视化绘图

    library(ggpubr)
    ggline(data2,x='x',y='value',
    color = 'trt', #按组配色
    add = 'mean_sd', #添加均数标准差,也可以设置均数标准误,CI等。
    palette = "aaas" # aaas杂志配色
    )+stat_pvalue_manual(pwc, tip.length = 0, hide.ns = TRUE) + # 添加两两比较,隐藏无意义
        labs(
            subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
            caption = get_pwc_label(pwc) # 右下角条件两两比较方法。
        )
    
    image.png

    当然,我们还可以继续修改图片,比如全部显示结果,取消显著性标记的下划线,或者显示具体的p值等

    ggline(data2,x='x',y='value',
           color = 'trt', #按组配色
           add = 'mean_sd', #添加均数标准差,也可以设置均数标准误,CI等。
           palette = "aaas",# aaas杂志配色
    ggtheme = theme_bw(), #背景
    xlab = F,ylab = "Score", #xy轴名称
    legend.title="Treatment" ,legend="top", #标签名称和位置
    title = "Comparison of two groups at different times" #标题
    )+stat_pvalue_manual(pwc, bracket.size = 0) + # 添加两两比较,显示所有结果
        labs(
            subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
            caption = get_pwc_label(pwc) # 右下角显示事后两两比较方法。
        )
    
    image.png

    也可以显示数值

    ggline(data2,x='x',y='value',
           color = 'trt', #按组配色
           add = 'mean_sd', #添加均数标准差,也可以设置均数标准误,CI等。
           palette = "aaas",# aaas杂志配色#背景
           xlab = F,ylab = "Score", #xy轴名称
           legend.title="Treatment" ,legend="right",
           title = "Comparison of two groups at different times" #标题
    )+stat_pvalue_manual(pwc, bracket.size = 0,label = "p.adj") + # 添加两两比较,显示所有结果
        labs(
            subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
            caption = get_pwc_label(pwc) # 右下角显示事后两两比较方法。
        )
    
    image.png

    相关文章

      网友评论

        本文标题:R进行两因素重复测量方差分析并可视化(双组折线图)

        本文链接:https://www.haomeiwen.com/subject/ebuvxdtx.html