Lessons from building Table 1 (with Stata and R)

Author: liang_rujiang | Posted: 2019-12-04 19:04

    INTRODUCTION

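    Table 1 is the table that describes the baseline characteristics of the study subjects, and it appears in almost every study. Its required part is descriptive statistics (central tendency, dispersion, frequency, proportion); its optional part is the comparison between groups (t-test, F-test). Stata has many packages that help with this, but their output still differs somewhat from the results tables used in medically oriented papers and needs manual adjustment.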

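    When using Stata:

    • Continuous variables, no grouping (only the overall distribution of some variables is needed): this is the simple case. tabstat varlist, c(s) s(mean sd) is enough.
    • Continuous variables, with grouping: the same command produces the descriptive part; just add the by() option.
    • Categorical variables, no grouping: tab1 varlist
    • Categorical variables, with grouping: tab1 varlist if grpvar == 1...
    • Continuous variables: the t-test and F-test can be run one variable at a time in a foreach loop.
    • Categorical variables: chi-square and Fisher tests can be handled with a foreach loop and tab plus the relevant options.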
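For readers who work on the R side, the six steps above can be sketched in base R. This is only an illustration under assumptions: the built-in mtcars data stands in for study data, am plays the role of grpvar, and the variable lists are chosen arbitrarily. var.equal = TRUE is set because Stata's ttest assumes equal variances by default, while R's t.test defaults to the Welch test.

```r
# Sketch of the six steps in base R, using the built-in mtcars data.
# Assumption: "am" plays the role of the grouping variable (grpvar).
dat <- mtcars
cont_vars <- c("mpg", "wt", "hp")   # continuous variables
cat_vars  <- c("cyl", "gear")       # categorical variables stored as numbers

# 1. Continuous, no grouping: mean and SD for each variable
overall <- sapply(dat[cont_vars], function(x) c(mean = mean(x), sd = sd(x)))

# 2. Continuous, by group: the same statistics within each level of am
group_mean <- aggregate(dat[cont_vars], by = list(am = dat$am), FUN = mean)
group_sd   <- aggregate(dat[cont_vars], by = list(am = dat$am), FUN = sd)

# 3. Categorical, no grouping: counts and proportions
freq_all <- lapply(dat[cat_vars], function(x) {
  n <- table(x)
  cbind(n = n, prop = prop.table(n))
})

# 4. Categorical, by group: cross-tabulation against am
freq_by_group <- lapply(dat[cat_vars], function(x) table(x, am = dat$am))

# 5. Continuous: one equal-variance t-test per variable
t_res <- t(sapply(cont_vars, function(v) {
  fit <- t.test(reformulate("am", response = v), data = dat, var.equal = TRUE)
  c(t = unname(fit$statistic), p = fit$p.value)
}))

# 6. Categorical: chi-square and Fisher exact tests per variable
#    (warnings about small expected counts are silenced for the demo)
cat_res <- t(sapply(cat_vars, function(v) {
  tab <- table(dat[[v]], dat$am)
  c(chisq_p  = suppressWarnings(chisq.test(tab))$p.value,
    fisher_p = fisher.test(tab)$p.value)
}))

print(round(t_res, 3))
print(round(cat_res, 3))
```

Every result is kept as a plain matrix or data frame, so the pieces can be bound together or exported without retyping any numbers.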

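    The process above is tedious, and copying numbers by hand can introduce errors. iebaltab can help to some extent; run the examples below to see how. First install the package with ssc install ietoolkit.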

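    Some characteristics:

    • A drawback of this package is that it cannot display the t-statistic; oddly, R's tableone package has the same limitation. Yet medical research, and basic research related to medicine, often needs to report it.
    • It does not work for categorical variables: it produces neither their descriptive statistics nor chi-square or Fisher results (R's tableone can handle chi-square).
    • It does not take the distribution of continuous variables or the equal-variance assumption into account. (R's tableone can handle the distribution, but the user has to inspect the distributions first and then tell the function which variables should get a nonparametric test; the test method can also be chosen. My own habit is to pick out the continuous variables of interest, reshape them to long format, and draw a faceted ggplot to look them over.)
    • tableone can handle grouped and ungrouped data, descriptive and test results, and categorical and continuous variables all at once, and it distinguishes normal from non-normal continuous variables.
    • When using tableone, note that a categorical variable is stored in the data frame either as a factor or as numeric. In the latter case it must be declared in the function call (the factorVars argument); otherwise it is treated as a continuous variable.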
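The last two points can be illustrated with a small sketch. It assumes the tableone package is installed (the call is skipped otherwise), uses the built-in mtcars data in place of real study data, and treats am as a hypothetical grouping variable. Declaring hp as non-normal is purely for illustration, not the result of an actual distribution check.

```r
# Sketch only: mtcars stands in for real study data, "am" is a
# hypothetical grouping variable.
dat <- mtcars
vars <- c("mpg", "hp", "wt", "cyl", "gear")
cat_vars <- c("cyl", "gear")   # stored as numeric, so they must be declared

if (requireNamespace("tableone", quietly = TRUE)) {
  # factorVars tells CreateTableOne to treat numeric columns as categorical
  tab <- tableone::CreateTableOne(vars = vars, strata = "am", data = dat,
                                  factorVars = cat_vars)
  # hp is declared non-normal for illustration only: it is then summarised
  # as median [IQR] and compared with a rank-based test
  print(tab, nonnormal = "hp", showAllLevels = TRUE)
}
```

Without factorVars, cyl and gear would be summarised as mean (SD), which is exactly the pitfall described in the last bullet.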

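    Overall, in my experience R's tableone package is the more powerful of the two.
    At the time of writing I had not found a good way to display the t-statistic, and anyone who knows one is welcome to share it in the comments. (I have since found one myself, shown below; first install the package with ssc install asdoc. Update at 4:30 a.m. on the 14th: t2docx also looks useful, but it only runs on Stata 15 and later. I use 14.1 and cannot test it, so I will not cover it.)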

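    sysuse auto, clear
    * start the table with a header row
    asdoc, row(t-value)
    * run one t-test per variable and append its t statistic as a new row
    foreach i of varlist price-weight {
        ttest `i', by(foreign)
        asdoc, row(`r(t)')
    }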
    

    A complete and fairly tidy example

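    sysuse auto, clear
    * remove any output file left over from a previous run
    cap rm Myfile.doc
    * descriptive part: mean and SD of each variable by group
    asdoc tabstat price-weight, by(foreign) stat(mean sd) dec(3)

    * test part: a header row, then one row of t and p per variable
    asdoc, row(t-value, p-value)
    foreach i of varlist price-weight {
        ttest `i', by(foreign)
        asdoc, row(`r(t)', `r(p)') dec(3)
    }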
    

    The output looks like this:

    [image: 图片.png]
    After some light manual editing:
    [image: 图片.png]
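Since tableone does not report the t-statistic either, a similar table can be assembled by hand in R. The following is a minimal sketch, not a tableone feature: it uses base R only, with mtcars as stand-in data and am as a hypothetical two-level grouping variable, and it sets var.equal = TRUE to mirror the default of Stata's ttest.

```r
# Sketch: group means, SDs, t statistic and p-value in one table (base R).
# Assumptions: mtcars as example data, "am" (0/1) as the grouping variable.
dat <- mtcars
vars <- c("mpg", "hp", "wt")

t_table <- do.call(rbind, lapply(vars, function(v) {
  g0 <- dat[[v]][dat$am == 0]
  g1 <- dat[[v]][dat$am == 1]
  fit <- t.test(g0, g1, var.equal = TRUE)   # equal-variance two-sample test
  data.frame(variable = v,
             mean_0 = mean(g0), sd_0 = sd(g0),
             mean_1 = mean(g1), sd_1 = sd(g1),
             t_value = unname(fit$statistic),
             p_value = fit$p.value)
}))

print(t_table, digits = 3)

# Optional: write the table to a file that can be opened in a spreadsheet
out_file <- file.path(tempdir(), "table1_ttest.csv")
write.csv(t_table, out_file, row.names = FALSE)
```

The resulting data frame has one row per variable, so it can be merged with the printed tableone output by variable name.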

    EXAMPLES WITH IEBALTAB IN STATA

    set more off
    sysuse auto, clear
    
    des
    fmiss
    drop rep78
    des
    
    iebaltab price headroom length, grpvar(foreign) save(temp) replace onerow pt std format(%7.2f)
    * onerow displays the number of observations in additional row at the bottom of the table if 
    * each group has the same number of observations for all variables in balancevarlist.
    * pttest makes this command show p-values instead of difference-in-mean 
    * between the groups in the column for t-tests.
    * stdev displays standard deviations in parentheses instead of standard errors
    
    gen grp = mod(_n, 3)
    tab1 grp
    
    iebaltab price headroom length, grpvar(grp) save(temp) replace pt std onerow
    iebaltab price headroom length, grpvar(grp) save(temp) replace pt std onerow co(1)
    * control(groupcode)  One group is tested against all other groups in t-tests and F-tests. 
    * Default is all groups against each other.
    iebaltab price headroom length, grpvar(grp) save(temp) replace pt std onerow co(1) ftest
    * I do not know what ftest means.
    iebaltab price headroom length, grpvar(grp) save(temp) replace pt std onerow co(1) feqt pf
    * With both feqt and pf, I get the p-values of the F-test; with feqt alone, I get the F statistics.
    
    /*a provoking example*/
    global project_folder "C:\Users\project\baseline\results"
    iebaltab outcome_variable, grpvar(treatment_variable) save("$project_folder\balancetable.xlsx")
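The three-group comparisons above can be approximated in R as follows. This is a sketch under assumptions and does not reproduce iebaltab's own implementation: mtcars is used as example data, grp is built the same way as gen grp = mod(_n, 3), group 1 is taken as the control group as in co(1), and feqt is assumed to correspond to an F-test of equal means across all groups for each variable.

```r
# Sketch: control-vs-other t-tests and an F-test of equal means (base R).
dat <- mtcars
dat$grp <- factor(seq_len(nrow(dat)) %% 3)   # mirrors gen grp = mod(_n, 3)
vars <- c("mpg", "hp", "wt")
control <- "1"                               # mirrors co(1)

# p-values of t-tests: control group against each of the other groups
pair_p <- sapply(vars, function(v) {
  sapply(setdiff(levels(dat$grp), control), function(g) {
    sub <- dat[dat$grp %in% c(control, g), ]
    t.test(sub[[v]] ~ droplevels(sub$grp), var.equal = TRUE)$p.value
  })
})

# F-test for equality of means across all three groups, one per variable
f_res <- t(sapply(vars, function(v) {
  a <- anova(lm(dat[[v]] ~ dat$grp))
  c(F = a[1, "F value"], p = a[1, "Pr(>F)"])
}))

print(round(pair_p, 3))   # rows: comparison group, columns: variables
print(round(f_res, 3))    # rows: variables, columns: F and p
```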
    
    

    EXAMPLES WITH TABLEONE IN R

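    # packages used below: dplyr (%>%, summarise, filter), tidyr (gather), purrr (map, discard)
    library(dplyr)
    library(tidyr)
    library(purrr)

    # row_data is assumed to be a data frame already loaded in the session
    data <- row_data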
    data$age %>% hist
    
    data$for_duration %>% hist
    data$for_income %>% hist
    
    data$sbp %>% hist
    data$dbp %>% hist
    
    data %>% summarise(age_mean = mean(age),
                       age_sd = sd(age),
                       income_median = median(for_income),
                       income_iqr = IQR(for_income),
                       duration_median = median(for_duration),
                       duration_iqr = IQR(for_duration),
                       sbp_mean = mean(sbp),
                       sbp_sd = sd(sbp),
                       dbp_mean = mean(dbp),
                       dbp_sd = sd(dbp)) %>% 
        gather()
    
    cat_des <- function(df, chr) {
        out <- vector("list", length = 2)
        out[[1]] <- table(df[[chr]])
        out[[2]] <- prop.table(table(df[[chr]]))
        out
    }
    
    data %>% 
        discard(is.numeric) %>%
        names() %>%
        map(~cat_des(data, .)) %>% 
        map(2)
    cat_des(data, "bloodlevel")
    cat_des(data, "adherence")
    
    
    # # -----------------------------------------------------------------------
    
    data %>% filter(adherence == 'nonad') %>% 
         summarise(age_mean = mean(age),
                      age_sd = sd(age),
                      income_median = median(for_income),
                      income_iqr = IQR(for_income),
                      duration_median = median(for_duration),
                      duration_iqr = IQR(for_duration),
                      sbp_mean = mean(sbp),
                      sbp_sd = sd(sbp),
                      dbp_mean = mean(dbp),
                      dbp_sd = sd(dbp)) %>% 
        gather()
    
    data %>%  
        discard(is.numeric) %>%
        names() %>%
        map(~cat_des(filter(data, adherence == "nonad"), .)) %>% 
        map(2)
    cat_des(filter(data, adherence == "nonad"), "bloodlevel")
    
    # VERY IMPORTANT 02May2019 ------------------------------------------------
    library(tableone)
    vars <- setdiff(names(data), "adherence")
    vars # inspect the names, then set the order of characteristics below
    vars <- c("age", "gender", "education", "urbanity", "for_income", 
              "t", "for_duration", "cliniccheck", "sbp", "dbp", "diabete")
    tableone <- CreateTableOne(data = data, vars = vars, strata = "adherence")
    print(tableone, nonnormal = c("for_income", "for_duration"),
          explain = TRUE, showAllLevels = TRUE, catDigits = 2, quote = TRUE)
    
    vars <- c("age", "gender", "education", "urbanity", "for_income", 
              "t", "for_duration", "cliniccheck", "sbp", "dbp", "diabete")
    CreateTableOne(data = data, vars = vars) %>% print(
        nonnormal = c("for_income", "for_duration"),
        showAllLevels = TRUE, catDigits = 2, quote = TRUE)
    
    


          Article link: https://www.haomeiwen.com/subject/rdglgctx.html