Univariate Analysis

Author: 北欧森林 | Published: 2021-04-29 04:03

General principles of descriptive statistics


[Figure: general principles of descriptive statistics]
  1. Simulate data
set.seed(12)
gender <- factor(rbinom(100,1,0.4),levels = c(0,1),labels = c("male","female"))
trt <- factor(rbinom(100,1,0.5),levels = c(0,1),labels = c("treat","control"))
diagnosis <- factor(rbinom(100,3,0.5),levels = c(0,1,2,3),
                    labels = c("heat failure","renal dysfunction","ards","trauma"))
age <- rnorm(100,mean = 67,sd = 20)
wbc <- round(exp(rnorm(100,mean = 9,sd = 0.8))) # log-normal, so deliberately non-normal

data <- data.frame(gender,age,trt,diagnosis,wbc)
head(data)
[Figure: output of head(data)]
  2. Normality tests
# Start with a plot for a visual check
par(mfrow = c(1,2))
hist(data$age)
hist(data$wbc)
[Figure: histograms of data$age (left) and data$wbc (right)]
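
Histograms depend on the bin width, so a normal Q-Q plot is a common complementary visual check. The sketch below is only an illustration: it simulates its own two samples (`x_norm` and `x_skew` are hypothetical names, and the seed is arbitrary), so the values are not those of `data`. Points that follow the reference line suggest approximate normality; a curved pattern suggests skew.

```r
# Stand-alone illustration: these samples are simulated here, not taken from `data`
set.seed(1)                                     # arbitrary seed
x_norm <- rnorm(100, mean = 67, sd = 20)        # roughly normal, like age
x_skew <- exp(rnorm(100, mean = 9, sd = 0.8))   # log-normal, like wbc

op <- par(mfrow = c(1, 2))                      # two panels side by side
qqnorm(x_norm, main = "Normal sample")
qqline(x_norm)                                  # reference line through the quartiles
qqnorm(x_skew, main = "Log-normal sample")
qqline(x_skew)
par(op)                                         # restore the previous layout
```

With the article's data the same calls would be `qqnorm(data$age); qqline(data$age)` and likewise for `data$wbc`.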

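2.1 Testing skewness
If the p-value of the skewness test is greater than 0.05, the null hypothesis of zero skewness is not rejected and the variable is treated as unskewed.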

library(moments) # provides agostino.test() and anscombe.test()
agostino.test(data$age)

# D'Agostino skewness test
# 
# data:  data$age
# skew = -0.15621, z = -0.67550, p-value = 0.4994
# alternative hypothesis: data have a skewness

agostino.test(data$wbc)

# D'Agostino skewness test
#
# data:  data$wbc
# skew = 2.2275, z = 6.4264, p-value = 1.307e-10
# alternative hypothesis: data have a skewness
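
To make the `skew` value in the output less of a black box, here is a minimal sketch of the moment-based sample skewness in base R. The function name `sample_skewness` is hypothetical, and the assumption that `agostino.test()` reports this same quantity (third central moment divided by the second central moment to the power 3/2) should be checked against the `moments` documentation.

```r
# Hypothetical helper: moment-based sample skewness, m3 / m2^(3/2)
sample_skewness <- function(x) {
  x  <- x[!is.na(x)]        # drop missing values
  d  <- x - mean(x)         # deviations from the mean
  m2 <- mean(d^2)           # second central moment (divides by n)
  m3 <- mean(d^3)           # third central moment
  m3 / m2^(3 / 2)
}

sample_skewness(c(1, 2, 3, 4, 5))   # symmetric sample: 0
sample_skewness(c(1, 1, 1, 10))     # long right tail: positive
```

A positive value means the right tail is longer, which matches the right-skewed `wbc` above.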

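2.2 Testing kurtosis
A normal distribution has a kurtosis of 3. If the p-value of the kurtosis test is greater than 0.05, the null hypothesis that the kurtosis equals 3 is not rejected.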

anscombe.test(data$age)
# 
# Anscombe-Glynn kurtosis test
# 
# data:  data$age
# kurt = 2.79983, z = -0.16094, p-value = 0.8721
# alternative hypothesis: kurtosis is not equal to 3
# 
anscombe.test(data$wbc)
# 
# Anscombe-Glynn kurtosis test
# 
# data:  data$wbc
# kurt = 9.7500, z = 4.8365, p-value = 1.321e-06
# alternative hypothesis: kurtosis is not equal to 3
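
The `kurt` value can be sketched the same way. The helper below (the name `sample_kurtosis` is hypothetical) computes the fourth central moment divided by the squared second central moment, which is assumed here to be the quantity `anscombe.test()` compares against 3. It is not an excess kurtosis, so a normal sample should give a value near 3 rather than near 0.

```r
# Hypothetical helper: moment-based sample kurtosis, m4 / m2^2
sample_kurtosis <- function(x) {
  x  <- x[!is.na(x)]        # drop missing values
  d  <- x - mean(x)         # deviations from the mean
  m2 <- mean(d^2)           # second central moment (divides by n)
  m4 <- mean(d^4)           # fourth central moment
  m4 / m2^2
}

set.seed(1)                          # arbitrary seed, illustration only
sample_kurtosis(rnorm(1e5))          # large normal sample: close to 3
sample_kurtosis(exp(rnorm(1e5)))     # heavy right tail: well above 3
```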

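A variable is treated as normally distributed only if it passes both the skewness test and the kurtosis test, and only then is a t-test appropriate. In the output above, age passes both tests (p = 0.4994 and p = 0.8721), whereas wbc fails both.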
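
As an alternative to running two separate tests, base R offers the Shapiro-Wilk test, a single omnibus test of normality. This is a swapped-in technique, not the approach used in the source course; the helper name `looks_normal` and the simulated samples are assumptions made for illustration.

```r
# Hypothetical helper: TRUE when Shapiro-Wilk does not reject normality at level alpha
looks_normal <- function(x, alpha = 0.05) {
  shapiro.test(x)$p.value > alpha
}

set.seed(1)                                     # arbitrary seed, not the article's data
x_norm <- rnorm(100, mean = 67, sd = 20)        # roughly normal, like age
x_skew <- exp(rnorm(100, mean = 9, sd = 0.8))   # log-normal, like wbc

looks_normal(x_norm)
looks_normal(x_skew)
```

A non-significant result only means that normality was not rejected, so it is worth reading the result together with the plots.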

  3. Descriptive statistics
summary(age) # the vector age is still in the workspace, so the bare name works
sd(age)
summary(wbc)
table(diagnosis)
# diagnosis
#      heat failure renal dysfunction              ards            trauma
#                16                35                36                13

prop.table(table(diagnosis))
tapply(data$wbc,data$trt,summary)

# $treat
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#    1712    4516    7831   10694   12773   55958
#
# $control
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#     773    5284    8018   10178   12190   32705

table(data$diagnosis,data$trt)

#                     treat control
#   heat failure          9       7
#   renal dysfunction    18      17
#   ards                 14      22
#   trauma                5       8

prop.table(table(data$diagnosis,data$trt),margin = 2)

#                         treat   control
#   heat failure      0.1956522 0.1296296
#   renal dysfunction 0.3913043 0.3148148
#   ards              0.3043478 0.4074074
#   trauma            0.1086957 0.1481481
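
A common reporting convention is mean (SD) for approximately normal variables and median [IQR] for skewed ones. The sketch below shows one way to produce such strings per group. The helpers `mean_sd` and `median_iqr` and the data frame `demo` are hypothetical, and `demo` is simulated here, so its numbers are not those of `data`.

```r
# Hypothetical formatting helpers
mean_sd <- function(x) sprintf("%.1f (%.1f)", mean(x), sd(x))
median_iqr <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75))           # 1st quartile, median, 3rd quartile
  sprintf("%.0f [%.0f, %.0f]", q[2], q[1], q[3])
}

# Simulated stand-in for the article's data (arbitrary seed)
set.seed(1)
demo <- data.frame(
  trt = factor(rep(c("treat", "control"), each = 50)),
  age = rnorm(100, mean = 67, sd = 20),
  wbc = round(exp(rnorm(100, mean = 9, sd = 0.8)))
)

tapply(demo$age, demo$trt, mean_sd)      # normal variable: mean (SD) per group
tapply(demo$wbc, demo$trt, median_iqr)   # skewed variable: median [IQR] per group
```

With the article's data the calls would be `tapply(data$age, data$trt, mean_sd)` and `tapply(data$wbc, data$trt, median_iqr)`.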
  4. Statistical inference
wilcox.test(wbc~trt,data = data)
 
# Wilcoxon rank sum test with continuity correction
# 
# data:  wbc by trt
# W = 1192, p-value = 0.7321
# alternative hypothesis: true location shift is not equal to 0
 
chisq.test(table(data$diagnosis,data$trt))
 
# Pearson's Chi-squared test
# 
# data:  table(data$diagnosis, data$trt)
# X-squared = 2.1222, df = 3, p-value = 0.5474

t.test(age ~ trt,data = data) # Welch two-sample t-test by default (var.equal = FALSE)
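
The chi-squared approximation is usually considered reliable only when the expected cell counts are not too small (a common rule of thumb is at least 5). The sketch below rebuilds the diagnosis-by-treatment table from the counts printed above, inspects the expected counts, and falls back to Fisher's exact test if any is below 5. The threshold and the fallback are assumptions of this sketch, not part of the source course.

```r
# Counts copied from the table(data$diagnosis, data$trt) output above
tab <- matrix(
  c(9, 18, 14, 5,    # treat column
    7, 17, 22, 8),   # control column
  ncol = 2,
  dimnames = list(
    diagnosis = c("heat failure", "renal dysfunction", "ards", "trauma"),
    trt       = c("treat", "control")
  )
)

res <- chisq.test(tab)
res$expected                 # expected counts under independence

# Rule-of-thumb check: use Fisher's exact test when any expected count is below 5
if (any(res$expected < 5)) {
  fisher.test(tab)
} else {
  res
}
```

Here the smallest expected count is 13 * 46 / 100 = 5.98, so the chi-squared test is kept.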
  5. Automated plotting (omitted)

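References
The code and some of the screenshots in this article come from Professor Zhang Zhongheng's (章仲恒) course on DXY (丁香园): Univariate Analysis (单变量分析).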


Article link: https://www.haomeiwen.com/subject/pvuprltx.html