General Principles of Descriptive Statistics

- Simulated data
set.seed(12)
gender <- factor(rbinom(100,1,0.4),levels = c(0,1),labels = c("male","female"))
trt <- factor(rbinom(100,1,0.5),levels = c(0,1),labels = c("treat","control"))
diagnosis <- factor(rbinom(100,3,0.5),levels = c(0,1,2,3),
labels = c("heat failure","renal dysfunction","ards","trauma"))
age <- rnorm(100,mean = 67,sd = 20)
wbc <- round(exp(rnorm(100,mean = 9,sd = 0.8))) # log-normal, so deliberately non-normal
data <- data.frame(gender,age,trt,diagnosis,wbc)
head(data)
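
Why `exp(rnorm(...))` yields a non-normal variable: the exponential of a normal variable is log-normal, which is right-skewed, with median `exp(mean)` and mean `exp(mean + sd^2/2)`. The following is a minimal sketch that checks this on a larger simulated sample; the seed, the sample size of 1e5, and the names `mu`, `sigma`, and `x` are assumptions made for illustration and are not part of the original code.

```r
# Illustration only: seed, sample size, and variable names are assumptions.
set.seed(1)
mu <- 9
sigma <- 0.8
x <- exp(rnorm(1e5, mean = mu, sd = sigma))  # log-normal draws

theory_median <- exp(mu)                # median of a log-normal variable
theory_mean   <- exp(mu + sigma^2 / 2)  # mean of a log-normal variable

# Compare the simulated values with the theoretical ones
c(sample_median = median(x), theory_median = theory_median)
c(sample_mean = mean(x), theory_mean = theory_mean)

# The mean sits above the median, the signature of a right-skewed variable
mean(x) > median(x)
```

This is also why the summaries of wbc further down show means well above the medians.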

- Normality tests
# Start with a visual check using histograms
par(mfrow = c(1,2))
hist(data$age)
hist(data$wbc)

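2.1 Testing skewness
If the skewness test gives a p-value above 0.05, the null hypothesis of zero skewness is not rejected and the variable is treated as not skewed.
library(moments) # provides agostino.test() and anscombe.test()
agostino.test(data$age)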
# D'Agostino skewness test
#
# data: data$age
# skew = -0.15621, z = -0.67550, p-value = 0.4994
# alternative hypothesis: data have a skewness
agostino.test(data$wbc)
# D'Agostino skewness test
#
# data: data$wbc
# skew = 2.2275, z = 6.4264, p-value = 1.307e-10
# alternative hypothesis: data have a skewness
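2.2 Testing kurtosis
A normal distribution has a kurtosis of 3. If the test's p-value is above 0.05, the kurtosis is not significantly different from 3, which is consistent with normality.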
anscombe.test(data$age)
#
# Anscombe-Glynn kurtosis test
#
# data: data$age
# kurt = 2.79983, z = -0.16094, p-value = 0.8721
# alternative hypothesis: kurtosis is not equal to 3
#
anscombe.test(data$wbc)
#
# Anscombe-Glynn kurtosis test
#
# data: data$wbc
# kurt = 9.7500, z = 4.8365, p-value = 1.321e-06
# alternative hypothesis: kurtosis is not equal to 3
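A variable is treated as normally distributed only if it passes both the skewness check and the kurtosis check; only then is a t-test appropriate. Here age passes both, while wbc fails both.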
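
To make the `skew` and `kurt` statistics tested above less of a black box, here is a minimal base-R sketch of moment-based sample skewness and kurtosis. It assumes these are the same moment-based quantities the tests report. The helper names `central_moment`, `sample_skewness`, and `sample_kurtosis`, the seed, and the simulated inputs are all hypothetical and not from the original.

```r
# Moment-based sample skewness and kurtosis in base R.
# Assumption: these correspond to the `skew` and `kurt` values reported above.
central_moment <- function(x, k) mean((x - mean(x))^k)

sample_skewness <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5
sample_kurtosis <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

# Simulated inputs (seed and names are illustrative, not from the original)
set.seed(1)
normal_x <- rnorm(1000, mean = 67, sd = 20)        # roughly symmetric
skewed_x <- exp(rnorm(1000, mean = 9, sd = 0.8))   # log-normal, right-skewed

# Normal data: skewness near 0, kurtosis near 3
c(skew = sample_skewness(normal_x), kurt = sample_kurtosis(normal_x))

# Log-normal data: clearly positive skewness, kurtosis well above 3
c(skew = sample_skewness(skewed_x), kurt = sample_kurtosis(skewed_x))
```

The tests add a p-value on top of these raw statistics, which matters because a small sample can show nonzero skewness by chance alone.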
- Descriptive statistics
summary(age) # the standalone vector age is still in the workspace, so the bare name works
sd(age)
summary(wbc)
table(diagnosis)
# diagnosis
#      heat failure renal dysfunction              ards            trauma
#                16                35                36                13
prop.table(table(diagnosis))
tapply(data$wbc,data$trt,summary)
# $treat
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#    1712    4516    7831   10694   12773   55958
#
# $control
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#     773    5284    8018   10178   12190   32705
table(data$diagnosis,data$trt)
#                     treat control
#   heat failure          9       7
#   renal dysfunction    18      17
#   ards                 14      22
#   trauma                5       8
prop.table(table(data$diagnosis,data$trt),margin = 2)
#                         treat   control
#   heat failure      0.1956522 0.1296296
#   renal dysfunction 0.3913043 0.3148148
#   ards              0.3043478 0.4074074
#   trauma            0.1086957 0.1481481
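
The `margin` argument decides what the proportions are relative to: `margin = 2` divides each cell by its column total (so every treatment column sums to 1, as in the output above), `margin = 1` divides by the row total, and omitting it divides by the grand total. A small sketch on a toy table follows; the counts and the row labels `a` and `b` are made up for illustration.

```r
# Toy 2 x 2 table (counts and labels are made up for illustration)
tab <- matrix(c(9, 18, 7, 17), nrow = 2,
              dimnames = list(diagnosis = c("a", "b"),
                              trt = c("treat", "control")))

col_prop <- prop.table(tab, margin = 2)  # proportions within each column
row_prop <- prop.table(tab, margin = 1)  # proportions within each row
all_prop <- prop.table(tab)              # proportions of the grand total

colSums(col_prop)  # each column sums to 1
rowSums(row_prop)  # each row sums to 1
sum(all_prop)      # the whole table sums to 1
```

For comparing the case mix between treatment arms, column proportions are usually the ones to report, since the arms differ in size.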
- Statistical inference
wilcox.test(wbc~trt,data = data)
# Wilcoxon rank sum test with continuity correction
#
# data: wbc by trt
# W = 1192, p-value = 0.7321
# alternative hypothesis: true location shift is not equal to 0
chisq.test(table(data$diagnosis,data$trt))
# Pearson's Chi-squared test
#
# data: table(data$diagnosis, data$trt)
# X-squared = 2.1222, df = 3, p-value = 0.5474
t.test(age ~ trt,data = data)
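
The test functions above return objects of class `htest`, so individual results can be pulled out by name instead of being read off the printed output. The sketch below uses a separate simulated data frame; `demo`, its seed, and its group sizes are assumptions for illustration, not the data from this post. It also shows that `t.test()` runs the Welch version by default, with `var.equal = TRUE` requesting the classic pooled-variance test.

```r
# Illustrative data only: `demo`, the seed, and group sizes are assumptions.
set.seed(1)
demo <- data.frame(
  age = rnorm(60, mean = 67, sd = 20),
  trt = factor(rep(c("treat", "control"), each = 30))
)

# Welch two-sample t-test (the default, var.equal = FALSE)
fit <- t.test(age ~ trt, data = demo)
fit$estimate   # the two group means
fit$conf.int   # 95% confidence interval for the difference in means
fit$p.value    # p-value as a plain number

# Classic Student t-test with pooled variance
fit_pooled <- t.test(age ~ trt, data = demo, var.equal = TRUE)

# The method string records which variant was run
fit$method
fit_pooled$method
```

The same `$p.value` and `$statistic` fields are available on the results of `wilcox.test()` and `chisq.test()`.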
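- Automated plotting (omitted)
References
The code and some of the screenshots in this post come from Professor Zhang Zhongheng's (章仲恒) course on DXY (丁香园): Univariate Analysis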