在临床研究中,统计是非常重要的一关,而在R语言中,可以轻松实现大多数统计方法。因此,今天小编将开启一个全新的系列推文:利用R语言实现各类常用统计方法,希望能对大家有所帮助。由于统计涵盖范围非常广泛,若有描述不当之处恳请各位小伙伴多多指正,谢谢!
描述性统计分析
此处选用的数据集是R语言自带的mtcar数据集,每一行为一种车的类型,如马自达RX4等;每一列为车的一种参数,mpg表示每加仑油行驶英里数,cyl表示气缸数(可分为4,5,6),hp马力,wt车重,am变速箱类型(0表示自动挡,1表示手动档),以此类推具体说明如下:
[, 1] mpg Miles/(US) gallon [, 2] cyl Number of cylinders [, 3] disp Displacement (cu.in.) [, 4] hp Gross horsepower [, 5] drat Rear axle ratio [, 6] wt Weight (1000 lbs) [, 7] qsec 1/4 mile time [, 8] vs Engine (0 = V-shaped, 1 = straight) [, 9] am Transmission (0 = automatic, 1 = manual) [,10] gear Number of forward gears [,11] carb Number of carburetors
data(mtcars)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
#计算描述统计分析
#计算描述统计分析
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
可以看到结果中数值型变量会给出最大值、最小值、四分位数、中位数、均值,但明显此处的cyl,vs,am,gear,carb均为二分类或有序多分类变量,应该计数而非当作数值型变量处理,因此代码修改如下:
mtcars$cyl<-as.factor(mtcars$cyl)
mtcars$vs<-as.factor(mtcars$vs)
mtcars$am<-as.factor(mtcars$am)
mtcars$gear<-as.factor(mtcars$gear)
mtcars$carb<-as.factor(mtcars$carb)
#计算描述统计分析
summary(mtcars)
## mpg cyl disp hp drat
## Min. :10.40 4:11 Min. : 71.1 Min. : 52.0 Min. :2.760
## 1st Qu.:15.43 6: 7 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080
## Median :19.20 8:14 Median :196.3 Median :123.0 Median :3.695
## Mean :20.09 Mean :230.7 Mean :146.7 Mean :3.597
## 3rd Qu.:22.80 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920
## Max. :33.90 Max. :472.0 Max. :335.0 Max. :4.930
## wt qsec vs am gear carb
## Min. :1.513 Min. :14.50 0:18 0:19 3:15 1: 7
## 1st Qu.:2.581 1st Qu.:16.89 1:14 1:13 4:12 2:10
## Median :3.325 Median :17.71 5: 5 3: 3
## Mean :3.217 Mean :17.85 4:10
## 3rd Qu.:3.610 3rd Qu.:18.90 6: 1
## Max. :5.424 Max. :22.90 8: 1
可以看到结果中数值型变量会给出最大值、最小值、四分位数、中位数、均值,而因子型变量则会给出计数结果
也可以使用以下代码得到更多结果
# 计算均值、中位数、方差、标准差、变异系数、标准误
mystats<-function(x,na.omit=FALSE){
if(na.omit)
x<-x[!is.na(x)]
mean<-mean(x)
median<-mean(x)
var<-var(x)
sd<-sd(x)
cv=100*sd(x)/mean(x)
se=sd(x)/sqrt(length(x))
return(c(mean=mean,median=median,var=var,sd=sd,cv=cv,se=se))
}
sapply(mtcars[,c("mpg","disp","hp")], mystats)
## mpg disp hp
## mean 20.090625 230.72188 146.68750
## median 20.090625 230.72188 146.68750
## var 36.324103 15360.79983 4700.86694
## sd 6.026948 123.93869 68.56287
## cv 29.998808 53.71779 46.74077
## se 1.065424 21.90947 12.12032
四格表资料统计检验
注意以下要求 (1)所有的理论数T≥5并且总样本量n≥40,用Pearson卡方进行检验。 (2)如果理论数T<5但T≥1,并且1≥40,用连续性校正的卡方进行检验。 (3)如果有理论数T<1或n<40,则用Fisher’s检验。
tabl<-table(mtcars$vs,mtcars$am)
tabl
##
## 0 1
## 0 12 6
## 1 7 7
#计算理论频数
T = function(x){
A = matrix(0,nrow(x),ncol(x))
for(i in 1:nrow(x)){
for(j in 1:ncol(x)) A[i,j] = sum(x[i,])*sum(x[,j])/sum(x)
}
A
}
T(tabl)
## [,1] [,2]
## [1,] 10.6875 7.3125
## [2,] 8.3125 5.6875
#卡方检验
chisq.test(tabl, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: tabl
## X-squared = 0.90688, df = 1, p-value = 0.3409
#连续矫正卡方检验
chisq.test(tabl, correct = TRUE)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tabl
## X-squared = 0.34754, df = 1, p-value = 0.5555
#Fisher’s检验
fisher.test(tabl)
##
## Fisher's Exact Test for Count Data
##
## data: tabl
## p-value = 0.4727
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.3825342 10.5916087
## sample estimates:
## odds ratio
## 1.956055
相关性分析
#皮尔逊相关系数
cor(mtcars$mpg,mtcars$wt,method = "pearson")
## [1] -0.8676594
#斯皮尔曼相关系数
cor(mtcars$mpg,mtcars$wt,method = "spearman")
## [1] -0.886422
#kendall相关系数
cor(mtcars$mpg,mtcars$wt,method = "kendall")
## [1] -0.7278321
#相关系数的统计检验
#皮尔逊相关系数
cor.test(mtcars$mpg,mtcars$wt,method = "pearson")
##
## Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$wt
## t = -9.559, df = 30, p-value = 1.294e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9338264 -0.7440872
## sample estimates:
## cor
## -0.8676594
#斯皮尔曼相关系数
cor.test(mtcars$mpg,mtcars$wt,method = "spearman")
## Warning in cor.test.default(mtcars$mpg, mtcars$wt, method = "spearman"): Cannot
## compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: mtcars$mpg and mtcars$wt
## S = 10292, p-value = 1.488e-11
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.886422
#kendall相关系数
cor.test(mtcars$mpg,mtcars$wt,method = "kendall")
## Warning in cor.test.default(mtcars$mpg, mtcars$wt, method = "kendall"): Cannot
## compute exact p-value with ties
##
## Kendall's rank correlation tau
##
## data: mtcars$mpg and mtcars$wt
## z = -5.7981, p-value = 6.706e-09
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.7278321
好了,今天的R语言实现统计方法系列推文暂时告一段落,我们下次再见吧!
网友评论