美文网首页
各类统计方法R语言实现(一)

各类统计方法R语言实现(一)

作者: 拾光_2020 | 来源:发表于2020-04-08 18:23 被阅读0次

    在临床研究中,统计是非常重要的一关,而在R语言中,可以轻松实现大多数统计方法。因此,今天小编将开启一个全新的系列推文:利用R语言实现各类常用统计方法,希望能对大家有所帮助。由于统计涵盖范围非常广泛,若有描述不当之处恳请各位小伙伴多多指正,谢谢!

    描述性统计分析

    此处选用的数据集是R语言自带的mtcar数据集,每一行为一种车的类型,如马自达RX4等;每一列为车的一种参数,mpg表示每加仑油行驶英里数,cyl表示气缸数(可分为4,5,6),hp马力,wt车重,am变速箱类型(0表示自动挡,1表示手动档),以此类推具体说明如下:

    [, 1] mpg Miles/(US) gallon [, 2] cyl Number of cylinders [, 3] disp Displacement (cu.in.) [, 4] hp Gross horsepower [, 5] drat Rear axle ratio [, 6] wt Weight (1000 lbs) [, 7] qsec 1/4 mile time [, 8] vs Engine (0 = V-shaped, 1 = straight) [, 9] am Transmission (0 = automatic, 1 = manual) [,10] gear Number of forward gears [,11] carb Number of carburetors

    data(mtcars)
    head(mtcars)
    ##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
    ## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    ## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    ## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    ## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    ## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    ## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    #计算描述统计分析
    #计算描述统计分析
    summary(mtcars)
    ##       mpg             cyl             disp             hp       
    ##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
    ##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
    ##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
    ##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
    ##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
    ##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
    ##       drat             wt             qsec             vs        
    ##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
    ##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
    ##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
    ##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
    ##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
    ##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
    ##        am              gear            carb      
    ##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
    ##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
    ##  Median :0.0000   Median :4.000   Median :2.000  
    ##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
    ##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
    ##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
    

    可以看到结果中数值型变量会给出最大值、最小值、四分位数、中位数、均值,但明显此处的cyl,vs,am,gear,carb均为二分类或有序多分类变量,应该计数而非当作数值型变量处理,因此代码修改如下:

    mtcars$cyl<-as.factor(mtcars$cyl)
    mtcars$vs<-as.factor(mtcars$vs)
    mtcars$am<-as.factor(mtcars$am)
    mtcars$gear<-as.factor(mtcars$gear)
    mtcars$carb<-as.factor(mtcars$carb)
    
    #计算描述统计分析
    summary(mtcars)
    ##       mpg        cyl         disp             hp             drat      
    ##  Min.   :10.40   4:11   Min.   : 71.1   Min.   : 52.0   Min.   :2.760  
    ##  1st Qu.:15.43   6: 7   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080  
    ##  Median :19.20   8:14   Median :196.3   Median :123.0   Median :3.695  
    ##  Mean   :20.09          Mean   :230.7   Mean   :146.7   Mean   :3.597  
    ##  3rd Qu.:22.80          3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920  
    ##  Max.   :33.90          Max.   :472.0   Max.   :335.0   Max.   :4.930  
    ##        wt             qsec       vs     am     gear   carb  
    ##  Min.   :1.513   Min.   :14.50   0:18   0:19   3:15   1: 7  
    ##  1st Qu.:2.581   1st Qu.:16.89   1:14   1:13   4:12   2:10  
    ##  Median :3.325   Median :17.71                 5: 5   3: 3  
    ##  Mean   :3.217   Mean   :17.85                        4:10  
    ##  3rd Qu.:3.610   3rd Qu.:18.90                        6: 1  
    ##  Max.   :5.424   Max.   :22.90                        8: 1
    

    可以看到结果中数值型变量会给出最大值、最小值、四分位数、中位数、均值,而因子型变量则会给出计数结果

    也可以使用以下代码得到更多结果

    # 计算均值、中位数、方差、标准差、变异系数、标准误
    mystats<-function(x,na.omit=FALSE){
      if(na.omit)
          x<-x[!is.na(x)]
      mean<-mean(x)
      median<-mean(x)
      var<-var(x)
      sd<-sd(x)
      cv=100*sd(x)/mean(x)
      se=sd(x)/sqrt(length(x))
      return(c(mean=mean,median=median,var=var,sd=sd,cv=cv,se=se))
    }
    sapply(mtcars[,c("mpg","disp","hp")], mystats)
    ##              mpg        disp         hp
    ## mean   20.090625   230.72188  146.68750
    ## median 20.090625   230.72188  146.68750
    ## var    36.324103 15360.79983 4700.86694
    ## sd      6.026948   123.93869   68.56287
    ## cv     29.998808    53.71779   46.74077
    ## se      1.065424    21.90947   12.12032
    

    四格表资料统计检验

    注意以下要求 (1)所有的理论数T≥5并且总样本量n≥40,用Pearson卡方进行检验。 (2)如果理论数T<5但T≥1,并且1≥40,用连续性校正的卡方进行检验。 (3)如果有理论数T<1或n<40,则用Fisher’s检验。

    tabl<-table(mtcars$vs,mtcars$am)
    tabl
    ##    
    ##      0  1
    ##   0 12  6
    ##   1  7  7
    #计算理论频数
    
    T = function(x){
    A = matrix(0,nrow(x),ncol(x))
    for(i in 1:nrow(x)){
    for(j in 1:ncol(x)) A[i,j] = sum(x[i,])*sum(x[,j])/sum(x)
    }
    A
    }
    T(tabl)
    ##         [,1]   [,2]
    ## [1,] 10.6875 7.3125
    ## [2,]  8.3125 5.6875
    #卡方检验
    chisq.test(tabl, correct = FALSE)
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tabl
    ## X-squared = 0.90688, df = 1, p-value = 0.3409
    #连续矫正卡方检验
    chisq.test(tabl, correct = TRUE)
    ## 
    ##  Pearson's Chi-squared test with Yates' continuity correction
    ## 
    ## data:  tabl
    ## X-squared = 0.34754, df = 1, p-value = 0.5555
    #Fisher’s检验
    fisher.test(tabl)
    ## 
    ##  Fisher's Exact Test for Count Data
    ## 
    ## data:  tabl
    ## p-value = 0.4727
    ## alternative hypothesis: true odds ratio is not equal to 1
    ## 95 percent confidence interval:
    ##   0.3825342 10.5916087
    ## sample estimates:
    ## odds ratio 
    ##   1.956055
    

    相关性分析

    #皮尔逊相关系数
    cor(mtcars$mpg,mtcars$wt,method = "pearson")
    ## [1] -0.8676594
    #斯皮尔曼相关系数
    cor(mtcars$mpg,mtcars$wt,method = "spearman")
    ## [1] -0.886422
    #kendall相关系数
    cor(mtcars$mpg,mtcars$wt,method = "kendall")
    ## [1] -0.7278321
    #相关系数的统计检验
    #皮尔逊相关系数
    cor.test(mtcars$mpg,mtcars$wt,method = "pearson")
    ## 
    ##  Pearson's product-moment correlation
    ## 
    ## data:  mtcars$mpg and mtcars$wt
    ## t = -9.559, df = 30, p-value = 1.294e-10
    ## alternative hypothesis: true correlation is not equal to 0
    ## 95 percent confidence interval:
    ##  -0.9338264 -0.7440872
    ## sample estimates:
    ##        cor 
    ## -0.8676594
    #斯皮尔曼相关系数
    cor.test(mtcars$mpg,mtcars$wt,method = "spearman")
    ## Warning in cor.test.default(mtcars$mpg, mtcars$wt, method = "spearman"): Cannot
    ## compute exact p-value with ties
    ## 
    ##  Spearman's rank correlation rho
    ## 
    ## data:  mtcars$mpg and mtcars$wt
    ## S = 10292, p-value = 1.488e-11
    ## alternative hypothesis: true rho is not equal to 0
    ## sample estimates:
    ##       rho 
    ## -0.886422
    #kendall相关系数
    cor.test(mtcars$mpg,mtcars$wt,method = "kendall")
    ## Warning in cor.test.default(mtcars$mpg, mtcars$wt, method = "kendall"): Cannot
    ## compute exact p-value with ties
    ## 
    ##  Kendall's rank correlation tau
    ## 
    ## data:  mtcars$mpg and mtcars$wt
    ## z = -5.7981, p-value = 6.706e-09
    ## alternative hypothesis: true tau is not equal to 0
    ## sample estimates:
    ##        tau 
    ## -0.7278321
    

    好了,今天的R语言实现统计方法系列推文暂时告一段落,我们下次再见吧!

    相关文章

      网友评论

          本文标题:各类统计方法R语言实现(一)

          本文链接:https://www.haomeiwen.com/subject/nnxxmhtx.html