R: Logistic Regression

Author: Cache_wood | Published 2022-01-08 23:26

    Ordinary least squares (OLS) regression

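    Ordinary OLS regression places no restrictions on the values of the independent variables, the regression coefficients, or the residual term, so the dependent variable, being a function of the independent variables, must be free to take any value in $(-\infty,+\infty)$.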

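    If the dependent variable takes only categorical values, or only the two values 0 and 1, this seriously violates the assumption that the dependent variable is continuous.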

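    Suppose the dependent variable $y_i$ is a dummy variable taking only the values 0 and 1, i.e., it follows a two-point (Bernoulli) distribution. Given $x_i$, write the probabilities as:
    $$
    \begin{aligned}
    P(y_i=1 \mid x_i) &= p_i\\
    P(y_i=0 \mid x_i) &= 1-p_i = q_i\\
    E(y_i \mid x_i) &= 1\times p_i + 0\times(1-p_i) = p_i
    \end{aligned}
    $$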
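    A linear regression (the linear probability model) would then specify
    $$E(y_i \mid x_i) = \beta_0 + \beta_1 \times x_i$$
    but the right-hand side can take any real value, whereas $p_i$ must lie in $[0,1]$.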
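A quick way to see the problem is to fit the linear form to a 0/1 response. The minimal sketch below uses `lm()` on a hypothetical toy vector pair `x`, `y` (made-up data, not from this post). The fitted "probabilities" fall below 0 and above 1 at the two ends of the range.

```r
# Hypothetical toy data: a 0/1 response that switches from 0 to 1 as x grows
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# Linear probability model E(y | x) = beta0 + beta1 * x, fitted by OLS
lpm <- lm(y ~ x)
p_lin <- fitted(lpm)

# Fitted values outside [0, 1]: about -0.18 at x = 1 and 1.18 at x = 10
range(p_lin)
```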

    Logistic regression model

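    Define the logit (log-odds) as
    $$\operatorname{logit}(p_i) = \ln\frac{p_i}{1-p_i}$$
    and model it as a linear function of $x_i$:
    $$\operatorname{logit}(p_i) = \beta_0 + \beta_1 \times x_i$$
    No additive error term $\varepsilon_i$ is needed here: the randomness comes from $y_i$ being a Bernoulli variable with mean $p_i$.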
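The logit and its inverse are easy to write out directly. In this sketch `logit()` and `inv_logit()` are hypothetical helper names. Base R's `qlogis()` and `plogis()` (package stats) compute the same two maps, which the last two lines check.

```r
# logit(p) = ln(p / (1 - p)): maps a probability in (0, 1) to the real line
logit <- function(p) log(p / (1 - p))
# inverse logit: maps a linear predictor back to a probability in (0, 1)
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

p <- c(0.1, 0.5, 0.8)
eta <- logit(p)   # log-odds: about -2.197, 0, 1.386
inv_logit(eta)    # recovers 0.1, 0.5, 0.8

# The stats package provides the same maps as qlogis() and plogis()
all.equal(eta, qlogis(p))
all.equal(inv_logit(eta), plogis(eta))
```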

    Maximum likelihood estimation yields estimates $b_0$ of $\beta_0$ and $b_1$ of $\beta_1$, from which the fitted probability is
    $$\hat{p}_i = \frac{\exp(b_0+b_1x_i)}{1+\exp(b_0+b_1x_i)} \in (0,1)$$
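To make the maximum likelihood step concrete, the sketch below maximizes the Bernoulli log-likelihood $\sum_i [y_i\eta_i - \ln(1+e^{\eta_i})]$, with $\eta_i = b_0 + b_1x_i$, directly with `optim()`, and compares the result with `glm()`. This is an illustration under stated assumptions. The data are simulated with made-up coefficients (-0.5 and 1.2). Also, `glm()` itself uses Fisher scoring (iteratively reweighted least squares) rather than BFGS, so only the estimates, not the algorithm, should match.

```r
set.seed(1)
# Simulated data; the "true" coefficients -0.5 and 1.2 are hypothetical
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Negative Bernoulli log-likelihood: -sum(y * eta - ln(1 + exp(eta)))
negloglik <- function(b) {
  eta <- b[1] + b[2] * x
  -sum(y * eta - log1p(exp(eta)))
}
# Its gradient: -sum((y - p) * (1, x))
negloglik_grad <- function(b) {
  p <- plogis(b[1] + b[2] * x)
  -c(sum(y - p), sum((y - p) * x))
}

fit_ml  <- optim(c(0, 0), negloglik, negloglik_grad, method = "BFGS",
                 control = list(reltol = 1e-12))
fit_glm <- glm(y ~ x, family = binomial)

# The two sets of estimates (b0, b1) should agree closely
rbind(optim = fit_ml$par, glm = unname(coef(fit_glm)))

# Fitted probabilities from p_hat = exp(b0 + b1 x) / (1 + exp(b0 + b1 x))
b <- coef(fit_glm)
p_hat <- exp(b[1] + b[2] * x) / (1 + exp(b[1] + b[2] * x))
all.equal(unname(p_hat), unname(fitted(fit_glm)))
```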

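    -2 log-likelihood, $-2\ln L$
    When comparing models fitted to the same data, the smaller this reported value, the larger the likelihood $L$, and hence the better the model fits.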
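In R, $-2\ln L$ can be read off `logLik()`. Here is a minimal sketch on simulated data, where the variables `x1`, `x2`, `y` and their coefficients are made up. The model with predictors has a smaller $-2\ln L$ than the intercept-only model. For a 0/1 response the value also equals the residual deviance that `summary()` prints, because the saturated model's log-likelihood is then 0.

```r
set.seed(2)
# Hypothetical simulated data: x1 is related to y, x2 is pure noise
n  <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 1.5 * x1))

fit0 <- glm(y ~ 1, family = binomial)        # intercept-only model
fit1 <- glm(y ~ x1 + x2, family = binomial)  # model with predictors

# -2 ln L for each model
m2ll0 <- -2 * as.numeric(logLik(fit0))
m2ll1 <- -2 * as.numeric(logLik(fit1))
c(null = m2ll0, model = m2ll1)  # the larger (nested) model never has a larger value

# For a 0/1 response, -2 ln L equals the deviance reported by summary()
all.equal(m2ll1, deviance(fit1))
```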

    Goodness of fit

    Pseudo $R^2$ (Pseudo R Square)
    Similar to $R^2$ in OLS regression, but its value is less than 1
    Adjusted coefficient
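The post does not say which pseudo $R^2$ it means. The sketch below computes three common definitions from the null and residual deviances printed in the summary(model01) output later in this post, with $n = 300$ observations. The three are McFadden's $1 - \ln L_1/\ln L_0$, Cox & Snell's $1 - (L_0/L_1)^{2/n}$ (whose maximum $1 - L_0^{2/n}$ is below 1), and Nagelkerke's, which divides Cox & Snell's value by that maximum. Reading Nagelkerke's version as the "adjusted coefficient" is an assumption.

```r
# Deviances and sample size as printed in the summary(model01) output in this post
null_dev  <- 328.32  # null deviance (intercept-only model)
resid_dev <- 170.98  # residual deviance (fitted model)
n         <- 300     # 299 null degrees of freedom + 1

# For a 0/1 response, deviance = -2 ln L, so ln L = -deviance / 2
mcfadden   <- 1 - resid_dev / null_dev             # 1 - lnL1 / lnL0
cox_snell  <- 1 - exp((resid_dev - null_dev) / n)  # 1 - (L0 / L1)^(2/n)
max_cs     <- 1 - exp(-null_dev / n)               # upper bound of Cox & Snell, < 1
nagelkerke <- cox_snell / max_cs                   # rescaled so its maximum is 1

round(c(McFadden = mcfadden, CoxSnell = cox_snell, Nagelkerke = nagelkerke), 3)
# approx. 0.479, 0.408, 0.613
```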

    Significance test of the regression coefficients: the Wald statistic
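The Wald statistic is the estimate divided by its standard error, $z = b_j / SE(b_j)$, compared with a standard normal distribution. Equivalently, $z^2$ is compared with a $\chi^2$ distribution with 1 degree of freedom. The sketch below recomputes the "z value" and "Pr(>|z|)" columns for three rows of the summary(model01) output shown later in this post. It uses the rounded estimates and standard errors as printed, so the last digits can differ slightly.

```r
# Estimates and standard errors copied from the summary(model01) output in this post
est <- c(Intercept = -1.0512, ReciAccunacc = 2.1813, ThemeAccunacc = -0.8667)
se  <- c(Intercept =  0.7692, ReciAccunacc = 0.4529, ThemeAccunacc =  0.6585)

z <- est / se            # Wald z statistic, as in the "z value" column
p <- 2 * pnorm(-abs(z))  # two-sided p-value, as in "Pr(>|z|)"

# Equivalent chi-square form with 1 degree of freedom gives the same p-value
wald_chisq <- z^2
p_chisq <- pchisq(wald_chisq, df = 1, lower.tail = FALSE)

round(cbind(z, p, wald_chisq), 4)
```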

    Example code

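    # stringsAsFactors = TRUE: since R 4.0, text columns are read as character by default,
    # while a binomial glm() needs a numeric 0/1 or factor response
    data <- read.csv(file = file.choose(), header = TRUE, stringsAsFactors = TRUE)
    
    ## maximal model: all five predictors
    model01 <- glm(Dative ~ ReciAnim + ReciAcc + ThemeAcc + ReciPron + ThemePron,
                   data = data, family = binomial)
    summary(model01)
    
    ## backward stepwise selection by AIC
    step(model01)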
    
    > summary(model01)
    
    Call:
    glm(formula = Dative ~ ReciAnim + ReciAcc + ThemeAcc + ReciPron + 
        ThemePron, family = binomial, data = data)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -2.1900  -0.2509  -0.1634  -0.1634   2.5217  
    
    Coefficients:
                  Estimate Std. Error z value
    (Intercept)    -1.0512     0.7692  -1.367
    ReciAniminani   1.1726     0.4411   2.659
    ReciAccunacc    2.1813     0.4529   4.817
    ThemeAccunacc  -0.8667     0.6585  -1.316
    ReciPronpron   -2.3916     0.6861  -3.486
    ThemePronpron   3.3643     0.9441   3.564
                  Pr(>|z|)    
    (Intercept)   0.171703    
    ReciAniminani 0.007848 ** 
    ReciAccunacc  1.46e-06 ***
    ThemeAccunacc 0.188122    
    ReciPronpron  0.000491 ***
    ThemePronpron 0.000366 ***
    ---
    Signif. codes:  
    0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 328.32  on 299  degrees of freedom
    Residual deviance: 170.98  on 294  degrees of freedom
    AIC: 182.98
    
    Number of Fisher Scoring iterations: 6
    

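    The variable ThemeAccunacc is not significant (p = 0.188), so the stepwise function step() is used to remove it. Note that step() selects terms by AIC rather than by p-values: it drops ThemeAcc because doing so lowers the AIC from 182.98 to 182.82.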
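What step() does at each stage can be seen with drop1(), which refits the model without each term in turn and reports the resulting AIC. The minimal sketch below runs this on simulated data. The data frame `dd`, its columns, and the coefficients are made up, and `x3` is deliberately unrelated to `y`. With direction = "backward", step() keeps dropping the term whose removal lowers AIC the most and stops when no removal helps, so the final AIC is never higher than the starting one.

```r
set.seed(3)
# Hypothetical simulated data: x1 and x2 affect y, x3 is an irrelevant predictor
n  <- 300
dd <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dd$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * dd$x1 - 0.8 * dd$x2))

full <- glm(y ~ x1 + x2 + x3, data = dd, family = binomial)

# One backward step by hand: AIC of the model after dropping each term
drop1(full)

# step() repeats that until no single drop lowers the AIC any further
reduced <- step(full, direction = "backward", trace = 0)
formula(reduced)  # usually drops x3, though that is not guaranteed
c(full = AIC(full), reduced = AIC(reduced))
```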

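    AIC (Akaike information criterion) is a standard measure of a statistical model's goodness of fit that also penalizes the number of parameters. Among models fitted to the same data, the one with the smaller AIC is preferred.

    In general, $\mathrm{AIC} = 2k - 2\ln L$, where $k$ is the number of parameters in the fitted model and $L$ is the maximized value of the likelihood function (so $\ln L$ is the log-likelihood). This general form does not require normally distributed errors. The assumption of independent, normally distributed errors is needed only for the least-squares version $\mathrm{AIC} = n\ln(\mathrm{RSS}/n) + 2k$ (up to an additive constant), which is where the number of observations $n$ enters.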
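For a 0/1 response, $-2\ln L$ is exactly the residual deviance, so AIC = deviance + 2k. As a check, the sketch below rebuilds the AIC column of the first table printed by step(model01) below from its Deviance column. It uses k = 6 coefficients for the full model and 5 after dropping one two-level factor. It then confirms the same identity with AIC() and logLik() on hypothetical simulated data.

```r
# Deviances from the first table printed by step(model01) in this post
dev <- c("- ThemeAcc" = 172.82, "<none>" = 170.98, "- ReciAnim" = 178.36,
         "- ThemePron" = 183.77, "- ReciPron" = 186.52, "- ReciAcc" = 198.01)
# Number of estimated coefficients: 6 for the full model, 5 after dropping one term
k <- c(5, 6, 5, 5, 5, 5)

# For a 0/1 response, -2 ln L equals the deviance, so AIC = deviance + 2k
aic <- dev + 2 * k
aic  # 182.82 182.98 188.36 193.77 196.52 208.01, matching the AIC column

# The same identity with R's own functions, on hypothetical simulated data
set.seed(4)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)
all.equal(AIC(fit), -2 * as.numeric(logLik(fit)) + 2 * length(coef(fit)))
```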

    > step(model01)
    Start:  AIC=182.98
    Dative ~ ReciAnim + ReciAcc + ThemeAcc + ReciPron + ThemePron
    
                Df Deviance    AIC
    - ThemeAcc   1   172.82 182.82
    <none>           170.98 182.98
    - ReciAnim   1   178.36 188.36
    - ThemePron  1   183.77 193.77
    - ReciPron   1   186.52 196.52
    - ReciAcc    1   198.01 208.01
    
    Step:  AIC=182.82
    Dative ~ ReciAnim + ReciAcc + ReciPron + ThemePron
    
                Df Deviance    AIC
    <none>           172.82 182.82
    - ReciAnim   1   180.51 188.51
    - ReciPron   1   187.79 195.79
    - ThemePron  1   198.25 206.25
    - ReciAcc    1   203.52 211.52
    
    Call:  glm(formula = Dative ~ ReciAnim + ReciAcc + ReciPron + ThemePron, 
        family = binomial, data = data)
    
    Coefficients:
      (Intercept)  ReciAniminani   ReciAccunacc  
           -1.911          1.187          2.288  
     ReciPronpron  ThemePronpron  
           -2.337          3.949  
    
    Degrees of Freedom: 299 Total (i.e. Null);  295 Residual
    Null Deviance:      328.3 
    Residual Deviance: 172.8    AIC: 182.8
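As a worked example of $\hat p_i = \exp(\eta_i)/(1+\exp(\eta_i))$, the sketch below plugs the final coefficients printed above into two illustrative cases. It rests on three assumptions:

* Each predictor is a two-level factor coded 0/1, as the output suggests.
* "Reference level" means the level not named in the coefficient label.
* $\hat p$ is the probability of the second level of Dative (or of Dative = 1 if it is coded numerically).

With the fitted object itself, predict(..., type = "response") returns such probabilities for new data.

```r
# Coefficients of the final model, as printed by step(model01) in this post
b <- c(Intercept = -1.911, ReciAniminani = 1.187, ReciAccunacc = 2.288,
       ReciPronpron = -2.337, ThemePronpron = 3.949)

# Linear predictor for two hypothetical cases (1 = dummy switched on):
#   case A: every predictor at its reference level
#   case B: only ReciAccunacc = 1
eta_A <- b[["Intercept"]]
eta_B <- b[["Intercept"]] + b[["ReciAccunacc"]]

# p_hat = exp(eta) / (1 + exp(eta)); plogis() gives the same value
p_A <- exp(eta_A) / (1 + exp(eta_A))  # approx. 0.129
p_B <- exp(eta_B) / (1 + exp(eta_B))  # approx. 0.593
c(A = p_A, B = p_B)
```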
    


        Article link: https://www.haomeiwen.com/subject/rfhjcrtx.html