R: Modeling and Prediction


Author: 尘世中一个迷途小书僮 | Published 2022-07-12 22:00

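    Building models and making predictions in R. The examples use the built-in cars dataset (speed in mph, stopping distance dist in ft, 50 observations).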

    library(tidyverse)
    
    ## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
    
    ## v ggplot2 3.3.5     v purrr   0.3.4
    ## v tibble  3.0.4     v dplyr   1.0.2
    ## v tidyr   1.1.2     v stringr 1.4.0
    ## v readr   1.4.0     v forcats 0.5.0
    
    ## Warning: package 'ggplot2' was built under R version 4.0.5
    
    ## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
    ## x dplyr::filter() masks stats::filter()
    ## x dplyr::lag()    masks stats::lag()
    
    data("cars")
    head(cars)
    
    ##   speed dist
    ## 1     4    2
    ## 2     4   10
    ## 3     7    4
    ## 4     7   22
    ## 5     8   16
    ## 6     9   10
    

    Linear Regression

    Model

    We start with a simple linear regression of stopping distance on speed:

    model <- lm(dist~speed, cars)
    summary(model)
    
    ## 
    ## Call:
    ## lm(formula = dist ~ speed, data = cars)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -29.069  -9.525  -2.272   9.215  43.201 
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
    ## speed         3.9324     0.4155   9.464 1.49e-12 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 15.38 on 48 degrees of freedom
    ## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
    ## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
    

    The fitted model is: dist = -17.579 + 3.932*speed.
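    As a quick check, the fitted equation can be evaluated by hand from coef(model) and compared with predict(). This is a minimal sketch; the vector name new_speed is only for illustration:

```r
# Refit the simple linear regression on the built-in cars data
data(cars)
model <- lm(dist ~ speed, data = cars)

# Intercept and slope of the fitted line
b <- coef(model)

# Evaluate dist = b0 + b1 * speed by hand at two new speeds
new_speed <- c(10, 20)
manual <- unname(b["(Intercept)"] + b["speed"] * new_speed)

# predict() on the same inputs should give the same numbers
auto <- unname(predict(model, newdata = data.frame(speed = new_speed)))
print(manual)
all.equal(manual, auto)
```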

    Prediction

    Confidence interval

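    Use the predict() function to make predictions for new data. With interval = "confidence" it returns each predicted value together with a 95% confidence interval for the mean response at that input. Note that speed = 53 lies far outside the observed speed range (4 to 25), so the third row is an extrapolation.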

    speeds <- data.frame(speed=c(10, 20, 53))
    predict(model, newdata = speeds, interval = "confidence")
    
    ##         fit       lwr       upr
    ## 1  21.74499  15.46192  28.02807
    ## 2  61.06908  55.24729  66.89088
    ## 3 190.83857 159.12292 222.55422
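    The confidence interval is fit ± t(0.975, 48) × se.fit, where se.fit is the standard error of the estimated mean and 48 is the residual degrees of freedom. The sketch below rebuilds the interval from predict(..., se.fit = TRUE). It should reproduce the interval = "confidence" output:

```r
data(cars)
model <- lm(dist ~ speed, data = cars)
speeds <- data.frame(speed = c(10, 20, 53))

# se.fit = TRUE returns the fitted values, their standard errors and the residual df
p <- predict(model, newdata = speeds, se.fit = TRUE)

# 97.5% quantile of the t distribution with the residual degrees of freedom
t_crit <- qt(0.975, df = p$df)

# Interval for the mean response: fit +/- t * se.fit
ci <- cbind(fit = p$fit,
            lwr = p$fit - t_crit * p$se.fit,
            upr = p$fit + t_crit * p$se.fit)
ci
```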
    

    Prediction interval

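    With interval = "prediction", predict() instead returns a 95% prediction interval. This is the range expected to contain a single new observation at that speed. It is wider than the confidence interval because it also accounts for the scatter of individual points around the line.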

    predict(model, newdata = speeds, interval = "prediction")
    
    ##         fit        lwr       upr
    ## 1  21.74499  -9.809601  53.29959
    ## 2  61.06908  29.603089  92.53507
    ## 3 190.83857 146.542994 235.13415
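    The extra width comes from the residual variance. The half-width is t(0.975, 48) × sqrt(se.fit² + σ²), where σ is the residual standard error (15.38 in the summary above). This is a minimal sketch that should reproduce the interval = "prediction" output:

```r
data(cars)
model <- lm(dist ~ speed, data = cars)
speeds <- data.frame(speed = c(10, 20, 53))

p <- predict(model, newdata = speeds, se.fit = TRUE)
t_crit <- qt(0.975, df = p$df)

# Half-width combines uncertainty in the fitted mean (se.fit) with the
# scatter of one new observation around it (residual.scale = sigma)
half <- t_crit * sqrt(p$se.fit^2 + p$residual.scale^2)
pred_int <- cbind(fit = p$fit, lwr = p$fit - half, upr = p$fit + half)
pred_int
```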
    

    Visualizing the predictions:

    # 1. Add prediction intervals for the observed speeds
    #    (predict() warns that intervals on the training data refer to future responses)
    pred.int <- predict(model, interval = "prediction")
    mydata <- cbind(cars, pred.int)
    # 2. Scatter plot + regression line with its 95% confidence band
    p <- ggplot(mydata, aes(speed, dist)) +
      geom_point() +
      stat_smooth(method = lm, formula = y ~ x)
    # 3. Add the 95% prediction interval as dashed red lines
    p + geom_line(aes(y = lwr), color = "red", linetype = "dashed") +
      geom_line(aes(y = upr), color = "red", linetype = "dashed") +
      theme_bw()
    

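    In the plot:

    • the blue line is the fitted linear regression;

    • the grey band is the 95% confidence interval of the fitted line;

    • the red dashed lines are the 95% prediction interval.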
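    If the model assumptions hold, roughly 95% of the observed points should fall between the red dashed lines. The grey band is much narrower and is not meant to cover individual points. A small sketch that measures this coverage on the training data:

```r
data(cars)
model <- lm(dist ~ speed, data = cars)

# Prediction intervals at the observed speeds; predict() warns that these refer
# to future responses, which is the intended interpretation here
pred.int <- suppressWarnings(predict(model, interval = "prediction"))

# Share of observed points lying inside the prediction band
inside <- cars$dist >= pred.int[, "lwr"] & cars$dist <= pred.int[, "upr"]
mean(inside)
```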

    GLM

    In R, generalized linear models (GLMs) are fitted with the glm() function.

    Model

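    The family argument selects the error distribution (and with it the default link function) of the model. With family = gaussian, whose default link is the identity, the GLM is the same model as the linear regression above.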

    glm.model <- glm(dist~speed, data = cars, family = gaussian)
    summary(glm.model)
    
    ## 
    ## Call:
    ## glm(formula = dist ~ speed, family = gaussian, data = cars)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -29.069   -9.525   -2.272    9.215   43.201  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
    ## speed         3.9324     0.4155   9.464 1.49e-12 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for gaussian family taken to be 236.5317)
    ## 
    ##     Null deviance: 32539  on 49  degrees of freedom
    ## Residual deviance: 11354  on 48  degrees of freedom
    ## AIC: 419.16
    ## 
    ## Number of Fisher Scoring iterations: 2
    

    The fitted model is: dist = -17.579 + 3.932*speed.

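    The coefficients, standard errors and p-values are identical to those from lm(), as expected for a Gaussian GLM with the identity link.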
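    The family only starts to matter once it differs from gaussian. This sketch first confirms that the Gaussian GLM reproduces lm(). It then fits an alternative family: Gamma errors with a log link. That family is chosen here purely for illustration and is not part of the original analysis. With a non-identity link, predict() returns values on the link scale by default, so type = "response" is needed to get predictions in feet:

```r
data(cars)
speeds <- data.frame(speed = c(10, 20, 53))

# Gaussian family with the default identity link: same coefficients as lm()
lm.model  <- lm(dist ~ speed, data = cars)
glm.model <- glm(dist ~ speed, data = cars, family = gaussian)
all.equal(coef(lm.model), coef(glm.model))

# Illustrative alternative: Gamma errors with a log link (dist is strictly positive)
gamma.model <- glm(dist ~ speed, data = cars, family = Gamma(link = "log"))

# Default type = "link" gives the linear predictor (log scale);
# type = "response" applies the inverse link, here exp()
eta <- predict(gamma.model, newdata = speeds)
mu  <- predict(gamma.model, newdata = speeds, type = "response")
all.equal(exp(eta), mu)
```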

    Prediction

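    For a glm object, setting se.fit = TRUE makes predict() also return two extra components. se.fit is the standard error of each prediction. residual.scale is the square root of the dispersion that was used to compute those standard errors.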

    predict(glm.model, newdata = speeds, se.fit = TRUE)
    
    ## $fit
    ##         1         2         3 
    ##  21.74499  61.06908 190.83857 
    ## 
    ## $se.fit
    ##         1         2         3 
    ##  3.124921  2.895501 15.773951 
    ## 
    ## $residual.scale
    ## [1] 15.37959
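    These components are enough to build a 95% confidence interval. For the Gaussian family the dispersion is estimated, so the sketch below uses t quantiles with the residual degrees of freedom. In this case the result should match the lm() confidence interval shown earlier:

```r
data(cars)
speeds <- data.frame(speed = c(10, 20, 53))
glm.model <- glm(dist ~ speed, data = cars, family = gaussian)

p <- predict(glm.model, newdata = speeds, se.fit = TRUE)

# Gaussian family: estimated dispersion, so use t with the residual df (48)
t_crit <- qt(0.975, df = df.residual(glm.model))
ci <- cbind(fit = p$fit,
            lwr = p$fit - t_crit * p$se.fit,
            upr = p$fit + t_crit * p$se.fit)
ci
```

For families with a non-identity link, a common approach is different. Build the interval on the link scale (the default type = "link") and then transform both endpoints with the inverse link.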
    

    LOESS regression

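    LOESS (Local Polynomial Regression Fitting) can also be used to fit and predict. Instead of one global line, it fits a low-degree polynomial to the neighboring points around each x. The span argument controls what fraction of the data forms each neighborhood: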

    cars %>% 
      ggplot(aes(speed, dist)) +
      geom_point() +
      geom_smooth(method = 'loess', formula = y ~ x, span = 1) + # span (0.1 to 1): fraction of points in each local fit
      theme_classic()
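    The smooth layer should correspond to a stats::loess() fit with the same span. The sketch below calls loess() directly and evaluates the curve on a grid, so the numbers behind the plotted line can be inspected. Note that loess() itself defaults to span = 0.75 and degree = 2:

```r
data(cars)

# Local quadratic fit (degree = 2 is the loess default) with the span used in the plot
fit.lo <- loess(dist ~ speed, data = cars, span = 1)

# Evaluate the smooth at every integer speed inside the observed range (4 to 25)
grid <- data.frame(speed = min(cars$speed):max(cars$speed))
grid$dist <- predict(fit.lo, newdata = grid)
head(grid)
```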
    

    Fitting only

    With the default settings, a loess model can only predict values within the range of the original data. Inputs outside that range return NA:

    cars.lo <- loess(dist~speed, cars)
    predict(cars.lo, speeds)
    
    ##        1        2        3 
    ## 21.86532 56.46132       NA
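    To see in advance which inputs will come back as NA, one option is to compare the new values with the training range before calling predict(). The helper in_training_range() below is hypothetical and written only for this illustration:

```r
data(cars)
speeds <- data.frame(speed = c(10, 20, 53))

# Hypothetical helper: TRUE where a new value lies inside the training range,
# i.e. where the default (interpolated) loess surface is defined
in_training_range <- function(new, train) {
  r <- range(train)
  new >= r[1] & new <= r[2]
}

ok <- in_training_range(speeds$speed, cars$speed)
ok  # TRUE TRUE FALSE: the maximum observed speed is 25

cars.lo <- loess(dist ~ speed, data = cars)
pred <- predict(cars.lo, newdata = speeds)
# The NA positions coincide with the out-of-range inputs
all(is.na(pred) == !ok)
```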
    

    Extrapolation

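    To predict values outside the data range with loess, set control = loess.control(surface = "direct"). This computes the local fit exactly at each new point instead of interpolating from a precomputed surface: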

    cars.lo2 <- loess(dist ~ speed, cars,
      control = loess.control(surface = "direct"))
    predict(cars.lo2, speeds, se = TRUE)
    
    ## $fit
    ##         1         2         3 
    ##  21.86532  56.44526 963.89286 
    ## 
    ## $se.fit
    ##          1          2          3 
    ##   4.119331   4.061865 467.666621 
    ## 
    ## $residual.scale
    ## [1] 15.31087
    ## 
    ## $df
    ## [1] 44.55085
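    The df component is an effective degrees-of-freedom estimate intended for t-based intervals. The sketch below turns the output above into approximate 95% confidence intervals. It shows how uninformative the extrapolated value is: at speed = 53 the half-width is over 900 ft:

```r
data(cars)
speeds <- data.frame(speed = c(10, 20, 53))
cars.lo2 <- loess(dist ~ speed, data = cars,
                  control = loess.control(surface = "direct"))
p <- predict(cars.lo2, newdata = speeds, se = TRUE)

# t quantile using the effective residual degrees of freedom returned by predict()
t_crit <- qt(0.975, df = p$df)
ci <- cbind(fit = p$fit,
            lwr = p$fit - t_crit * p$se.fit,
            upr = p$fit + t_crit * p$se.fit)
ci
```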
    

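    The loess smoothing span still needs attention. If it is too small, the fit follows the noise in the original data too closely (overfitting) and predictions become less accurate. Extrapolation is risky in any case. The loess prediction at speed = 53 is 963.9 ft with a standard error of 467.7, far from the linear model's 190.8 ft, so it has little practical value.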
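    One way to choose the span is to compare out-of-sample error rather than in-sample fit. This is a minimal leave-one-out sketch; the helper loo_rmse() and the candidate spans are hypothetical choices made for illustration. surface = "direct" is used so that a held-out point at the edge of the speed range can still be predicted:

```r
data(cars)

# Hypothetical helper: leave-one-out root mean squared prediction error
# of a loess fit with the given span
loo_rmse <- function(span, data = cars) {
  errs <- vapply(seq_len(nrow(data)), function(i) {
    fit <- loess(dist ~ speed, data = data[-i, ], span = span,
                 control = loess.control(surface = "direct"))
    data$dist[i] - as.numeric(predict(fit, newdata = data[i, , drop = FALSE]))
  }, numeric(1))
  sqrt(mean(errs^2))
}

spans <- c(0.3, 0.5, 0.75, 1)
cv <- sapply(spans, loo_rmse)

# In-sample residual standard error for comparison; it tends to shrink as the
# span shrinks, even when out-of-sample error does not improve
train <- sapply(spans, function(s) loess(dist ~ speed, data = cars, span = s)$s)
data.frame(span = spans, train_sigma = train, loo_rmse = cv)
```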

    That concludes this brief overview of fitting and predicting with linear regression, GLMs and LOESS in R.

    The end.

    References

    https://www.journaldev.com/45290/predict-function-in-r

    http://www.sthda.com/english/articles/40-regression-analysis/166-predict-in-r-model-predictions-and-confidence-intervals/
