
Machine Learning -- Supervised Learning -- Regression Regularization (Ridge and Lasso)

Author: 小贝学生信 | Published 2021-11-07 01:12

    When fitting a linear regression with a very large number of features, the model becomes much more complex, which leads to overfitting on the training set and lower generalization ability; it also increases the chance of multicollinearity among the variables, making the model coefficients hard to interpret.
    Regularization is a technique for preventing overfitting that is often used together with linear regression; ridge regression and lasso regression are two common forms.

    1. A brief overview of regression regularization

    • When there are very many feature variables, the regression model becomes very complex; concretely, many of the features end up with seemingly meaningful coefficients. This not only causes overfitting, but also greatly reduces interpretability.
    • Regularized regression assumes that only a subset of the features contributes substantially to the model. Regularization therefore tries to highlight the truly valuable variables while suppressing the remaining, noisy ones.
    • Common regularization methods include: (1) Ridge, (2) Lasso (or LASSO), (3) Elastic net (or ENET)

    1.1 Ridge regression

    • Tuning parameter: the larger the λ value, the more strongly the coefficients are shrunk (toward 0); see the penalty formula below.
    • Characteristic: all feature variables are retained, i.e. the coefficients only approach 0 but never become exactly 0 (a coefficient of 0 would mean the variable is dropped).
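
    In standard notation (y_i the response, x_ij the features, β_j the coefficients), the ridge objective can be written as:

    \min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2

    Because the penalty is on the squared coefficients, increasing λ shrinks every β_j toward 0 but never sets one exactly to 0.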

    1.2 Lasso regression

    • Tuning parameter: likewise, the larger the λ value, the more strongly the coefficients are shrunk (all the way down to 0); see the penalty formula below.
    • Characteristic: as λ increases, "noise" variables are dropped and only the meaningful variables are kept, which achieves feature selection.
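
    The lasso objective (same notation as above) replaces the squared penalty with an absolute-value penalty:

    \min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|

    It is this L1 penalty that allows coefficients to be shrunk exactly to 0, which is why the lasso performs feature selection.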

    1.3 Elastic nets

    • Essentially a combination of ridge regression and lasso regression.
    • Tuning parameters: α sets the mixing proportion between the ridge and lasso penalties, while λ again controls the overall strength of the shrinkage; see the formula below.
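
    In glmnet's parameterization (as described in the package documentation), the elastic net penalty mixes the two penalties through α:

    \min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \Big[\, \alpha \sum_{j=1}^{p} |\beta_j| + \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^2 \,\Big]

    Setting α = 1 recovers the lasso, and α = 0 recovers ridge regression.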

    2. Hands-on R code

    R package and main function
    • As shown below, the key function is glmnet::glmnet(); when the alpha argument is 1 (the default), it fits a lasso regression; when it is 0, a ridge regression; and values between 0 and 1 give an elastic net.
    • For the λ parameter, glmnet automatically fits a sequence of 100 values, from which the most suitable one can be chosen.
    library(glmnet)
    glmnet(
      x = X,
      y = Y,
      alpha = 1
    )
    
    Example data
    ames <- AmesHousing::make_ames()
    dim(ames)
    set.seed(123)
    library(rsample)
    split <- initial_split(ames, prop = 0.7, 
                           strata = "Sale_Price")
    ames_train  <- training(split)
    dim(ames_train)
    # [1] 2049   81
    ames_test   <- testing(split)
    dim(ames_test)
    # [1] 881  81
    
    # Create training feature matrices
    # we use model.matrix(...)[, -1] to discard the intercept
    X <- model.matrix(Sale_Price ~ ., ames_train)[, -1]
    # transform y with log transformation
    Y <- log(ames_train$Sale_Price)
    

    Parametric models such as regularized regression are sensitive to skewed response values, so transforming the response can often improve predictive performance.

    2.1 Ridge regression

    Step 1: Fit an initial model and inspect the coefficient values at different λ values
    ridge <- glmnet(x = X, y = Y,
                    alpha = 0)
    
    str(ridge$lambda)
    # num [1:100] 286 260 237 216 197 ...
    
    # the smaller the lambda, the weaker the shrinkage of the coefficients
    coef(ridge)[c("Latitude", "Overall_QualVery_Excellent"), 100]
    # Latitude Overall_QualVery_Excellent 
    # 0.60703722                 0.09344684
    
    # the larger the lambda, the stronger the shrinkage of the coefficients
    coef(ridge)[c("Latitude", "Overall_QualVery_Excellent"), 1]
    # Latitude Overall_QualVery_Excellent 
    # 6.115930e-36               9.233251e-37
    
    plot(ridge, xvar = "lambda")
    
    Step 2: Use 10-fold cross-validation to determine the best λ
    ridge <- cv.glmnet(x = X, y = Y,
                       alpha = 0)
    plot(ridge, main = "Ridge penalty\n\n")
    
    • As shown in the plot above, the MSE is plotted against log(λ), with two reference lines of special meaning: the left one marks the log(λ) that gives the minimum MSE, and the right one marks the largest log(λ) whose MSE is within one standard error of the minimum MSE.
    # the value with the minimum MSE
    ridge$lambda.min
    # [1] 0.1525105
    ridge$cvm[ridge$lambda == ridge$lambda.min]
    min(ridge$cvm) 
    # [1] 0.0219778
    
    # the largest value within one standard error of it
    ridge$lambda.1se
    # [1] 0.6156877
    ridge$cvm[ridge$lambda == ridge$lambda.1se]
    # [1] 0.0245219
    
    Step 3: Finally, visualize the coefficient paths together with the best λ values from cross-validation
    ridge <- cv.glmnet(x = X, y = Y,
                       alpha = 0)
    ridge_min <- glmnet(x = X, y = Y,
                        alpha = 0)
    
    plot(ridge_min, xvar = "lambda", main = "Ridge penalty\n\n")
    abline(v = log(ridge$lambda.min), col = "red", lty = "dashed")
    abline(v = log(ridge$lambda.1se), col = "blue", lty = "dashed")
    

    2.2 Lasso regression

    Step 1: Fit an initial model and inspect the coefficient values at different λ values
    lasso <- glmnet(x = X, y = Y,
                    alpha = 1)
    
    str(lasso$lambda)
    # num [1:96] 0.286 0.26 0.237 0.216 0.197 ...
    
    # the smaller the lambda, the weaker the shrinkage of the coefficients
    coef(lasso)[c("Latitude", "Overall_QualVery_Excellent"), 96]
    # Latitude Overall_QualVery_Excellent 
    # 0.8126079                  0.2222406
    
    # the larger the lambda, the stronger the shrinkage of the coefficients
    coef(lasso)[c("Latitude", "Overall_QualVery_Excellent"), 1]
    # Latitude Overall_QualVery_Excellent 
    # 0                          0
    
    plot(lasso, xvar = "lambda")
    
    Step 2: Use 10-fold cross-validation to determine the best λ
    lasso <- cv.glmnet(x = X, y = Y,
                       alpha = 1)
    plot(lasso, main = "lasso penalty\n\n")
    
    # the value with the minimum MSE
    lasso$lambda.min
    # [1] 0.003957686
    lasso$cvm[lasso$lambda == lasso$lambda.min]
    min(lasso$cvm) 
    # [1] 0.0229088
    
    # the largest value within one standard error of it
    lasso$lambda.1se
    # [1] 0.0110125
    lasso$cvm[lasso$lambda == lasso$lambda.1se]
    # [1] 0.02566636
    
    Step 3: Finally, visualize the coefficient paths together with the best λ values from cross-validation
    lasso <- cv.glmnet(x = X, y = Y,
                       alpha = 1)
    lasso_min <- glmnet(x = X, y = Y,
                        alpha = 1)
    
    plot(lasso_min, xvar = "lambda", main = "lasso penalty\n\n")
    abline(v = log(lasso$lambda.min), col = "red", lty = "dashed")
    abline(v = log(lasso$lambda.1se), col = "blue", lty = "dashed")
    

    Although this lasso model does not offer significant improvement over the ridge model, we get approximately the same accuracy by using only 64 features!
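
    To check how many features are actually retained, one can count the nonzero coefficients at the chosen λ; the exact count will vary slightly with the random cross-validation folds:
    # count nonzero coefficients (excluding the intercept) at lambda.1se
    sum(coef(lasso, s = "lambda.1se")[-1, ] != 0)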

    • A small detail: because the response variable Y was log-transformed during data preparation, the predictions need to be back-transformed (exponentiated) when comparing RMSE values with other kinds of models.
    # predict sale price on the training data
    pred <- predict(lasso, X)
    # compute RMSE on the original (back-transformed) scale
    # RMSE() is provided by the caret package
    library(caret)
    RMSE(exp(pred), exp(Y))
    ## [1] 34161.13
    

    Elastic Net fits a mixture of the ridge and lasso penalties by tuning the α parameter; the most suitable mixing ratio can be found with the caret package, which is only sketched briefly below rather than demonstrated in full.
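
    A minimal sketch of how one might tune α and λ jointly with caret::train() using method = "glmnet"; the tuneLength value here is an illustrative choice:
    library(caret)
    set.seed(123)
    enet <- train(
      x = X, y = Y,
      method = "glmnet",
      trControl = trainControl(method = "cv", number = 10),
      tuneLength = 10    # searches a 10 x 10 grid of alpha and lambda values
    )
    enet$bestTune        # best combination of alpha and lambda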
