Machine Learning -- Supervised -- Regression Regularization (Ridge Regression and Lasso)

Author: 小贝学生信 | Published on 2021-11-07 01:12

When fitting a linear regression with a very large number of features, model complexity grows sharply, the model tends to overfit the training set and lose generalization ability, and the chance of multicollinearity among the variables increases, which makes the coefficients hard to interpret.
Regularization is a technique for preventing overfitting that is often used together with linear regression; ridge regression and lasso regression are two common forms of it.

1. A simple understanding of regression regularization

  • When there are very many feature variables, the regression model becomes very complex; concretely, many features end up with seemingly significant coefficients. This not only overfits the model but also greatly reduces interpretability.
  • The working assumption of regularized regression is that only a subset of the features contributes substantially to the model. Regularization therefore tries to emphasize the truly valuable variables while suppressing the remaining noise variables.
  • Common regularization methods are: (1) Ridge, (2) Lasso (or LASSO), (3) Elastic net (or ENET)

1.1 Ridge regression

  • Parameter tuning: the larger the λ value, the more strongly the coefficients are shrunk (the closer they are pushed toward 0).
  • Characteristics: all feature variables are retained, i.e. the coefficients only approach 0 and never become exactly 0 (a coefficient of 0 would mean dropping that variable); see the penalized objective below.
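For concreteness, the ridge objective in its standard form (a supplement, not shown in the original post) is ordinary least squares plus an L2 penalty on the coefficients:

\min_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2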

1.2 Lasso regression

  • Parameter tuning: likewise, the larger the λ value, the stronger the shrinkage of the coefficients (all the way down to exactly 0).
  • Characteristics: as λ increases, the "noise" variables are dropped and the meaningful variables are retained, which achieves feature selection; see the objective below.
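The lasso objective (again the standard form, added here for reference) differs only in the penalty term, which uses absolute values (an L1 penalty), allowing coefficients to be shrunk exactly to 0:

\min_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|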

1.3 Elastic nets

  • Essentially a combination of ridge regression and lasso regression.
  • Parameter tuning: α sets the mixing proportion between the ridge and lasso penalties, and λ again controls the strength of the shrinkage; see the combined penalty below.
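To the best of my knowledge, glmnet parameterizes the combined elastic-net penalty as

\lambda \Big[\frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^2 + \alpha \sum_{j=1}^{p} |\beta_j|\Big]

so that α = 0 recovers ridge regression and α = 1 recovers lasso.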

2. Hands-on R code

R package and main function
  • As shown below, we mainly use glmnet::glmnet(). When its alpha argument is 1 (the default), we get lasso regression; when it is 0, we get ridge regression; values between 0 and 1 give an elastic net.
  • As for tuning λ, glmnet automatically fits a path of (by default) 100 λ values, from which the most suitable one can then be chosen (e.g. by cross-validation, as shown later).
library(glmnet)
glmnet(
  x = X,      # feature matrix
  y = Y,      # response vector
  alpha = 1   # 1 = lasso (default), 0 = ridge, values in between = elastic net
)
Example data
ames <- AmesHousing::make_ames()
dim(ames)

set.seed(123)
library(rsample)
split <- initial_split(ames, prop = 0.7, 
                       strata = "Sale_Price")
ames_train <- training(split)
dim(ames_train)
# [1] 2049   81
ames_test <- testing(split)
dim(ames_test)
# [1] 881  81

# Create training feature matrices
# we use model.matrix(...)[, -1] to discard the intercept
X <- model.matrix(Sale_Price ~ ., ames_train)[, -1]
# transform y with log transformation
Y <- log(ames_train$Sale_Price)

Parametric models such as regularized regression are sensitive to skewed response values, so transforming the response can often improve predictive performance.
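As a quick side check (not part of the original post), the skew of the raw response versus the log-transformed one can be eyeballed with two base-R histograms:

# compare the raw and log-transformed response distributions
par(mfrow = c(1, 2))
hist(ames_train$Sale_Price, breaks = 30, main = "Sale_Price")
hist(log(ames_train$Sale_Price), breaks = 30, main = "log(Sale_Price)")
par(mfrow = c(1, 1))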

2.1 Ridge regression

Step 1: fit an initial model and observe the coefficient values at different λ values
ridge <- glmnet(x = X, y = Y,
                alpha = 0)

str(ridge$lambda)
# num [1:100] 286 260 237 216 197 ...

# column 100 = smallest lambda in the path: the smaller the lambda, the weaker the shrinkage of the coefficients
coef(ridge)[c("Latitude", "Overall_QualVery_Excellent"), 100]
# Latitude Overall_QualVery_Excellent 
# 0.60703722                 0.09344684

# column 1 = largest lambda in the path: the larger the lambda, the stronger the shrinkage of the coefficients
coef(ridge)[c("Latitude", "Overall_QualVery_Excellent"), 1]
# Latitude Overall_QualVery_Excellent 
# 6.115930e-36               9.233251e-37

plot(ridge, xvar = "lambda")
Step 2: use 10-fold cross-validation to determine the best λ
ridge <- cv.glmnet(x = X, y = Y,
                   alpha = 0)
plot(ridge, main = "Ridge penalty\n\n")
  • As shown in the plot above, the MSE is drawn for different values of log(λ), with two reference lines: the left one marks the log(λ) with the minimum MSE, and the right one marks the largest log(λ) whose MSE is within one standard error of that minimum.
# the value with the minimum MSE
ridge$lambda.min
# [1] 0.1525105
ridge$cvm[ridge$lambda == ridge$lambda.min]  # CV MSE at lambda.min
min(ridge$cvm)                               # equivalently, the minimum CV MSE
# [1] 0.0219778

# the largest value within one standard error of it
ridge$lambda.1se
# [1] 0.6156877
ridge$cvm[ridge$lambda == ridge$lambda.1se]
# [1] 0.0245219
Step 3: finally, using the best λ values from cross-validation, visualize the corresponding coefficients
# reuse the cv.glmnet fit (`ridge`) from Step 2 for the lambda markers,
# and fit a plain glmnet model for the full coefficient paths
ridge_min <- glmnet(x = X, y = Y,
                    alpha = 0)

plot(ridge_min, xvar = "lambda", main = "Ridge penalty\n\n")
abline(v = log(ridge$lambda.min), col = "red", lty = "dashed")
abline(v = log(ridge$lambda.1se), col = "blue", lty = "dashed")
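As a small supplement (not part of the original post), the coefficient vectors at the two CV-selected λ values can also be extracted directly for inspection:

# coefficients at the two lambda values chosen by cross-validation
coef_min <- coef(ridge, s = "lambda.min")
coef_1se <- coef(ridge, s = "lambda.1se")
head(cbind(lambda_min = coef_min[, 1], lambda_1se = coef_1se[, 1]))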

2.2 Lasso regression

Step 1: fit an initial model and observe the coefficient values at different λ values
lasso <- glmnet(x = X, y = Y,
                alpha = 1)

str(lasso$lambda)
# num [1:96] 0.286 0.26 0.237 0.216 0.197 ...

# column 96 = smallest lambda in the path: the smaller the lambda, the weaker the shrinkage of the coefficients
coef(lasso)[c("Latitude", "Overall_QualVery_Excellent"), 96]
# Latitude Overall_QualVery_Excellent 
# 0.8126079                  0.2222406

# column 1 = largest lambda in the path: the larger the lambda, the stronger the shrinkage; here both coefficients are already exactly 0
coef(lasso)[c("Latitude", "Overall_QualVery_Excellent"), 1]
# Latitude Overall_QualVery_Excellent 
# 0                          0

plot(lasso, xvar = "lambda")
Step 2: use 10-fold cross-validation to determine the best λ
lasso <- cv.glmnet(x = X, y = Y,
                   alpha = 1)
plot(lasso, main = "lasso penalty\n\n")
# the value with the minimum MSE
lasso$lambda.min
# [1] 0.003957686
lasso$cvm[lasso$lambda == lasso$lambda.min]  # CV MSE at lambda.min
min(lasso$cvm)                               # equivalently, the minimum CV MSE
# [1] 0.0229088

# the largest value within one standard error of it
lasso$lambda.1se
# [1] 0.0110125
lasso$cvm[lasso$lambda == lasso$lambda.1se]
# [1] 0.02566636
Step 3: finally, using the best λ values from cross-validation, visualize the corresponding coefficients
# reuse the cv.glmnet fit (`lasso`) from Step 2 for the lambda markers,
# and fit a plain glmnet model for the full coefficient paths
lasso_min <- glmnet(x = X, y = Y,
                    alpha = 1)

plot(lasso_min, xvar = "lambda", main = "lasso penalty\n\n")
abline(v = log(lasso$lambda.min), col = "red", lty = "dashed")
abline(v = log(lasso$lambda.1se), col = "blue", lty = "dashed")

Although this lasso model does not offer significant improvement over the ridge model, we get approximately the same accuracy by using only 64 features!
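One quick way to check how many features the lasso actually keeps (a supplement, not in the original post) is to count the non-zero coefficients at the CV-selected λ values:

# number of non-zero coefficients (the count includes the intercept)
sum(coef(lasso, s = "lambda.min") != 0)
sum(coef(lasso, s = "lambda.1se") != 0)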

  • One small detail: because the response Y was log-transformed during data preparation, the predictions need to be back-transformed (with exp()) when comparing RMSE values against other types of models.
# predict sale price on the training data
pred <- predict(lasso, X)
# compute RMSE on the original (back-transformed) scale;
# RMSE() here is assumed to come from the caret package
library(caret)
RMSE(exp(pred), exp(Y))
## [1] 34161.13
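Note that, if I recall the glmnet interface correctly, predict() on a cv.glmnet object uses s = "lambda.1se" by default; the MSE-minimizing λ can be requested explicitly for comparison:

# predictions at lambda.min instead of the default lambda.1se
pred_min <- predict(lasso, X, s = "lambda.min")
RMSE(exp(pred_min), exp(Y))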

Elastic net fits the model with a mixture of the ridge and lasso penalties, controlled by the α parameter; the caret package can be used to search for the most suitable mixing proportion, as sketched below.
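A minimal sketch of what such a search might look like, assuming the caret package (the grid size here is an arbitrary choice, not from the original post):

library(caret)
set.seed(123)
enet <- train(
  x = X, y = Y,
  method = "glmnet",
  trControl = trainControl(method = "cv", number = 10),
  tuneLength = 10   # caret builds a grid over both alpha and lambda
)
enet$bestTune       # the (alpha, lambda) pair with the best CV performance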
