When fitting a linear regression with a very large number of feature variables, the model becomes much more complex, which tends to overfit the training set and hurts generalization; it also increases the chance of multicollinearity among the variables, making the model coefficients hard to interpret.
Regularization is a technique for preventing overfitting and is often used together with linear regression; ridge regression and lasso regression are two common forms of it.
1. An intuitive view of regularized regression
- With a very large number of feature variables, the regression model becomes very complex; in practice this shows up as many features all having seemingly significant coefficients. This not only overfits the model, it also greatly reduces interpretability.
- The working assumption of regularized regression is that only a subset of the features contributes substantially to the model. Regularization therefore tries to emphasize the truly valuable variables while suppressing the remaining noise variables.
- Common regularization methods include: (1) Ridge, (2) Lasso (or LASSO), (3) Elastic net (or ENET)
1.1 Ridge regression
- Tuning parameter: the larger the λ value, the more strongly the coefficients are shrunk (pushed toward 0).
- Characteristic: all feature variables are retained, i.e. the coefficients only approach 0 and never become exactly 0 (a coefficient of exactly 0 would mean dropping that variable); see the penalized objective sketched after the figure below.
![](https://img.haomeiwen.com/i20354525/ec563d7bf0f3481d.png)
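For reference, the ridge objective is ordinary least squares plus an L2 penalty on the coefficients (the standard textbook form):

$$
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\;\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \;+\; \lambda\sum_{j=1}^{p}\beta_j^2
$$

Because the penalty is on the squared coefficients, increasing λ shrinks every β_j toward 0 but never sets it exactly to 0.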
1.2 Lasso regression
- Tuning parameter: likewise, the larger the λ value, the stronger the shrinkage of the coefficients (all the way down to 0).
- Characteristic: as λ increases, "noise" variables are removed and only variables with a meaningful contribution are retained, which achieves feature selection; the corresponding objective is sketched below.
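The lasso objective differs from ridge only in the penalty term, which uses absolute values (an L1 norm) instead of squares (again the standard textbook form):

$$
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\;\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \;+\; \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
$$

The L1 penalty can push coefficients all the way to exactly 0, which is why the lasso performs feature selection.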
1.3 Elastic nets
- Essentially a combination of ridge regression and lasso regression.
- Tuning parameters: α sets the mixing ratio between the ridge and lasso penalties, while λ again controls the overall strength of the shrinkage; a sketch of the combined penalty follows.
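In glmnet's parameterization (as described in the package documentation), the elastic net penalty is a weighted mix of the two:

$$
\lambda\left[\,\frac{1-\alpha}{2}\sum_{j=1}^{p}\beta_j^2 \;+\; \alpha\sum_{j=1}^{p}\lvert\beta_j\rvert\,\right]
$$

Here α = 1 recovers the pure lasso penalty and α = 0 the pure ridge penalty, matching the alpha argument used in the code below.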
2. Hands-on with R
R package and main function
- As shown below, the main function is glmnet::glmnet(). When its alpha argument is 1 (the default) it fits a lasso regression; when it is 0 it fits a ridge regression; values between 0 and 1 give an elastic net.
- As for the λ parameter, glmnet() automatically fits the model along a sequence of (by default) 100 λ values; the most suitable one is then chosen via cross-validation (Step 2 below).
library(glmnet)
# x: feature matrix, y: response vector
# alpha = 1 -> lasso (default); alpha = 0 -> ridge; 0 < alpha < 1 -> elastic net
glmnet(
  x = X,
  y = Y,
  alpha = 1
)
Example data
ames <- AmesHousing::make_ames()
dim(ames)
# [1] 2930 81
set.seed(123)
library(rsample)
split <- initial_split(ames, prop = 0.7,
strata = "Sale_Price")
ames_train <- training(split)
dim(ames_train)
# [1] 2049 81
ames_test <- testing(split)
dim(ames_test)
# [1] 881 81
# Create training feature matrices
# we use model.matrix(...)[, -1] to discard the intercept
X <- model.matrix(Sale_Price ~ ., ames_train)[, -1]
# transform y with log transformation
Y <- log(ames_train$Sale_Price)
Parametric models such as regularized regression are sensitive to skewed response values, so transforming the response can often improve predictive performance.
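As a quick, optional check of that skewness, one can compare the raw and log-transformed response distributions; a minimal sketch using the ames_train object created above:

# Sketch: compare the raw vs. log-transformed response distribution
par(mfrow = c(1, 2))
hist(ames_train$Sale_Price, main = "Raw Sale_Price", xlab = "Sale_Price")
hist(log(ames_train$Sale_Price), main = "log(Sale_Price)", xlab = "log(Sale_Price)")
par(mfrow = c(1, 1))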
2.1 Ridge regression
Step 1: Fit an initial model and inspect the coefficient values at different λ values
ridge <- glmnet(x = X, y = Y,
alpha = 0)
str(ridge$lambda)
# num [1:100] 286 260 237 216 197 ...
# the smaller the lambda, the weaker the shrinkage of the coefficients
coef(ridge)[c("Latitude", "Overall_QualVery_Excellent"), 100]
# Latitude Overall_QualVery_Excellent
# 0.60703722 0.09344684
# the larger the lambda, the stronger the shrinkage of the coefficients
coef(ridge)[c("Latitude", "Overall_QualVery_Excellent"), 1]
# Latitude Overall_QualVery_Excellent
# 6.115930e-36 9.233251e-37
plot(ridge, xvar = "lambda")
![](https://img.haomeiwen.com/i20354525/e6e4429668e7fec9.png)
Step 2: Use 10-fold cross-validation to determine the optimal λ value
ridge <- cv.glmnet(x = X, y = Y,
alpha = 0)
plot(ridge, main = "Ridge penalty\n\n")
![](https://img.haomeiwen.com/i20354525/0c3d6b15e72ff17a.png)
- As shown above, the plot displays the cross-validated MSE at each log(λ), with two vertical reference lines: the left one marks the log(λ) with the minimum MSE, and the right one marks the largest log(λ) whose MSE is within one standard error of that minimum.
# the value with the minimum MSE
ridge$lambda.min
# [1] 0.1525105
ridge$cvm[ridge$lambda == ridge$lambda.min]
min(ridge$cvm)
# [1] 0.0219778
# the largest lambda whose CV MSE is within one standard error of the minimum
ridge$lambda.1se
# [1] 0.6156877
ridge$cvm[ridge$lambda == ridge$lambda.1se]
# [1] 0.0245219
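To actually use the selected λ, the coefficients can be extracted directly from the cv.glmnet object via its s argument; a small sketch (standard cv.glmnet usage, shown here for illustration):

# coefficients at the lambda that minimizes CV MSE, and at the more conservative 1-SE lambda
coef(ridge, s = "lambda.min")[c("Latitude", "Overall_QualVery_Excellent"), ]
coef(ridge, s = "lambda.1se")[c("Latitude", "Overall_QualVery_Excellent"), ]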
Step 3: Finally, combine the optimal λ values from cross-validation with the coefficient path plot for visualization
ridge <- cv.glmnet(x = X, y = Y,
alpha = 0)
ridge_min <- glmnet(x = X, y = Y,
alpha = 0)
plot(ridge_min, xvar = "lambda", main = "Ridge penalty\n\n")
abline(v = log(ridge$lambda.min), col = "red", lty = "dashed")
abline(v = log(ridge$lambda.1se), col = "blue", lty = "dashed")
![](https://img.haomeiwen.com/i20354525/3441d58ec7fa8513.png)
2.2 Lasso regression
Step 1: Fit an initial model and inspect the coefficient values at different λ values
lasso <- glmnet(x = X, y = Y,
alpha = 1)
str(lasso$lambda)
# num [1:96] 0.286 0.26 0.237 0.216 0.197 ...
# the smaller the lambda, the weaker the shrinkage of the coefficients
coef(lasso)[c("Latitude", "Overall_QualVery_Excellent"), 96]
# Latitude Overall_QualVery_Excellent
# 0.8126079 0.2222406
# the larger the lambda, the stronger the shrinkage of the coefficients
coef(lasso)[c("Latitude", "Overall_QualVery_Excellent"), 1]
# Latitude Overall_QualVery_Excellent
# 0 0
plot(lasso, xvar = "lambda")
![](https://img.haomeiwen.com/i20354525/faad65f2f9215fe5.png)
Step 2: Use 10-fold cross-validation to determine the optimal λ value
lasso <- cv.glmnet(x = X, y = Y,
alpha = 1)
plot(lasso, main = "lasso penalty\n\n")
![](https://img.haomeiwen.com/i20354525/721df9b0f4ddea37.png)
# the value with the minimum MSE
lasso$lambda.min
# [1] 0.003957686
lasso$cvm[lasso$lambda == lasso$lambda.min]
min(lasso$cvm)
# [1] 0.0229088
# the largest lambda whose CV MSE is within one standard error of the minimum
lasso$lambda.1se
# [1] 0.0110125
lasso$cvm[lasso$lambda == lasso$lambda.1se]
# [1] 0.02566636
Step 3: Finally, combine the optimal λ values from cross-validation with the coefficient path plot for visualization
lasso <- cv.glmnet(x = X, y = Y,
alpha = 1)
lasso_min <- glmnet(x = X, y = Y,
alpha = 1)
plot(lasso_min, xvar = "lambda", main = "lasso penalty\n\n")
abline(v = log(lasso$lambda.min), col = "red", lty = "dashed")
abline(v = log(lasso$lambda.1se), col = "blue", lty = "dashed")
![](https://img.haomeiwen.com/i20354525/dd160ea66772724b.png)
Although this lasso model does not offer significant improvement over the ridge model, we get approximately the same accuracy by using only 64 features!
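Where does the feature count come from? One can count the non-zero coefficients at the chosen λ; a small sketch (the exact count depends on the random CV folds):

# number of non-zero coefficients at the 1-SE lambda, excluding the intercept
sum(as.matrix(coef(lasso, s = "lambda.1se")) != 0) - 1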
- One small detail: because the response variable Y was log-transformed during data preparation, predictions have to be back-transformed (exponentiated) before comparing RMSE values with models built on the original scale.
# predict sale price on the training data
# (for a cv.glmnet object, predict() uses s = "lambda.1se" by default)
pred <- predict(lasso, X)
# compute RMSE on the back-transformed (original) scale
# RMSE() comes from the caret package
library(caret)
RMSE(exp(pred), exp(Y))
## [1] 34161.13
Elastic net fits a mixture of the ridge and lasso penalties by tuning the α parameter; the caret package can be used to search for the most suitable mixing ratio, which is not demonstrated in detail here. A rough sketch of the idea follows.
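For completeness, a minimal sketch of how such a search could look with caret::train() (method = "glmnet" tunes both alpha and lambda over a grid; treat this as an illustrative sketch rather than a tuned analysis):

library(caret)
set.seed(123)
cv_glmnet <- train(
  x = X,
  y = Y,
  method = "glmnet",
  trControl = trainControl(method = "cv", number = 10),
  tuneLength = 10   # larger values give a finer grid over alpha and lambda
)
cv_glmnet$bestTune  # best alpha / lambda combination found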