Several regression models based on sklearn

Author: 月见樽 | Published 2017-12-03 16:59

    Theory

    Support vector machine regressor

    The support vector machine regressor is similar to the SVM classifier: the key is to select, from the large pool of training samples, the subset of vectors (the support vectors) that are most useful for fitting the model. The only difference between the regressor and the classifier is that the label is a continuous value.

    K-nearest neighbors regressor

    The K-nearest neighbors regressor still finds the k training samples whose feature vectors are closest to the query, and predicts the average of their target values (the classifier takes a vote instead), as in the small sketch below.
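    A minimal NumPy sketch of that averaging step, using made-up toy data (purely illustrative, not tied to the Boston dataset used later):

    import numpy as np

    # toy training set: 1-D features with continuous targets
    X_train = np.array([[0.0],[1.0],[2.0],[3.0],[4.0]])
    y_train = np.array([1.0,1.5,2.0,2.5,3.0])
    query = np.array([1.2])
    k = 3

    dist = np.linalg.norm(X_train - query,axis=1)  # distance to every training sample
    nearest = np.argsort(dist)[:k]                 # indices of the k closest samples
    prediction = y_train[nearest].mean()           # regression: average their targets
    print(prediction)                              # 1.5 = (1.5 + 2.0 + 1.0) / 3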

    Regression tree

    The biggest difference between a regression tree and a classification tree is that the values at the leaf nodes are continuous. In theory a regression tree is still a kind of classifier, just one with a very large number of classes.

    Ensemble regressors

    Random forests and boosted trees are essentially derivatives of decision trees, so regression trees likewise give rise to regression versions of random forests and boosted trees. In addition, random forests can be extended to extremely randomized trees (extra trees), in which the candidate split thresholds at each node are drawn at random and the best of these random splits is used, rather than searching exhaustively for the optimal threshold.

    Code implementation

    Data preprocessing

    Loading the data

    from sklearn.datasets import load_boston
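    # note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2;
    # this snippet assumes the older sklearn version the post was written against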
    boston = load_boston()
    print(boston.DESCR)
    
    Boston House Prices dataset
    ===========================
    
    Notes
    ------
    Data Set Characteristics:  
    
        :Number of Instances: 506 
    
        :Number of Attributes: 13 numeric/categorical predictive
        
        :Median Value (attribute 14) is usually the target
    
        :Attribute Information (in order):
            - CRIM     per capita crime rate by town
            - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
            - INDUS    proportion of non-retail business acres per town
            - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
            - NOX      nitric oxides concentration (parts per 10 million)
            - RM       average number of rooms per dwelling
            - AGE      proportion of owner-occupied units built prior to 1940
            - DIS      weighted distances to five Boston employment centres
            - RAD      index of accessibility to radial highways
            - TAX      full-value property-tax rate per $10,000
            - PTRATIO  pupil-teacher ratio by town
            - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
            - LSTAT    % lower status of the population
            - MEDV     Median value of owner-occupied homes in $1000's
    
        :Missing Attribute Values: None
    
        :Creator: Harrison, D. and Rubinfeld, D.L.
    
    This is a copy of UCI ML housing dataset.
    http://archive.ics.uci.edu/ml/datasets/Housing
    
    
    This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
    
    The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
    prices and the demand for clean air', J. Environ. Economics & Management,
    vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
    ...', Wiley, 1980.   N.B. Various transformations are used in the table on
    pages 244-261 of the latter.
    
    The Boston house-price data has been used in many machine learning papers that address regression
    problems.   
         
    **References**
    
       - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
       - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
       - many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)
    

    Splitting the data

    from sklearn.model_selection import train_test_split
    x_train,x_test,y_train,y_test = train_test_split(boston.data,boston.target,random_state=33,test_size=0.25)
    print(x_test.shape)
    
    (127, 13)    
    

    Standardization

    from sklearn.preprocessing import StandardScaler
    ss_x,ss_y = StandardScaler(),StandardScaler()
    x_train = ss_x.fit_transform(x_train)
    x_test = ss_x.transform(x_test)
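    # StandardScaler expects 2-D input, so the 1-D targets are reshaped to a
    # column vector before scaling and flattened back to 1-D afterwards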
    y_train = ss_y.fit_transform(y_train.reshape([-1,1])).reshape(-1)
    y_test = ss_y.transform(y_test.reshape([-1,1])).reshape(-1)
    print(y_train.shape)
    
    (379,)
    

    Model training and evaluation

    Support vector machine regressor

    from sklearn.svm import SVR
    

    Linear kernel

    l_svr = SVR(kernel='linear')
    l_svr.fit(x_train,y_train)
    l_svr.score(x_test,y_test)
    
    0.65171709742960804
    

    Polynomial kernel

    n_svr = SVR(kernel="poly")
    n_svr.fit(x_train,y_train)
    n_svr.score(x_test,y_test)
    
    0.40445405800289286
    

    Radial basis function (RBF) kernel

    r_svr = SVR(kernel="rbf")
    r_svr.fit(x_train,y_train)
    r_svr.score(x_test,y_test)
    
    0.75640689122739346
    
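    The RBF kernel's score depends strongly on its C and gamma hyperparameters. As a minimal sketch, they could be tuned with a grid search; the grid values below are arbitrary illustrations:

    from sklearn.model_selection import GridSearchCV

    param_grid = {"C":[0.1,1,10,100],"gamma":[0.01,0.1,1]}
    gs = GridSearchCV(SVR(kernel="rbf"),param_grid,cv=5)
    gs.fit(x_train,y_train)
    print(gs.best_params_,gs.best_score_)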

    K-nearest neighbors regressor

    from sklearn.neighbors import KNeighborsRegressor
    knn = KNeighborsRegressor(weights="uniform")
    knn.fit(x_train,y_train)
    knn.score(x_test,y_test)
    
    0.69034545646065615
    
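    Besides uniform averaging, KNeighborsRegressor also supports distance-based weighting, where closer neighbors contribute more to the prediction. A minimal variation of the snippet above (its score is not reported here):

    knn_dist = KNeighborsRegressor(weights="distance")
    knn_dist.fit(x_train,y_train)
    knn_dist.score(x_test,y_test)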

    Regression tree

    from sklearn.tree import DecisionTreeRegressor
    dt = DecisionTreeRegressor()
    dt.fit(x_train,y_train)
    dt.score(x_test,y_test)
    
    0.68783308418825428
    
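    A single unpruned regression tree tends to overfit, and its score can vary from run to run. A minimal sketch with an arbitrarily chosen max_depth and a fixed random_state, which makes the result reproducible:

    dt_pruned = DecisionTreeRegressor(max_depth=4,random_state=33)
    dt_pruned.fit(x_train,y_train)
    dt_pruned.score(x_test,y_test)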

    Ensemble models

    Random forest

    from sklearn.ensemble import RandomForestRegressor
    rfr = RandomForestRegressor()
    rfr.fit(x_train,y_train)
    rfr.score(x_test,y_test)
    
    0.79055864833158895
    
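    The number of trees is the main hyperparameter of a random forest. A minimal sketch with an arbitrarily chosen n_estimators and a fixed random_state, which usually gives a steadier score at the cost of training time:

    rfr_big = RandomForestRegressor(n_estimators=200,random_state=33)
    rfr_big.fit(x_train,y_train)
    rfr_big.score(x_test,y_test)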

    Extremely randomized trees (extra trees)

    from sklearn.ensemble import ExtraTreesRegressor
    etr = ExtraTreesRegressor()
    etr.fit(x_train,y_train)
    etr.score(x_test,y_test)
    
    0.7349024110033624
    

    Gradient boosting trees

    from sklearn.ensemble import GradientBoostingRegressor
    gbr = GradientBoostingRegressor()
    gbr.fit(x_train,y_train)
    gbr.score(x_test,y_test)
    
    0.84501318676123161
    
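    As a minimal closing sketch, the fitted regressors can be compared on a common footing by mapping their predictions back to the original price scale with ss_y.inverse_transform and reporting the mean squared and mean absolute error alongside the R² score:

    from sklearn.metrics import mean_squared_error,mean_absolute_error

    models = {"linear SVR":l_svr,"poly SVR":n_svr,"rbf SVR":r_svr,"KNN":knn,
              "tree":dt,"random forest":rfr,"extra trees":etr,"boosting":gbr}
    # undo the target standardization so the errors are in the original $1000 units
    y_test_orig = ss_y.inverse_transform(y_test.reshape([-1,1])).reshape(-1)
    for name,model in models.items():
        y_pred = ss_y.inverse_transform(model.predict(x_test).reshape([-1,1])).reshape(-1)
        print(name,
              model.score(x_test,y_test),              # R^2 (same as the scores above)
              mean_squared_error(y_test_orig,y_pred),  # MSE in original units
              mean_absolute_error(y_test_orig,y_pred)) # MAE in original units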
