Several regression models based on sklearn

Author: 月见樽 | Published 2017-12-03 16:59

    Theory

    Support vector machine regressor

    The support vector machine regressor is similar to the SVM classifier: the key is to select, from the large pool of training samples, the subset of vectors (the support vectors) that are most useful for fitting the model. The only difference between the regressor and the classifier is that the label is a continuous value.

    K-nearest neighbors regressor

    The K-nearest neighbors regressor still finds the k training samples whose feature vectors are closest to the query, and predicts the average of their target values (the classifier takes a vote instead), as in the small sketch below.
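    A minimal NumPy sketch of that averaging step, using made-up toy data (purely illustrative, not tied to the Boston dataset used later):

    import numpy as np

    # toy training set: 1-D features with continuous targets
    X_train = np.array([[0.0],[1.0],[2.0],[3.0],[4.0]])
    y_train = np.array([1.0,1.5,2.0,2.5,3.0])
    query = np.array([1.2])
    k = 3

    dist = np.linalg.norm(X_train - query,axis=1)  # distance to every training sample
    nearest = np.argsort(dist)[:k]                 # indices of the k closest samples
    prediction = y_train[nearest].mean()           # regression: average their targets
    print(prediction)                              # 1.5 = (1.5 + 2.0 + 1.0) / 3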

    Regression tree

    The biggest difference between a regression tree and a classification tree is that the values at the leaf nodes are continuous. In theory a regression tree is still a kind of classifier, just one with a very large number of classes.

    Ensemble regressors

    Random forests and boosted trees are essentially derivatives of decision trees, so regression trees likewise give rise to regression versions of random forests and boosted trees. In addition, random forests can be extended to extremely randomized trees (extra trees), in which the candidate split thresholds at each node are drawn at random and the best of these random splits is used, rather than searching exhaustively for the optimal threshold.

    Code implementation

    Data preprocessing

    Loading the data

    from sklearn.datasets import load_boston
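    # note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2;
    # this snippet assumes the older sklearn version the post was written against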
    boston = load_boston()
    print(boston.DESCR)
    
    Boston House Prices dataset
    ===========================
    
    Notes
    ------
    Data Set Characteristics:  
    
        :Number of Instances: 506 
    
        :Number of Attributes: 13 numeric/categorical predictive
        
        :Median Value (attribute 14) is usually the target
    
        :Attribute Information (in order):
            - CRIM     per capita crime rate by town
            - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
            - INDUS    proportion of non-retail business acres per town
            - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
            - NOX      nitric oxides concentration (parts per 10 million)
            - RM       average number of rooms per dwelling
            - AGE      proportion of owner-occupied units built prior to 1940
            - DIS      weighted distances to five Boston employment centres
            - RAD      index of accessibility to radial highways
            - TAX      full-value property-tax rate per $10,000
            - PTRATIO  pupil-teacher ratio by town
            - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
            - LSTAT    % lower status of the population
            - MEDV     Median value of owner-occupied homes in $1000's
    
        :Missing Attribute Values: None
    
        :Creator: Harrison, D. and Rubinfeld, D.L.
    
    This is a copy of UCI ML housing dataset.
    http://archive.ics.uci.edu/ml/datasets/Housing
    
    
    This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
    
    The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
    prices and the demand for clean air', J. Environ. Economics & Management,
    vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
    ...', Wiley, 1980.   N.B. Various transformations are used in the table on
    pages 244-261 of the latter.
    
    The Boston house-price data has been used in many machine learning papers that address regression
    problems.   
         
    **References**
    
       - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
       - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
       - many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)
    

    Splitting the data

    from sklearn.model_selection import train_test_split
    x_train,x_test,y_train,y_test = train_test_split(boston.data,boston.target,random_state=33,test_size=0.25)
    print(x_test.shape)
    
    (127, 13)    
    

    Standardization

    from sklearn.preprocessing import StandardScaler
    ss_x,ss_y = StandardScaler(),StandardScaler()
    x_train = ss_x.fit_transform(x_train)
    x_test = ss_x.transform(x_test)
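    # StandardScaler expects 2-D input, so the 1-D targets are reshaped to a
    # column vector before scaling and flattened back to 1-D afterwards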
    y_train = ss_y.fit_transform(y_train.reshape([-1,1])).reshape(-1)
    y_test = ss_y.transform(y_test.reshape([-1,1])).reshape(-1)
    print(y_train.shape)
    
    (379,)
    

    Model training and evaluation

    Support vector machine regressor

    from sklearn.svm import SVR
    

    Linear kernel

    l_svr = SVR(kernel='linear')
    l_svr.fit(x_train,y_train)
    l_svr.score(x_test,y_test)
    
    0.65171709742960804
    

    Polynomial kernel

    n_svr = SVR(kernel="poly")
    n_svr.fit(x_train,y_train)
    n_svr.score(x_test,y_test)
    
    0.40445405800289286
    

    Radial basis function (RBF) kernel

    r_svr = SVR(kernel="rbf")
    r_svr.fit(x_train,y_train)
    r_svr.score(x_test,y_test)
    
    0.75640689122739346
    
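    The RBF kernel's score depends strongly on its C and gamma hyperparameters. As a minimal sketch, they could be tuned with a grid search; the grid values below are arbitrary illustrations:

    from sklearn.model_selection import GridSearchCV

    param_grid = {"C":[0.1,1,10,100],"gamma":[0.01,0.1,1]}
    gs = GridSearchCV(SVR(kernel="rbf"),param_grid,cv=5)
    gs.fit(x_train,y_train)
    print(gs.best_params_,gs.best_score_)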

    K-nearest neighbors regressor

    from sklearn.neighbors import KNeighborsRegressor
    knn = KNeighborsRegressor(weights="uniform")
    knn.fit(x_train,y_train)
    knn.score(x_test,y_test)
    
    0.69034545646065615
    
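    Besides uniform averaging, KNeighborsRegressor also supports distance-based weighting, where closer neighbors contribute more to the prediction. A minimal variation of the snippet above (its score is not reported here):

    knn_dist = KNeighborsRegressor(weights="distance")
    knn_dist.fit(x_train,y_train)
    knn_dist.score(x_test,y_test)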

    Regression tree

    from sklearn.tree import DecisionTreeRegressor
    dt = DecisionTreeRegressor()
    dt.fit(x_train,y_train)
    dt.score(x_test,y_test)
    
    0.68783308418825428
    
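    A single unpruned regression tree tends to overfit, and its score can vary from run to run. A minimal sketch with an arbitrarily chosen max_depth and a fixed random_state, which makes the result reproducible:

    dt_pruned = DecisionTreeRegressor(max_depth=4,random_state=33)
    dt_pruned.fit(x_train,y_train)
    dt_pruned.score(x_test,y_test)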

    Ensemble models

    Random forest

    from sklearn.ensemble import RandomForestRegressor
    rfr = RandomForestRegressor()
    rfr.fit(x_train,y_train)
    rfr.score(x_test,y_test)
    
    0.79055864833158895
    
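    The number of trees is the main hyperparameter of a random forest. A minimal sketch with an arbitrarily chosen n_estimators and a fixed random_state, which usually gives a steadier score at the cost of training time:

    rfr_big = RandomForestRegressor(n_estimators=200,random_state=33)
    rfr_big.fit(x_train,y_train)
    rfr_big.score(x_test,y_test)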

    Extremely randomized trees (extra trees)

    from sklearn.ensemble import ExtraTreesRegressor
    etr = ExtraTreesRegressor()
    etr.fit(x_train,y_train)
    etr.score(x_test,y_test)
    
    0.7349024110033624
    

    Gradient boosting trees

    from sklearn.ensemble import GradientBoostingRegressor
    gbr = GradientBoostingRegressor()
    gbr.fit(x_train,y_train)
    gbr.score(x_test,y_test)
    
    0.84501318676123161
    
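    As a minimal closing sketch, the fitted regressors can be compared on a common footing by mapping their predictions back to the original price scale with ss_y.inverse_transform and reporting the mean squared and mean absolute error alongside the R² score:

    from sklearn.metrics import mean_squared_error,mean_absolute_error

    models = {"linear SVR":l_svr,"poly SVR":n_svr,"rbf SVR":r_svr,"KNN":knn,
              "tree":dt,"random forest":rfr,"extra trees":etr,"boosting":gbr}
    # undo the target standardization so the errors are in the original $1000 units
    y_test_orig = ss_y.inverse_transform(y_test.reshape([-1,1])).reshape(-1)
    for name,model in models.items():
        y_pred = ss_y.inverse_transform(model.predict(x_test).reshape([-1,1])).reshape(-1)
        print(name,
              model.score(x_test,y_test),              # R^2 (same as the scores above)
              mean_squared_error(y_test_orig,y_pred),  # MSE in original units
              mean_absolute_error(y_test_orig,y_pred)) # MAE in original units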
