Section 8: Linear Regression

Author: barriers | Published 2020-11-02 23:36

    1 Characteristics of the Linear Regression Algorithm

    Linear regression is mainly used to solve regression problems;
    the idea is simple and the implementation is easy;
    it is the foundation of many powerful nonlinear models;
    its results are highly interpretable;
    and it embodies many important ideas in machine learning.
    When each sample has only one feature, the model is called simple linear regression.
    The general approach: analyze the problem to determine a loss function (or utility function), then obtain the machine-learning model by optimizing that function.

    2 Least Squares

    Fitting the regression line y = ax + b means finding the parameters that minimize the loss. The natural loss is the sum of absolute errors, sum(|y_i - ŷ_i|), where ŷ_i is the prediction for x_i; but the absolute value is not differentiable everywhere, so we minimize the sum of squared errors, sum((y_i - ŷ_i)^2), instead. Since ŷ_i = a*x_i + b, this amounts to minimizing sum((y_i - a*x_i - b)^2). By basic calculus, the extremum of a differentiable function lies where its derivative is zero.
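
    Setting both partial derivatives of this loss to zero and solving yields the closed-form coefficients used throughout this article:

    J(a, b) = \sum_{i=1}^{m} (y_i - a x_i - b)^2,\qquad
    a = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{m} (x_i - \bar{x})^2},\qquad
    b = \bar{y} - a\,\bar{x}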

    2.1 Implementing Simple Linear Regression

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([1, 2, 3, 4, 5])
    y = np.array([1, 3, 2, 3, 5])
    plt.scatter(x, y)
    plt.show()

    x_mean = np.mean(x)
    y_mean = np.mean(y)
    # accumulate the numerator and denominator of the closed-form solution for a
    num, d = 0, 0
    for x_i, y_i in zip(x, y):
        num += (x_i - x_mean) * (y_i - y_mean)
        d += (x_i - x_mean) ** 2
    a = num / d
    b = y_mean - a * x_mean
    # plot the fitted line over the scatter of the training points
    y_hat = a * x + b
    plt.scatter(x, y)
    plt.plot(x, y_hat, color='green')
    plt.axis([0, 6, 0, 6])
    plt.show()
    
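    The accumulation loop above can be written equivalently with vectorized dot products, which runs much faster on large arrays (a sketch reusing x, y, x_mean and y_mean from the listing above):

    a = (x - x_mean).dot(y - y_mean) / (x - x_mean).dot(x - x_mean)
    b = y_mean - a * x_mean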

    2.2 Encapsulating Simple Linear Regression

    class SimpleLinearRegression1:
        def __init__(self):
            self.a_ = None
            self.b_ = None

        def fit(self, x_train, y_train):
            assert x_train.ndim == 1, 'simple linear regression can only handle one-dimensional data'
            assert len(x_train) == len(y_train), 'x_train and y_train must have the same length'
            x_mean = np.mean(x_train)
            y_mean = np.mean(y_train)
            num, d = 0, 0
            for x, y in zip(x_train, y_train):
                num += (x - x_mean) * (y - y_mean)
                d += (x - x_mean) ** 2
            self.a_ = num / d
            self.b_ = y_mean - self.a_ * x_mean
            return self

        def predict(self, x_predict):
            assert x_predict.ndim == 1, 'simple linear regression can only handle one-dimensional data'
            assert self.a_ is not None and self.b_ is not None, 'fit must be called before predict'
            return np.array([self._predict(x) for x in x_predict])

        def _predict(self, x_single):
            return self.a_ * x_single + self.b_
    
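    A quick usage sketch with the toy data from section 2.1:

    reg1 = SimpleLinearRegression1()
    reg1.fit(np.array([1, 2, 3, 4, 5]), np.array([1, 3, 2, 3, 5]))
    reg1.predict(np.array([6]))   # array([5.2]), since a = 0.8 and b = 0.4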

    3 Metrics for Evaluating Regression Algorithms

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import datasets

    boston = datasets.load_boston()
    # the feature names; RM (average number of rooms) is the sixth column
    boston.feature_names
    # use only the number-of-rooms feature
    x = boston.data[:, 5]
    y = boston.target
    # drop the samples whose price is capped at the maximum value of 50
    x = x[y < 50]
    y = y[y < 50]
    plt.scatter(x, y)
    from sklearn.model_selection import train_test_split
    # ??train_test_split shows the function's source in IPython
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=666)
    # fit a model so there are predictions to evaluate;
    # sklearn expects a 2-D feature matrix, hence the reshape
    from sklearn.linear_model import LinearRegression
    lin_reg = LinearRegression()
    lin_reg.fit(x_train.reshape(-1, 1), y_train)
    y_predict = lin_reg.predict(x_test.reshape(-1, 1))
    # MSE in sklearn
    from sklearn.metrics import mean_squared_error
    mean_squared_error(y_test, y_predict)
    # MAE in sklearn
    from sklearn.metrics import mean_absolute_error
    mean_absolute_error(y_test, y_predict)
    # the function that computes r-squared in sklearn
    from sklearn.metrics import r2_score
    r2_score(y_test, y_predict)
    
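    For reference, all three metrics follow directly from their definitions, and computing them by hand on the same predictions gives identical results:

    mse = np.mean((y_test - y_predict) ** 2)       # mean squared error
    rmse = np.sqrt(mse)                            # RMSE, in the same units as y
    mae = np.mean(np.abs(y_test - y_predict))      # mean absolute error
    r2 = 1 - mse / np.var(y_test)                  # r-squared: share of variance explained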

    3.1 LinearRegression in sklearn

    sklearn's linear regression is the LinearRegression class in the linear_model module.
    LinearRegression provides fit, predict, get_params, score, set_params and other methods;
    score computes r-squared.
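
    For instance, with the lin_reg model fitted in section 3 above, the following two calls return the same value:

    lin_reg.score(x_test.reshape(-1, 1), y_test)
    r2_score(y_test, lin_reg.predict(x_test.reshape(-1, 1)))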

    4 Multiple Linear Regression
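
    With a column of ones prepended to the feature matrix, X_b = [1, X], the least-squares solution generalizes to the normal equation, which is exactly what fit_normal below implements:

    \hat{\theta} = (X_b^T X_b)^{-1} X_b^T y

    The first component of \hat{\theta} is the intercept; the remaining components are the feature coefficients.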

    import numpy as np
    from sklearn.metrics import r2_score
    import matplotlib.pyplot as plt
    from sklearn import datasets
    from sklearn.model_selection import train_test_split

    class LinearRegression:
        def __init__(self):
            self.coef_ = None
            self.intercept_ = None
            self._theta = None

        def fit_normal(self, X_train, y_train):
            assert X_train.shape[0] == y_train.shape[0], 'the feature matrix and the label vector must have the same number of rows'
            # prepend a column of ones so the intercept is absorbed into theta
            X_b = np.hstack([np.ones((X_train.shape[0], 1)), X_train])
            # the normal equation: theta = (X_b^T X_b)^-1 X_b^T y
            self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)
            self.intercept_ = self._theta[0]
            self.coef_ = self._theta[1:]
            return self

        def predict(self, X_predict):
            assert self.intercept_ is not None and self.coef_ is not None, 'fit_normal must be called before predict'
            assert X_predict.shape[1] == len(self.coef_), 'the number of columns must match the training data'
            X_b = np.hstack([np.ones((X_predict.shape[0], 1)), X_predict])
            return X_b.dot(self._theta)

        def score(self, X_test, y_test):
            y_predict = self.predict(X_test)
            return r2_score(y_test, y_predict)

    boston = datasets.load_boston()
    x = boston.data
    y = boston.target
    x = x[y < 50]
    y = y[y < 50]
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=666)
    reg = LinearRegression()
    reg.fit_normal(x_train, y_train)
    reg.score(x_test, y_test)
    

    5 Linear Regression in scikit-learn

    from sklearn.linear_model import LinearRegression

    # note: this import shadows the hand-written LinearRegression class above
    lin_reg = LinearRegression()
    lin_reg.fit(x_train, y_train)
    lin_reg.score(x_test, y_test)
    
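    The fitted parameters are exposed as attributes; inspecting them is what gives the linear model its interpretability:

    lin_reg.coef_        # one weight per feature; sign and magnitude show each feature's effect on price
    lin_reg.intercept_   # the bias term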

    6 Regression with kNN

    from sklearn.neighbors import KNeighborsRegressor
    knn_reg = KNeighborsRegressor()
    knn_reg.fit(x_train,y_train)
    knn_reg.score(x_test,y_test)
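
    Conceptually, kNN regression predicts by averaging the targets of the k nearest training samples. A minimal sketch of what the uniform-weight default computes for the first test point, using the x_train/x_test split from section 4:

    dists = np.sum((x_train - x_test[0]) ** 2, axis=1)   # squared distances to all training points
    nearest = np.argsort(dists)[:5]                      # indices of the 5 closest (default n_neighbors=5)
    y_train[nearest].mean()                              # uniform-weight prediction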
    

    7 Grid-Searching kNN Hyperparameters

    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsRegressor

    param_grid = [
        {'weights': ['uniform'], 'n_neighbors': [i for i in range(1, 11)]},
        {'weights': ['distance'], 'n_neighbors': [i for i in range(1, 11)], 'p': [i for i in range(1, 6)]},
    ]
    knn_reg = KNeighborsRegressor()
    grid_search = GridSearchCV(knn_reg, param_grid, n_jobs=-1, verbose=1)
    # search on the training set, never on the test set
    grid_search.fit(x_train, y_train)
    # the best parameter combination
    grid_search.best_params_
    # the best mean cross-validated score
    grid_search.best_score_
    # evaluate the best estimator with the same metric used for linear regression
    grid_search.best_estimator_.score(x_test, y_test)
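
    Note that best_score_ is the mean r-squared obtained by cross-validation on the data passed to fit, so it is not directly comparable to the held-out result of best_estimator_.score(x_test, y_test); comparisons with the models in earlier sections should use the latter.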
    
