美文网首页
Udacity- 线性回归

Udacity- 线性回归

作者: 涂大宝 | 来源:发表于2017-12-02 23:06 被阅读0次

continuous supervised learning 连续变量监督学习

regression 回归

continuous 有一定次序,且可以比较大小

1. Concept

slope:斜率

intercept:截距

coefficient:系数

2.Coding

import numpy
import matplotlib.pyplot as plt

from ages_net_worths import ageNetWorthData

ages_train, ages_test, net_worths_train, net_worths_test = ageNetWorthData()



from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(ages_train, net_worths_train)

### get Katie's net worth (she's 27)
### sklearn predictions are returned in an array, so you'll want to index into
### the output to get what you want, e.g. net_worth = predict([[27]])[0][0] (not
### exact syntax, the point is the [0] at the end). In addition, make sure the
### argument to your prediction function is in the expected format - if you get
### a warning about needing a 2d array for your data, a list of lists will be
### interpreted by sklearn as such (e.g. [[27]]).
km_net_worth = 1.0 ### fill in the line of code to get the right value
km_net_worth = reg.predict([[27]])[0][0]
### get the slope
### again, you'll get a 2-D array, so stick the [0][0] at the end
slope = 0. ### fill in the line of code to get the right value
slope = reg.coef_[0][0]
### get the intercept
### here you get a 1-D array, so stick [0] on the end to access
### the info we want
intercept = 0. ### fill in the line of code to get the right value
intercept = reg.intercept_[0]

### get the score on test data
test_score = 0. ### fill in the line of code to get the right value
test_score = reg.score(ages_test,net_worths_test)

### get the score on the training data
training_score = 0. ### fill in the line of code to get the right value
training_score = reg.score(ages_train,net_worths_train)


def submitFit():
    # all of the values in the returned dictionary are expected to be
    # numbers for the purpose of the grader.
    return {"networth":km_net_worth,
            "slope":slope,
            "intercept":intercept,
            "stats on test":test_score,
            "stats on training": training_score}

3.线性回归误差

最好的线性回归是最小化误差平方和的回归

4.最小化误差平方和的算法

ordinary least squares(OLS)
gradient descent

5.SSE的问题

sum of squared errors(SSE)

6.回归的R平方指标

0<R^2<1 越接近1,表明拟合表现的越好
优点:与训练点的数量无关,比误差平方和更可靠一点
在SKlearn中,用reg.score获取r的平方

7.分类与回归的比较

image.png

8.多变量回归

image.png

9.迷你项目

#!/usr/bin/python

"""
    Starter code for the regression mini-project.
    
    Loads up/formats a modified version of the dataset
    (why modified?  we've removed some trouble points
    that you'll find yourself in the outliers mini-project).

    Draws a little scatterplot of the training/testing data

    You fill in the regression code where indicated:
"""    


import sys
import pickle
sys.path.append("../tools/")
from feature_format import featureFormat, targetFeatureSplit
dictionary = pickle.load( open("../final_project/final_project_dataset_modified.pkl", "r") )

### list the features you want to look at--first item in the 
### list will be the "target" feature
features_list = ["bonus", "salary"]
data = featureFormat( dictionary, features_list, remove_any_zeroes=True)
target, features = targetFeatureSplit( data )

### training-testing split needed in regression, just like classification
from sklearn.cross_validation import train_test_split
feature_train, feature_test, target_train, target_test = train_test_split(features, target, test_size=0.5, random_state=42)
train_color = "b"
test_color = "b"



### Your regression goes here!
### Please name it reg, so that the plotting code below picks it up and 
### plots it correctly. Don't forget to change the test_color above from "b" to
### "r" to differentiate training points from test points.
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(feature_train, target_train)
print reg.coef_
print reg.intercept_
print reg.score(feature_train, target_train)
print reg.score(feature_test, target_test)



### draw the scatterplot, with color-coded training and testing points
import matplotlib.pyplot as plt
for feature, target in zip(feature_test, target_test):
    plt.scatter( feature, target, color=test_color ) 
for feature, target in zip(feature_train, target_train):
    plt.scatter( feature, target, color=train_color ) 

### labels for the legend
plt.scatter(feature_test[0], target_test[0], color=test_color, label="test")
plt.scatter(feature_test[0], target_test[0], color=train_color, label="train")




### draw the regression line, once it's coded
try:
    plt.plot( feature_test, reg.predict(feature_test) )
except NameError:
    pass
plt.xlabel(features_list[1])
plt.ylabel(features_list[0])
plt.legend()
plt.show()
image.png
image.png
根据LTI回归奖金

我们有许多可用的财务特征,就预测个人奖金而言,其中一些特征可能比余下的特征更为强大。例如,假设你对数据做出了思考,并且推测出“long_term_incentive”特征(为公司长期的健康发展做出贡献的雇员应该得到这份奖励)可能与奖金而非工资的关系更密切。

证明你的假设是正确的一种方式是根据长期激励回归奖金,然后看看回归是否显著高于根据工资回归奖金。根据长期奖励回归奖金—测试数据的分数是多少?

features_list = ["bonus", "long_term_incentive"]
image.png
image.png
异常值破坏回归

这是下节课的内容简介,关于异常值的识别和删除。返回至之前的一个设置,你在其中使用工资预测奖金,并且重新运行代码来回顾数据。你可能注意到,少量数据点落在了主趋势之外,即某人拿到高工资(超过 1 百万美元!)却拿到相对较少的奖金。此为异常值的一个示例,我们将在下节课中重点讲述它们。

类似的这种点可以对回归造成很大的影响:如果它落在训练集内,它可能显著影响斜率/截距。如果它落在测试集内,它可能比落在测试集外要使分数低得多。就目前情况来看,此点落在测试集内(而且最终很可能降低分数)。

现在,我们将绘制两条回归线,一条在测试数据上拟合(有异常值),一条在训练数据上拟合(无异常值)。来看看现在的图形,有很大差别,对吧?单一的异常值会引起很大的差异。

image.png

相关文章

  • Udacity- 线性回归

    continuous supervised learning 连续变量监督学习 regression 回归 con...

  • 机器学习实战——回归

    本章内容】 线性回归 局部加权线性回归 岭回归和逐步线性回归 例子 【线性回归】 wHat = (X.T*X).I...

  • 线性回归模型

    参考:1.使用Python进行线性回归2.python机器学习:多元线性回归3.线性回归概念 线性回归模型是线性模...

  • 通俗得说线性回归算法(二)线性回归实战

    前情提要:通俗得说线性回归算法(一)线性回归初步介绍 一.sklearn线性回归详解 1.1 线性回归参数 介绍完...

  • 第一次打卡

    线性回归主要内容包括: 线性回归的基本要素线性回归模型从零开始的实现线性回归模型使用pytorch的简洁实现线性回...

  • 2020-02-14

    线性回归:线性回归分为一元线性回归和多元线性回归,一元线性回归用一条直线描述数据之间的关系,多元回归是用一条曲线描...

  • 逻辑回归和线性回归对比

    简单说几点 线性回归和逻辑回归都是广义线性回归模型的特例。他们俩是兄弟关系,都是广义线性回归的亲儿子 线性回归只能...

  • 算法概述-02

    1.逻辑回归和线性回归的联系和区别: 逻辑回归和线性回归的都是广义的线性回归。 线性回归是根据最小二乘法来建模,逻...

  • 【机器学习实践】有监督学习:线性分类、回归模型

    线性模型 为线性模型 分类和回归的区别 分类:离散回归:连续本文主要关注线性回归模型 常用线性回归模型类型 OLS...

  • 统计学习基础复习浓缩版

    1.简单线性回归 2.多元线性回归 3.多项式回归 4.广义线性回归(含逻辑斯谛回归) 广义线性回归模型通过拟合响...

网友评论

      本文标题:Udacity- 线性回归

      本文链接:https://www.haomeiwen.com/subject/ugwnbxtx.html