Linear Regression
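Regression is commonly divided into linear regression and logistic regression. Linear regression predicts continuous values, while logistic regression handles classification.
Assume the input and output are linearly related: given a sample x, the predicted output is y = w * x, optionally with a bias term b added, giving y = w * x + b.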
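As a minimal sketch of this hypothesis in plain Python (the function name `predict` and the sample values are made up for illustration and are not part of the original code):

```python
def predict(x, w, b=0.0):
    """Linear hypothesis: output = w * x + b, where b is the optional bias."""
    return w * x + b


# Hypothetical parameters and input, chosen only to show the arithmetic
without_bias = predict(3.0, w=2.0)
with_bias = predict(3.0, w=2.0, b=1.0)
print(without_bias)  # 6.0
print(with_bias)     # 7.0
```

Training the model means finding the w and b that make these predictions match the observed outputs as closely as possible.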
loss function
The loss function chosen is MSE, the mean squared error.
![](https://img.haomeiwen.com/i11640553/39f0bb5cfc64bca8.png)
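To make the loss concrete, here is a small plain-Python sketch. The helper names `mse` and `half_mse` and the numbers are illustrative assumptions. The second helper mirrors the scaling used in the TensorFlow code in this post, which divides the summed squared error by 2n rather than n.

```python
def mse(preds, targets):
    """Mean squared error: the average of the squared differences."""
    n = len(targets)
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / n


def half_mse(preds, targets):
    """Summed squared error divided by 2n, i.e. half the MSE."""
    return mse(preds, targets) / 2


# Hypothetical predictions and targets; only the last pair differs (by 2)
preds = [1.0, 2.0, 3.0]
targets = [1.0, 2.0, 5.0]
print(mse(preds, targets))       # (0 + 0 + 4) / 3
print(half_mse(preds, targets))  # half of the value above
```

The extra factor of 1/2 does not move the minimum; it only rescales the loss and its gradient.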
Dataset
The dataset is a CSV file I found online that maps years of work experience to salary; see the figure below.
![](https://img.haomeiwen.com/i11640553/2d6df7311c8817da.png)
Code
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
raw = pd.read_csv("../data/Salary_Data.csv")
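# X and Y are numpy.ndarray objects
X = raw["YearsExperience"].values
Y = raw["Salary"].values
# Split into training and test sets
x_train, x_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
# Min-max normalize y to [0, 1]; x is left unscaled
# x_train = (x_train - x_train.min()) / (x_train.max() - x_train.min())
y_train = (Y_train - Y_train.min()) / (Y_train.max() - Y_train.min())
# print(type(x_train))
# Note: the test targets are scaled with the test set's own min and max;
# reusing the training min and max would keep both sets on the same scale
y_test = (Y_test - Y_test.min()) / (Y_test.max() - Y_test.min())
n_numbers = x_train.shape[0]  # number of training samples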
x = tf.placeholder(dtype=tf.float32,name="x")
y = tf.placeholder(dtype=tf.float32,name="y")
w = tf.get_variable("w",shape=[],initializer=tf.zeros_initializer)
b = tf.get_variable("b",shape=[],initializer=tf.zeros_initializer)
pred = tf.multiply(w,x)+b
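# Loss: summed squared error divided by 2n, i.e. half the MSE
loss = tf.reduce_sum(tf.square(pred - y)) / (2 * n_numbers)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)
# tf.initialize_all_variables() is deprecated; use tf.global_variables_initializer()
init_op = tf.global_variables_initializer()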
epoches = 1000
display = 50
with tf.Session() as sess:
    sess.run(init_op)
    for i in range(epoches):
        # One gradient step per training sample
        for (x_data, y_data) in zip(x_train, y_train):
            sess.run(optimizer, feed_dict={x: x_data, y: y_data})
        if (i + 1) % display == 0:
            train_loss = sess.run(loss, feed_dict={x: x_train, y: y_train})
            print("after {} epochs of training, loss is {}, w is {}, b is {}".format(i + 1, train_loss, sess.run(w), sess.run(b)))
    # Plot the fitted line against the training data
    plt.plot(x_train, y_train, 'ro', label='Original data')
    plt.plot(x_train, sess.run(w) * x_train + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()
    # Test set
    # Note: this divides by the training-set size n_numbers;
    # use x_test.shape[0] instead for a per-test-sample average
    testing_cost = sess.run(tf.reduce_sum(tf.square(pred - y)) / (2 * n_numbers), feed_dict={x: x_test, y: y_test})
    print("testing cost is {:.9f}".format(testing_cost))
    plt.plot(x_test, y_test, 'bo', label='Testing data')
    # The original referenced an undefined name train_X; x_train is the training input
    plt.plot(x_train, sess.run(w) * x_train + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()
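The inner training loop feeds one sample at a time, so each optimizer step is a plain gradient-descent update on that sample's share of the loss. The sketch below reproduces that update rule in plain Python under stated assumptions: the gradients are derived by hand from (w*x + b - y)^2 / (2n), whereas TensorFlow computes them automatically, and `sgd_fit` and the toy data are hypothetical names and values, not the salary dataset.

```python
def sgd_fit(xs, ys, lr=0.01, epochs=1000):
    """Per-sample gradient descent on (w*x + b - y)^2 / (2n), from w = b = 0."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = w * x + b - y  # prediction error for this sample
            # d(loss)/dw = err * x / n and d(loss)/db = err / n; update both at once
            w, b = w - lr * err * x / n, b - lr * err / n
    return w, b


# Toy data lying exactly on the line y = 0.1 * x (hypothetical values)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.2, 0.3, 0.4]
w_fit, b_fit = sgd_fit(xs, ys, lr=0.1, epochs=2000)
print(w_fit, b_fit)  # close to 0.1 and 0.0
```

Because the loss is divided by 2n while only one sample is fed per step, the effective step size is the learning rate divided by n, which is one reason convergence with learning_rate=0.01 can be slow.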
Results
Fit on the training set:
![](https://img.haomeiwen.com/i11640553/d872bca4f1c3cc88.png)
Fit on the test set:
![](https://img.haomeiwen.com/i11640553/886faa4a60d1a343.png)
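It doesn't look that great... and yet the testing cost is 0.003018258?? Keep in mind that the cost is computed on salaries scaled to [0, 1] and divided by twice the training-set size, so a small number here does not by itself mean a good fit.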
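Topics covered
1. tf.enable_eager_execution()
Eager execution lets you use Python debugging tools, data structures, and control flow. There is no need for placeholders or sessions, and results are available immediately. Tensors behave much like NumPy arrays and are compatible with NumPy functions.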
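2. Data normalization
Methods for normalizing data. The code above uses min-max normalization, which rescales values to the range [0, 1].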
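A minimal plain-Python sketch of min-max normalization follows. The helper names `fit_min_max` and `min_max_scale` and the salary figures are hypothetical. Unlike the code in this post, which scales the test targets with their own minimum and maximum, this sketch reuses the training statistics so that both sets share one scale; that is a common alternative, not the author's method.

```python
def fit_min_max(values):
    """Return the (min, max) pair learned from the training values."""
    return min(values), max(values)


def min_max_scale(values, lo, hi):
    """Rescale with (v - lo) / (hi - lo); the training values land in [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical salaries, not taken from the dataset
train = [40000.0, 60000.0, 100000.0]
test = [55000.0, 110000.0]

lo, hi = fit_min_max(train)
scaled_train = min_max_scale(train, lo, hi)
scaled_test = min_max_scale(test, lo, hi)
print(scaled_train)
print(scaled_test)
```

With shared statistics, a test value above the training maximum scales to more than 1, which is expected and keeps predictions and targets comparable.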
3. Creating variables: tf.Variable() vs tf.get_variable()
The two statements below are equivalent. The difference is that the variable name is optional for tf.Variable and is passed as name='v', whereas tf.get_variable requires a name.
v = tf.get_variable('v', shape=[1], initializer=tf.constant_initializer(1.0))
v = tf.Variable(tf.constant(1.0, shape=[1]), name='v')
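4. tf.initialize_all_variables() initializes every variable and automatically handles dependencies between variables. It is deprecated in favor of tf.global_variables_initializer().
One error I ran into:
TypeError: Fetch argument <function should_use_result.<locals>.wrapped at 0x00000231D706B268> has invalid type <class 'function'>, must be a string or Tensor. (Can not convert a function into a Tensor or Operation.)
The cause: I forgot the parentheses after tf.initialize_all_variables at initialization, so the function itself, rather than the op it returns, was passed to sess.run.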
5. tf.multiply() multiplies two matrices element by element.
tf.matmul() computes the matrix product of a and b.
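The difference is easiest to see on a small example. The sketch below illustrates the two semantics in plain Python on nested lists; `elementwise_multiply` and `matmul` are hypothetical helpers written for this illustration, not TensorFlow's implementation, and broadcasting is ignored.

```python
def elementwise_multiply(a, b):
    """Multiply matching entries, like tf.multiply on same-shaped inputs."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def matmul(a, b):
    """Row-by-column matrix product, like tf.matmul on 2-D inputs."""
    cols_b = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(elementwise_multiply(a, b))  # [[5, 12], [21, 32]]
print(matmul(a, b))                # [[19, 22], [43, 50]]
```

In the regression code, w and x are multiplied with tf.multiply because w is a scalar; a model with several input features would typically use tf.matmul on a weight matrix instead.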
6. I have not tried linear regression with multiple features, nor added a regularization term, because I don't know how yet =.=