This article walks through a hands-on house price prediction model, covering:
- An introduction to the house price prediction model
- Implementing the house price prediction model with TensorFlow
- Visualizing the model's data flow graph with TensorBoard
- Hands-on TensorFlow house price prediction
Introduction to the House Price Prediction Model
Prerequisites for the house price prediction model:
Supervised learning:
Typical supervised learning algorithms include:
regression algorithms such as linear regression, and classification algorithms such as logistic regression (a classifier despite its name), decision trees, and random forests;
deep neural networks are prone to overfitting when there is not enough data.
Linear regression:
The following explains univariate and multivariate linear regression:
- Univariate linear regression:
Its optimization (cost) function:
The gradient descends along the direction of steepest descent:
Diagram of the step size and the gradient descent direction:
- Multivariate linear regression (the standard formulas for both cases are sketched just below this list):
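The formula figures from the original are not reproduced here. As a reminder, and consistent with the 1/(2m) squared-error loss used in the code later, the standard formulation is:

h_w(x) = w_0 + w_1 x (univariate), and h_w(x) = w_0 x_0 + w_1 x_1 + \dots + w_n x_n with x_0 \equiv 1 (multivariate)

J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2

w_j := w_j - \alpha \frac{\partial J(w)}{\partial w_j}

where m is the number of training samples and the learning rate \alpha is the step size taken along the negative gradient direction.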
Implementing the House Price Prediction Model with TensorFlow
The univariate problem:
The multivariate problem:
Feature normalization (feature scaling) in the multivariate house price prediction problem (the scaling formula is sketched below):
Problem statement after normalization:
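The problem-statement figures are not reproduced here. The scaling applied later by normal_feature is z-score standardization of each feature column:

x_j' = \frac{x_j - \mu_j}{\sigma_j}

where \mu_j and \sigma_j are the mean and standard deviation of feature column j.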
Set up the environment and start Jupyter Notebook. Install the required packages:
pip3 install tensorboard==1.13.1 jupyter seaborn matplotlib pandas numpy
Then confirm what is installed:
pip3 list
which lists the installed packages.
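With the packages in place, the notebook server is started from the working directory in the usual way (the standard command, not shown in the original):
jupyter notebook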
Use seaborn to build scatter plots for the univariate and multivariate datasets:
- Univariate house price prediction:
import pandas as pd
import seaborn as sns
sns.set(context = "notebook", style = "whitegrid", palette = "dark")
# read the CSV and assign the given column names
df0 = pd.read_csv('data0.csv', names = ['square', 'price'])
# height controls the size of the figure
sns.lmplot(x = 'square', y = 'price', data = df0, height = 6, fit_reg = False)
The scatter plot is shown below:
# show the first five rows
df0.head()
# summary of the dataset
df0.info()
- Multivariate house price prediction:
import matplotlib.pyplot as plt
# if this import fails, upgrade matplotlib with pip3 install --upgrade matplotlib, then import mplot3d from mpl_toolkits
from mpl_toolkits import mplot3d
df1 = pd.read_csv('data1.csv', names = ['square', 'bedrooms', 'price'])
df1.head()
The three columns are shown below:
Plot a 3D scatter plot:
fig = plt.figure()
# create an Axes3D object
ax = plt.axes(projection = '3d')
ax.set_xlabel('square')
ax.set_ylabel('bedrooms')
ax.set_zlabel('price')
# draw the 3D scatter plot
ax.scatter3D(df1['square'], df1['bedrooms'], df1['price'], c = df1['price'], cmap = 'Greens')
The 3D scatter plot is shown below:
Swap the x and y axes:
fig = plt.figure()
# create an Axes3D object
ax = plt.axes(projection = '3d')
ax.set_xlabel('bedrooms')
ax.set_ylabel('square')
ax.set_zlabel('price')
# draw the 3D scatter plot
ax.scatter3D(df1['bedrooms'], df1['square'], df1['price'], c = df1['price'], cmap = 'Greens')
The 3D plot with x and y swapped is shown below:
Data normalization:
def normal_feature(df):
    # standardize each column: subtract the mean, divide by the standard deviation
    return df.apply(lambda col: (col - col.mean()) / col.std())
df = normal_feature(df1)
df.head()
The normalized data:
Plot the 3D scatter plot with the normalized data (note that df, not the raw df1, is used here):
ax = plt.axes(projection = '3d')
ax.set_xlabel('square')
ax.set_ylabel('bedrooms')
ax.set_zlabel('price')
ax.scatter3D(df['square'], df['bedrooms'], df['price'], c = df['price'], cmap = 'Reds')
The scatter plot is shown below:
Show information about the processed data:
df.info()
Data processing: add a ones column (i.e., x0):
import numpy as np
ones = pd.DataFrame({'ones': np.ones(len(df))})
ones.info()
Details of ones:
Concatenate the data and display it:
# concatenate column-wise
df = pd.concat([ones, df], axis = 1)
df.head()
The final data:
df.info()
Creating and training the house price prediction model:
Steps:
- Data preprocessing
- Get the data
- Create the linear regression model (data flow graph)
- Create a session
- Data preprocessing:
import pandas as pd
import numpy as np
def normal_feature(df):
    return df.apply(lambda col: (col - col.mean()) / col.std())
df = normal_feature(pd.read_csv('data1.csv', names = ['square', 'bedrooms', 'price']))
ones = pd.DataFrame({'ones': np.ones(len(df))}) # n*1
# concatenate column-wise
df = pd.concat([ones, df], axis = 1)
df.head()
The first five rows:
- Get the data:
# extract the X and y data
X_data = np.array(df[df.columns[0 : 3]]) # columns 0, 1, 2
y_data = np.array(df[df.columns[-1]]).reshape(len(df), 1) # reshape the last column into n*1
print(X_data.shape, type(X_data))
print(y_data.shape, type(y_data))
The shapes and types of the data:
- Create the linear regression model (data flow graph):
import tensorflow as tf
# learning rate and number of epochs
alpha = 0.01
epoch = 500
# inputs X and y, with shapes [47, 3] and [47, 1]
X = tf.placeholder(tf.float32, X_data.shape)
y = tf.placeholder(tf.float32, y_data.shape)
# weight variable W with shape [3, 1]; X_data.shape[1] is 3
W = tf.get_variable("weights", (X_data.shape[1], 1), initializer = tf.constant_initializer())
# hypothesis h(x) = w0 * x0 + w1 * x1 + w2 * x2, where x0 is always 1
# predicted values y_pred, with shape [47, 1]
y_pred = tf.matmul(X, W)
# least-squares loss; the transposed product has shape [1, 47] * [47, 1]
loss_op = 1 / (2 * len(X_data)) * tf.matmul((y_pred - y), (y_pred - y), transpose_a = True)
# gradient descent optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate = alpha)
# training op
train_op = opt.minimize(loss_op)
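For reference, the loss_op above is the matrix form of the same 1/(2m) squared-error cost, where X is the 47x3 design matrix (including the ones column) and m = 47:

J(W) = \frac{1}{2m} (XW - y)^\top (XW - y)

The product is a 1x1 matrix, which is why the residual is multiplied by itself with transpose_a = True.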
- Create a session:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # for very large datasets, mini-batch gradient descent would be used instead
    # the dataset here is small, so the full data is fed at once
    # one epoch is one pass over the full data
    # log the loss and weights every 10 epochs
    for e in range(1, epoch + 1):
        sess.run(train_op, feed_dict = {X: X_data, y: y_data})
        if e % 10 == 0:
            loss, w = sess.run([loss_op, W], feed_dict = {X: X_data, y: y_data})
            log_str = 'Epoch %d \t Loss=%.4g \t Model: y = %.4gx1 + %.4gx2 + %.4g'
            print(log_str % (e, loss, w[1], w[2], w[0]))
Part of the training log is shown below:
Visualizing the Model's Data Flow Graph with TensorBoard
Steps to visualize the model's data flow graph:
- Data preprocessing
- Get the data
- Create the linear regression model (data flow graph)
- Create a FileWriter instance in the session
- Data preprocessing:
import pandas as pd
import numpy as np
def normal_feature(df):
    return df.apply(lambda col: (col - col.mean()) / col.std())
df = normal_feature(pd.read_csv('data1.csv', names = ['square', 'bedrooms', 'price']))
ones = pd.DataFrame({'ones': np.ones(len(df))})
df = pd.concat([ones, df], axis = 1)
df.head()
The preprocessed data:
- Get the data:
X_data = np.array(df[df.columns[0:3]])
y_data = np.array(df[df.columns[-1]]).reshape(len(df), 1)
print(X_data.shape, type(X_data))
print(y_data.shape, type(y_data))
The shapes and types of the data:
- Create the linear regression model (data flow graph):
import tensorflow as tf
alpha = 0.01
epoch = 500
X = tf.placeholder(tf.float32, X_data.shape)
y = tf.placeholder(tf.float32, y_data.shape)
W = tf.get_variable("weights", (X_data.shape[1], 1), initializer = tf.constant_initializer())
y_pred = tf.matmul(X, W)
loss_op = 1 / (2 * len(X_data)) * tf.matmul((y_pred - y), (y_pred - y), transpose_a = True)
opt = tf.train.GradientDescentOptimizer(learning_rate = alpha)
train_op = opt.minimize(loss_op)
If this cell is run again in the same graph, the error above indicates that the variable weights cannot be defined twice
=> restart the kernel or reset the default graph to clear the previous data flow graph
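A minimal way to clear the previous data flow graph without restarting the kernel, assuming the TensorFlow 1.x API used throughout this article:
import tensorflow as tf
# discard the current default graph so variables such as "weights" can be defined again
tf.reset_default_graph()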
- Create a FileWriter instance in the session:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./summary/linear-regression-0/', sess.graph)
    for e in range(1, epoch + 1):
        sess.run(train_op, feed_dict = {X: X_data, y: y_data})
        if e % 10 == 0:
            loss, w = sess.run([loss_op, W], feed_dict = {X: X_data, y: y_data})
            log_str = "Epoch %d \t Loss = %.4g \t Model: y = %.4gx1 + %.4gx2 + %.4g"
            print(log_str % (e, loss, w[1], w[2], w[0]))
    writer.close()
This produces a data flow graph without name scopes or abstract nodes:
- In the environment where the logs were written, change into the log directory and run tensorboard --logdir ./ --host localhost to start TensorBoard.
- If TensorBoard fails to start, try upgrading it: sudo pip install --upgrade tensorboard
- Open localhost:6006 in a browser; the data flow graph for linear-regression-0 is shown below:
TensorBoard's working log:
Hands-On TensorFlow House Price Prediction
The data flow graph above is cluttered and hard to read, so create a data flow graph with name scopes (abstract nodes):
import tensorflow as tf
alpha = 0.01
epoch = 500
with tf.name_scope('input'):
    X = tf.placeholder(tf.float32, X_data.shape)
    y = tf.placeholder(tf.float32, y_data.shape)
with tf.name_scope('hypothesis'):
    W = tf.get_variable("weights", (X_data.shape[1], 1), initializer = tf.constant_initializer())
    y_pred = tf.matmul(X, W)
with tf.name_scope('loss'):
    loss_op = 1 / (2 * len(X_data)) * tf.matmul((y_pred - y), (y_pred - y), transpose_a = True)
with tf.name_scope('train'):
    opt = tf.train.GradientDescentOptimizer(learning_rate = alpha)
    train_op = opt.minimize(loss_op)
Create a session that writes a new event file:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./summary/linear-regression-1/', sess.graph)
    for e in range(1, epoch + 1):
        sess.run(train_op, feed_dict = {X: X_data, y: y_data})
        if e % 10 == 0:
            loss, w = sess.run([loss_op, W], feed_dict = {X: X_data, y: y_data})
            log_str = "Epoch %d \t Loss = %.4g \t Model: y = %.4gx1 + %.4gx2 + %.4g"
            print(log_str % (e, loss, w[1], w[2], w[0]))
    writer.close()
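As an aside, the loss could also be logged to TensorBoard itself instead of being collected by hand as in the next section. The sketch below is not part of the original; it assumes the TF 1.x summary API, reuses loss_op, train_op, X, y, X_data, y_data, and epoch defined above, and writes to a hypothetical directory ./summary/linear-regression-loss/:
# hypothetical variation: record the loss as a TensorBoard scalar summary
# loss_op is a 1x1 matrix, so squeeze it down to a scalar first
loss_summary_op = tf.summary.scalar('loss', tf.squeeze(loss_op))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./summary/linear-regression-loss/', sess.graph)
    for e in range(1, epoch + 1):
        _, summary = sess.run([train_op, loss_summary_op], feed_dict = {X: X_data, y: y_data})
        writer.add_summary(summary, e)  # the SCALARS tab in TensorBoard then shows the loss curve
    writer.close()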
Visit localhost:6006 again:
The nodes of the data flow graph can now be expanded and collapsed, and the graph is much clearer than before.
Visualizing the loss values:
Store the loss values while running the session:
loss_data = []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(1, epoch + 1):
        sess.run(train_op, feed_dict = {X: X_data, y: y_data})
        loss, w = sess.run([loss_op, W], feed_dict = {X: X_data, y: y_data})
        loss_data.append(loss)
# each loss is a 1x1 array, so flatten the collected values into a 1-D array of length epoch
loss_data = np.array(loss_data).reshape(len(loss_data))
print(len(loss_data)) # 500
Plot the loss curve:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(context = "notebook", style = "whitegrid", palette = "dark")
# arange supports fractional step sizes
ax = sns.lineplot(x = 'epoch', y = 'loss', data = pd.DataFrame({'loss': loss_data, 'epoch': np.arange(epoch)}))
ax.set_xlabel('epoch')
ax.set_ylabel('loss')
plt.show()