This article walks through a hands-on house price prediction model, covering:
- An introduction to the house price prediction model
- Implementing the house price prediction model with TensorFlow
- Visualizing the model's data flow graph with TensorBoard
- Hands-on TensorFlow house price prediction
Introduction to the House Price Prediction Model
Prerequisites for the house price prediction model:
Supervised learning:
Typical supervised learning algorithms include:
regression algorithms such as linear regression, and classification algorithms such as logistic regression (a classifier despite its name), decision trees, and random forests;
deep neural networks are prone to overfitting when there is not enough data.
Linear regression:
The following explains univariate and multivariate linear regression:
- Univariate linear regression:
Its optimization (cost) function:
The gradient descends along the direction of steepest descent:
Diagram of the step size and the gradient descent direction:
- Multivariate linear regression (the standard formulas for both cases are sketched just below this list):
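The formula figures from the original are not reproduced here. As a reminder, and consistent with the 1/(2m) squared-error loss used in the code later, the standard formulation is:

h_w(x) = w_0 + w_1 x (univariate), and h_w(x) = w_0 x_0 + w_1 x_1 + \dots + w_n x_n with x_0 \equiv 1 (multivariate)

J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2

w_j := w_j - \alpha \frac{\partial J(w)}{\partial w_j}

where m is the number of training samples and the learning rate \alpha is the step size taken along the negative gradient direction.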
Implementing the House Price Prediction Model with TensorFlow
The univariate problem:
The multivariate problem:
Feature normalization (feature scaling) in the multivariate house price prediction problem (the scaling formula is sketched below):
Problem statement after normalization:
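The problem-statement figures are not reproduced here. The scaling applied later by normal_feature is z-score standardization of each feature column:

x_j' = \frac{x_j - \mu_j}{\sigma_j}

where \mu_j and \sigma_j are the mean and standard deviation of feature column j.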
Set up the environment and start Jupyter Notebook. Install the required packages:
pip3 install tensorboard==1.13.1 jupyter seaborn matplotlib pandas numpy
Then confirm what is installed:
pip3 list
which lists the installed packages.
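With the packages in place, the notebook server is started from the working directory in the usual way (the standard command, not shown in the original):
jupyter notebook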
Use seaborn to build scatter plots for the univariate and multivariate datasets:
- Univariate house price prediction:
import pandas as pd
import seaborn as sns
sns.set(context = "notebook", style = "whitegrid", palette = "dark")
# read the CSV and assign the given column names
df0 = pd.read_csv('data0.csv', names = ['square', 'price'])
# height controls the size of the figure
sns.lmplot(x = 'square', y = 'price', data = df0, height = 6, fit_reg = False)
The scatter plot is shown below:
# show the first five rows
df0.head()
# summary of the dataset
df0.info()
- Multivariate house price prediction:
import matplotlib.pyplot as plt
# if this import fails, upgrade matplotlib with pip3 install --upgrade matplotlib, then import mplot3d from mpl_toolkits
from mpl_toolkits import mplot3d
df1 = pd.read_csv('data1.csv', names = ['square', 'bedrooms', 'price'])
df1.head()
The three columns are shown below:
Plot a 3D scatter plot:
fig = plt.figure()
# create an Axes3D object
ax = plt.axes(projection = '3d')
ax.set_xlabel('square')
ax.set_ylabel('bedrooms')
ax.set_zlabel('price')
# draw the 3D scatter plot
ax.scatter3D(df1['square'], df1['bedrooms'], df1['price'], c = df1['price'], cmap = 'Greens')
The 3D scatter plot is shown below:
Swap the x and y axes:
fig = plt.figure()
# create an Axes3D object
ax = plt.axes(projection = '3d')
ax.set_xlabel('bedrooms')
ax.set_ylabel('square')
ax.set_zlabel('price')
# draw the 3D scatter plot
ax.scatter3D(df1['bedrooms'], df1['square'], df1['price'], c = df1['price'], cmap = 'Greens')
The 3D plot with x and y swapped is shown below:
Data normalization:
def normal_feature(df):
    # standardize each column: subtract the mean, divide by the standard deviation
    return df.apply(lambda col: (col - col.mean()) / col.std())
df = normal_feature(df1)
df.head()
The normalized data:
Plot the 3D scatter plot with the normalized data (note that df, not the raw df1, is used here):
ax = plt.axes(projection = '3d')
ax.set_xlabel('square')
ax.set_ylabel('bedrooms')
ax.set_zlabel('price')
ax.scatter3D(df['square'], df['bedrooms'], df['price'], c = df['price'], cmap = 'Reds')
The scatter plot is shown below:
Show information about the processed data:
df.info()
Data processing: add a ones column (i.e., x0):
import numpy as np
ones = pd.DataFrame({'ones': np.ones(len(df))})
ones.info()
Details of ones:
Concatenate the data and display it:
# concatenate column-wise
df = pd.concat([ones, df], axis = 1)
df.head()
The final data:
df.info()
Creating and training the house price prediction model:
Steps:
- Data preprocessing
- Get the data
- Create the linear regression model (data flow graph)
- Create a session
- Data preprocessing:
import pandas as pd
import numpy as np
def normal_feature(df):
    return df.apply(lambda col: (col - col.mean()) / col.std())
df = normal_feature(pd.read_csv('data1.csv', names = ['square', 'bedrooms', 'price']))
ones = pd.DataFrame({'ones': np.ones(len(df))}) # n*1
# concatenate column-wise
df = pd.concat([ones, df], axis = 1)
df.head()
The first five rows:
- Get the data:
# extract the X and y data
X_data = np.array(df[df.columns[0 : 3]]) # columns 0, 1, 2
y_data = np.array(df[df.columns[-1]]).reshape(len(df), 1) # reshape the last column into n*1
print(X_data.shape, type(X_data))
print(y_data.shape, type(y_data))
The shapes and types of the data:
- Create the linear regression model (data flow graph):
import tensorflow as tf
# learning rate and number of epochs
alpha = 0.01
epoch = 500
# inputs X and y, with shapes [47, 3] and [47, 1]
X = tf.placeholder(tf.float32, X_data.shape)
y = tf.placeholder(tf.float32, y_data.shape)
# weight variable W with shape [3, 1]; X_data.shape[1] is 3
W = tf.get_variable("weights", (X_data.shape[1], 1), initializer = tf.constant_initializer())
# hypothesis h(x) = w0 * x0 + w1 * x1 + w2 * x2, where x0 is always 1
# predicted values y_pred, with shape [47, 1]
y_pred = tf.matmul(X, W)
# least-squares loss; the transposed product has shape [1, 47] * [47, 1]
loss_op = 1 / (2 * len(X_data)) * tf.matmul((y_pred - y), (y_pred - y), transpose_a = True)
# gradient descent optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate = alpha)
# training op
train_op = opt.minimize(loss_op)
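For reference, the loss_op above is the matrix form of the same 1/(2m) squared-error cost, where X is the 47x3 design matrix (including the ones column) and m = 47:

J(W) = \frac{1}{2m} (XW - y)^\top (XW - y)

The product is a 1x1 matrix, which is why the residual is multiplied by itself with transpose_a = True.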
- Create a session:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # for very large datasets, mini-batch gradient descent would be used instead
    # the dataset here is small, so the full data is fed at once
    # one epoch is one pass over the full data
    # log the loss and weights every 10 epochs
    for e in range(1, epoch + 1):
        sess.run(train_op, feed_dict = {X: X_data, y: y_data})
        if e % 10 == 0:
            loss, w = sess.run([loss_op, W], feed_dict = {X: X_data, y: y_data})
            log_str = 'Epoch %d \t Loss=%.4g \t Model: y = %.4gx1 + %.4gx2 + %.4g'
            print(log_str % (e, loss, w[1], w[2], w[0]))
Part of the training log is shown below:
Visualizing the Model's Data Flow Graph with TensorBoard
Steps to visualize the model's data flow graph:
- Data preprocessing
- Get the data
- Create the linear regression model (data flow graph)
- Create a FileWriter instance in the session
- Data preprocessing:
import pandas as pd
import numpy as np
def normal_feature(df):
    return df.apply(lambda col: (col - col.mean()) / col.std())
df = normal_feature(pd.read_csv('data1.csv', names = ['square', 'bedrooms', 'price']))
ones = pd.DataFrame({'ones': np.ones(len(df))})
df = pd.concat([ones, df], axis = 1)
df.head()
The preprocessed data:
- Get the data:
X_data = np.array(df[df.columns[0:3]])
y_data = np.array(df[df.columns[-1]]).reshape(len(df), 1)
print(X_data.shape, type(X_data))
print(y_data.shape, type(y_data))
The shapes and types of the data:
- Create the linear regression model (data flow graph):
import tensorflow as tf
alpha = 0.01
epoch = 500
X = tf.placeholder(tf.float32, X_data.shape)
y = tf.placeholder(tf.float32, y_data.shape)
W = tf.get_variable("weights", (X_data.shape[1], 1), initializer = tf.constant_initializer())
y_pred = tf.matmul(X, W)
loss_op = 1 / (2 * len(X_data)) * tf.matmul((y_pred - y), (y_pred - y), transpose_a = True)
opt = tf.train.GradientDescentOptimizer(learning_rate = alpha)
train_op = opt.minimize(loss_op)
If this cell is run again in the same graph, the error above indicates that the variable weights cannot be defined twice
=> restart the kernel or reset the default graph to clear the previous data flow graph
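A minimal way to clear the previous data flow graph without restarting the kernel, assuming the TensorFlow 1.x API used throughout this article:
import tensorflow as tf
# discard the current default graph so variables such as "weights" can be defined again
tf.reset_default_graph()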
- Create a FileWriter instance in the session:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./summary/linear-regression-0/', sess.graph)
    for e in range(1, epoch + 1):
        sess.run(train_op, feed_dict = {X: X_data, y: y_data})
        if e % 10 == 0:
            loss, w = sess.run([loss_op, W], feed_dict = {X: X_data, y: y_data})
            log_str = "Epoch %d \t Loss = %.4g \t Model: y = %.4gx1 + %.4gx2 + %.4g"
            print(log_str % (e, loss, w[1], w[2], w[0]))
    writer.close()
This produces a data flow graph without name scopes or abstract nodes:
- In the environment where the logs were written, change into the log directory and run tensorboard --logdir ./ --host localhost to start TensorBoard.
- If TensorBoard fails to start, try upgrading it: sudo pip install --upgrade tensorboard
- Open localhost:6006 in a browser; the data flow graph for linear-regression-0 is shown below:
TensorBoard's working log:
Hands-On TensorFlow House Price Prediction
The data flow graph above is cluttered and hard to read, so create a data flow graph with name scopes (abstract nodes):
import tensorflow as tf
alpha = 0.01
epoch = 500
with tf.name_scope('input'):
    X = tf.placeholder(tf.float32, X_data.shape)
    y = tf.placeholder(tf.float32, y_data.shape)
with tf.name_scope('hypothesis'):
    W = tf.get_variable("weights", (X_data.shape[1], 1), initializer = tf.constant_initializer())
    y_pred = tf.matmul(X, W)
with tf.name_scope('loss'):
    loss_op = 1 / (2 * len(X_data)) * tf.matmul((y_pred - y), (y_pred - y), transpose_a = True)
with tf.name_scope('train'):
    opt = tf.train.GradientDescentOptimizer(learning_rate = alpha)
    train_op = opt.minimize(loss_op)
Create a session that writes a new event file:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./summary/linear-regression-1/', sess.graph)
    for e in range(1, epoch + 1):
        sess.run(train_op, feed_dict = {X: X_data, y: y_data})
        if e % 10 == 0:
            loss, w = sess.run([loss_op, W], feed_dict = {X: X_data, y: y_data})
            log_str = "Epoch %d \t Loss = %.4g \t Model: y = %.4gx1 + %.4gx2 + %.4g"
            print(log_str % (e, loss, w[1], w[2], w[0]))
    writer.close()
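As an aside, the loss could also be logged to TensorBoard itself instead of being collected by hand as in the next section. The sketch below is not part of the original; it assumes the TF 1.x summary API, reuses loss_op, train_op, X, y, X_data, y_data, and epoch defined above, and writes to a hypothetical directory ./summary/linear-regression-loss/:
# hypothetical variation: record the loss as a TensorBoard scalar summary
# loss_op is a 1x1 matrix, so squeeze it down to a scalar first
loss_summary_op = tf.summary.scalar('loss', tf.squeeze(loss_op))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./summary/linear-regression-loss/', sess.graph)
    for e in range(1, epoch + 1):
        _, summary = sess.run([train_op, loss_summary_op], feed_dict = {X: X_data, y: y_data})
        writer.add_summary(summary, e)  # the SCALARS tab in TensorBoard then shows the loss curve
    writer.close()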
Visit localhost:6006 again:
The nodes of the data flow graph can now be expanded and collapsed, and the graph is much clearer than before.
Visualizing the loss values:
Store the loss values while running the session:
loss_data = []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(1, epoch + 1):
        sess.run(train_op, feed_dict = {X: X_data, y: y_data})
        loss, w = sess.run([loss_op, W], feed_dict = {X: X_data, y: y_data})
        loss_data.append(loss)
# each loss is a 1x1 array, so flatten the collected values into a 1-D array of length epoch
loss_data = np.array(loss_data).reshape(len(loss_data))
print(len(loss_data)) # 500
Plot the loss curve:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(context = "notebook", style = "whitegrid", palette = "dark")
# arange supports fractional step sizes
ax = sns.lineplot(x = 'epoch', y = 'loss', data = pd.DataFrame({'loss': loss_data, 'epoch': np.arange(epoch)}))
ax.set_xlabel('epoch')
ax.set_ylabel('loss')
plt.show()