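This example uses a very simple dataset to predict a cow's monthly milk production for the 12 months of the final year. The training set is the monthly milk production over the first 13 years.
The goal is to show one way of building the training and test sets for a recurrent network (an RNN such as an LSTM; the code below uses a GRU layer).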
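The sketch below is a minimal illustration of the two pieces involved. It does not read the CSV. Instead it uses a plain 1-D NumPy array: a synthetic trend-plus-seasonality series of 14 × 12 = 168 months, with values made up for illustration. `make_windows` is a hypothetical helper. The series is split chronologically, and then each 12-month input window is paired with a target that is the same window shifted forward by one month.

```python
import numpy as np


def make_windows(series, steps):
    """Return (inputs, targets), each of shape (n, steps, 1); targets are inputs shifted by one step."""
    n = len(series) - steps
    x = np.stack([series[i:i + steps] for i in range(n)])
    y = np.stack([series[i + 1:i + steps + 1] for i in range(n)])
    # Add a trailing feature axis, as expected by Keras recurrent layers
    return x[..., np.newaxis], y[..., np.newaxis]


# Synthetic stand-in for 14 years of monthly data (168 points): trend + yearly cycle
months = np.arange(168)
series = 600 + 1.5 * months + 50 * np.sin(2 * np.pi * months / 12)

# Chronological split, no shuffling: the test year comes strictly after training
train, test = series[:156], series[156:]

x, y = make_windows(train, steps=12)
print(x.shape, y.shape)  # (144, 12, 1) (144, 12, 1)
```

The notebook's `next_batch` generator draws one such window at random per training step instead of building all 144 pairs up front. Both produce the same (input, target) structure.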
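The dataset can be downloaded here:
https://gitlab.com/zhuge20100104/cpp_practice/-/blob/master/simple_learn/deep_learning/13_use_case_implementation_of_rnn/monthly-milk-production-pounds-p.csv?ref_type=heads
You can download it directly from that page instead of searching around for it; I found it through Google myself.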
The complete Jupyter notebook is available at: https://gitlab.com/zhuge20100104/cpp_practice/-/blob/master/simple_learn/deep_learning/13_use_case_implementation_of_rnn/13.%20Use%20Case%20Implementation%20of%20RNN.ipynb?ref_type=heads
The code is as follows:
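# In effect, the previous 12 months are used to predict the next 12 months shifted by one month.
# 11 of those months overlap with the input, so only the last month of the output is new (actually predicted).
# That predicted value is fed back into the series, and the next prediction is made from it.
# Originally a TF 1.x exercise; this version uses the TF 2.x Keras API
# 1. Predict a cow's monthly milk production from the time series
# Import libraries and read the data
# index_col='Month'
# use the Month column as the index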
# import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Read the dataset and print the head of it
milk = pd.read_csv('./monthly-milk-production-pounds-p.csv', index_col='Month')
milk.head()
# Visualize the data
# 3. Convert the index to time series
milk.index = pd.to_datetime(milk.index)
# 4. Plot the time series data
milk.plot()
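# train_test_split
# Use the first 13 years (156 months) as the training set
# and the last year as the test set.
# The last year's 12 months are predicted by the RNN one month at a time.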
# 5. Perform the train test split on the data
milk.info()
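# Use the first 13 years (156 months) for training
train_set = milk.head(156)
# Use the remaining year (12 months) for testing; .copy() avoids a SettingWithCopyWarning when a column is added later
test_set = milk.tail(12).copy()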
# Normalize the data (min-max scaling to [0, 1], fitted on the training set only)
# 6. Scale the data using standard machine learning process
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_set)
test_scaled = scaler.transform(test_set)
# 7. Define your customized data generator
def next_batch(training_data, batch_size, steps):
    while True:
        # Grab a random starting point for each sample in the batch
        rand_starts = np.random.randint(0, len(training_data) - steps, size=batch_size)
        # Each row holds steps + 1 consecutive values of the (n, 1) scaled series
        y_batch = np.stack([training_data[s:s + steps + 1, 0] for s in rand_starts])
        # Inputs are the first `steps` values and targets the last `steps`; they overlap by steps - 1
        yield y_batch[:, :-1].reshape(-1, steps, 1), y_batch[:, 1:].reshape(-1, steps, 1)
# 8. Setting up the RNN model
# import tensorflow
import tensorflow as tf
num_inputs = 1
# Num of timesteps in each batch
num_time_steps = 12
# 100 neuron layer, play with this
num_neurons = 100
# Just one output, predicted time series
num_outputs = 1
# You can also try more iterations with a lower learning rate
# Learning rate; tune as needed
learning_rate = 0.03
# Number of training iterations (training steps); tune as needed
num_train_iterations = 4000
# size of the batch of data
batch_size = 1
# Define your RNN model (a GRU layer; tf.keras.layers.LSTM can be swapped in)
class MyRNN(tf.keras.Model):
    def __init__(self, hidden_size, num_outputs):
        super().__init__()
        # return_sequences=True: produce an output at every time step, not just the last one
        self.rnn_cell = tf.keras.layers.GRU(units=hidden_size, return_sequences=True)
        # Project each hidden state down to num_outputs values
        self.projection_layer = tf.keras.layers.Dense(units=num_outputs)

    def call(self, inputs):
        rnn_output = self.rnn_cell(inputs)
        output = self.projection_layer(rnn_output)
        return output
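model = MyRNN(num_neurons, num_outputs)
# Pass learning_rate explicitly; the string 'adam' would silently use Keras' default rate (0.001)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate), loss='mse')
# One epoch of num_train_iterations batches drawn from the infinite generator
model.fit(next_batch(train_scaled, batch_size, num_time_steps), epochs=1, steps_per_epoch=num_train_iterations)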
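# Seed with the last 12 scaled training values, flattened to plain floats so predictions can be appended
train_seed = list(train_scaled[-num_time_steps:, 0])
# Each step predicts the next month from the previous prediction plus the 11 values before it
for iteration in range(12):
    x_batch = np.array(train_seed[-num_time_steps:]).reshape(1, num_time_steps, 1)
    y_pred = model.predict(x_batch, verbose=0)
    # The last value of the predicted sequence is the new month
    print(y_pred[0, -1, 0])
    # Append it to train_seed so it takes part in the next prediction
    train_seed.append(float(y_pred[0, -1, 0]))
# train_seed now holds 24 values (12 seed values + 12 predictions), so both slices show the predictions
train_seed[-12:], train_seed[12:]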
# 17. Reshape the results and undo the scaling so the predictions are back in pounds
results = scaler.inverse_transform(np.array(train_seed[12:]).reshape(12, 1))
test_set['Generated'] = results
# Inspect the final test_set DataFrame
test_set
# Plot the predicted result and actual result
test_set.plot()
In the final plot of actual versus generated values, the predictions follow the correct overall trend.
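The 12-month forecasting loop can be isolated into a small function. The sketch below assumes any callable that maps a (1, steps, 1) window to a (1, steps, 1) shifted sequence, as the Keras model above does. `recursive_forecast` and `seasonal_naive` are hypothetical names. A seasonal-naive stand-in built on `np.roll` replaces the trained GRU so the snippet runs without TensorFlow, and the sine-wave series is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def recursive_forecast(predict_fn, seed, steps, horizon):
    """Forecast `horizon` values, feeding each prediction back into the input window."""
    window = [float(v) for v in seed[-steps:]]
    preds = []
    for _ in range(horizon):
        x = np.array(window[-steps:]).reshape(1, steps, 1)
        # The model returns the whole shifted sequence; only its last value is new
        next_val = float(predict_fn(x)[0, -1, 0])
        preds.append(next_val)
        # Feed the prediction back so it becomes part of the next input window
        window.append(next_val)
    return np.array(preds)


def seasonal_naive(x):
    # Hypothetical stand-in for model.predict: shifting left by one with wrap-around
    # makes the last output equal the window's first value, i.e. the value
    # 12 months before the month being predicted (for steps=12)
    return np.roll(x, -1, axis=1)


# Synthetic 13-year monthly training series (made-up values, period 12)
months = np.arange(156)
train = (600 + 50 * np.sin(2 * np.pi * months / 12)).reshape(-1, 1)

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)

preds_scaled = recursive_forecast(seasonal_naive, train_scaled[:, 0], steps=12, horizon=12)
# Map the scaled predictions back to the original units
preds = scaler.inverse_transform(preds_scaled.reshape(-1, 1))
print(preds.shape)  # (12, 1)
```

On a purely periodic series the seasonal-naive stand-in simply repeats the last training year. Passing the trained model's `predict` instead gives the notebook's loop. Because every prediction is reused as input, errors compound across the 12-month horizon.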
