7 Methods for Time Series Forecasting

Author: 还闹不闹 | Published: 2020-06-24 12:12


References:
http://itindex.net/detail/58931-python-%E6%97%B6%E9%97%B4%E5%BA%8F%E5%88%97-%E9%A2%84%E6%B5%8B
http://www.cppcns.com/jiaoben/python/302911.html

Dataset: shape (18288, 3)
Link: https://pan.baidu.com/s/13Z26DFEC6uzubGDs4TkV3g
Extraction code: 8uvk

#!/usr/bin/python
# coding=utf-8
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
# Use a font with CJK glyphs so that Chinese text in plots renders correctly
mpl.rcParams['font.sans-serif'] = ['SimHei']
from sklearn.metrics import mean_squared_error
from math import sqrt
from statsmodels.tsa.api import SimpleExpSmoothing
import statsmodels.api as sm
from statsmodels.tsa.api import Holt
from statsmodels.tsa.api import ExponentialSmoothing

# Show all columns
pd.set_option('display.max_columns', None)
# Show all rows
pd.set_option('display.max_rows', None)
# Set the display width to 10000 characters (the default is 80)
pd.set_option('display.width',10000)
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
# Widen NumPy's printed lines as well
np.set_printoptions(linewidth=1000)

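df = pd.read_csv(r'G:\rnn\时间序列\train.csv', header=0, sep=',')
print(df.head())
print(df.shape)
'''
The code above loads hourly passenger counts covering the two years 2012-2014. To show how the methods differ, we build a subset and aggregate it to daily values.

Build a dataset from the data for August 2012 - December 2013.
Create train and test sets for modelling: roughly the first 14 months (August 2012 - October 2013) are the training data and the last two months (November 2013 - December 2013) are the test data.
Aggregate the dataset at the daily level.
'''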
# Subsetting the dataset
# Index 11856 marks the end of year 2013
df = pd.read_csv(r'G:\rnn\时间序列\train.csv', nrows=11856)  # August 2012 - December 2013
# Creating train and test set
# Index 10392 marks the end of October 2013
train = df[0:10392]
test = df[10392:]

# Aggregating the dataset at daily level
def to_daily(frame):
    """Index the rows by parsed timestamp and return the daily mean of Count."""
    frame = frame.copy()  # work on a copy to avoid SettingWithCopyWarning on slices
    frame['Timestamp'] = pd.to_datetime(frame['Datetime'], format='%d-%m-%Y %H:%M')
    frame.index = frame['Timestamp']
    # Select Count explicitly: recent pandas versions raise on mean() of text columns
    return frame[['Count']].resample('D').mean()

df = to_daily(df)
train = to_daily(train)
test = to_daily(test)

#Plotting data
# figure()
# subplot(811)
train.Count.plot(figsize=(15,8), title= 'Daily Ridership', fontsize=14)
test.Count.plot(figsize=(15,8), title= 'Daily Ridership', fontsize=14)
plt.ylabel('Passenger count')
plt.show()

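# Naive method
# If the series has been stable for a while and we want tomorrow's value, we can simply reuse the most recent observation. Forecasting every future point as the last observed value is called the naive method.
dd = np.asarray(train['Count'])
y_hat = test.copy()
y_hat['naive'] = dd[-1]  # last observed training value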
# subplot(812)
plt.figure(figsize=(12, 8))
plt.plot(train.index, train['Count'], label='Train')
plt.plot(test.index, test['Count'], label='Test')
plt.plot(y_hat.index, y_hat['naive'], label='Naive Forecast')
plt.legend(loc='best')
plt.title("Naive Forecast")
plt.show()

nb_rms = sqrt(mean_squared_error(test['Count'], y_hat['naive']))
print('Naive method RMSE:', nb_rms)

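# Simple average method
# We often meet series that fluctuate slightly over a period while their mean stays roughly constant. In that case we can forecast tomorrow's value as approximately the mean of all past values. Forecasting the expected value as the mean of all previous observations is called the simple average method.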
y_hat_avg = test.copy()
y_hat_avg['avg_forecast'] = train['Count'].mean()
# subplot(813)
plt.figure(figsize=(12,8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['avg_forecast'], label='Average Forecast')
plt.legend(loc='best')
plt.show()

avg_forecast_rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['avg_forecast']))
print('Simple average RMSE:', avg_forecast_rms)

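# Moving average method
'''
We also often meet series, such as prices or sales, that rise or fall sharply over some period.
The simple average method would average all earlier data,
but that makes little sense here, because values from the early period would heavily distort the forecasts for later dates.
So we average only the most recent periods instead.
The underlying logic is that only recent values matter. Forecasting with the mean over a fixed window is called the moving average method.
The moving average uses a window size p, sometimes called the "sliding window".
With a simple moving average model, the next value in a series is forecast as the mean of the previous p values.
That is, for all i > p: y_hat[i] = (y[i-1] + y[i-2] + ... + y[i-p]) / p.
The moving average method can work well, especially when p is chosen well for the series.
(The code below uses a 60-day window.)
'''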
y_hat_avg = test.copy()
y_hat_avg['moving_avg_forecast'] = train['Count'].rolling(60).mean().iloc[-1]
# subplot(814)
plt.figure(figsize=(16,8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['moving_avg_forecast'], label='Moving Average Forecast')
plt.legend(loc='best')
plt.show()

moving_avg_forecast_rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['moving_avg_forecast']))
print('Moving average RMSE:', moving_avg_forecast_rms)

# Simple exponential smoothing
y_hat_avg = test.copy()
fit = SimpleExpSmoothing(np.asarray(train['Count'])).fit(smoothing_level=0.6, optimized=False)
y_hat_avg['SES'] = fit.forecast(len(test))
# subplot(815)
plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['SES'], label='SES')
plt.legend(loc='best')
plt.show()

SimpleExpSmoothing_rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['SES']))
print('Simple exponential smoothing RMSE:', SimpleExpSmoothing_rms)

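# Holt's linear trend method
# subplot(8,2,11)
sm.tsa.seasonal_decompose(train['Count']).plot()
# Augmented Dickey-Fuller stationarity test; result[1] holds the p-value
result = sm.tsa.stattools.adfuller(train['Count'])
plt.show()

y_hat_avg = test.copy()
# Recent statsmodels releases name this parameter smoothing_trend; older ones use smoothing_slope
fit = Holt(np.asarray(train['Count'])).fit(smoothing_level=0.3, smoothing_trend=0.1)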
y_hat_avg['Holt_linear'] = fit.forecast(len(test))
# subplot(8,2,12)
plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['Holt_linear'], label='Holt_linear')
plt.legend(loc='best')
plt.show()

Holt_linear_rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['Holt_linear']))
print("Holt's linear trend RMSE:", Holt_linear_rms)

# Holt-Winters seasonal model
y_hat_avg = test.copy()
fit1 = ExponentialSmoothing(np.asarray(train['Count']), seasonal_periods=7, trend='add', seasonal='add').fit()
y_hat_avg['Holt_Winter'] = fit1.forecast(len(test))
# subplot(817)
plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['Holt_Winter'], label='Holt_Winter')
plt.legend(loc='best')
plt.show()

Holt_Winter_rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['Holt_Winter']))
print('Holt-Winters seasonal model RMSE:', Holt_Winter_rms)

# Autoregressive integrated moving average (ARIMA), fitted here as a seasonal model with SARIMAX
y_hat_avg = test.copy()
fit1 = sm.tsa.statespace.SARIMAX(train.Count, order=(2, 1, 4), seasonal_order=(0, 1, 1, 7)).fit()
y_hat_avg['SARIMA'] = fit1.predict(start="2013-11-1", end="2013-12-31", dynamic=True)
# subplot(818)
plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['SARIMA'], label='SARIMA')
plt.legend(loc='best')
plt.show()

ARIMA_rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['SARIMA']))
print('ARIMA RMSE:', ARIMA_rms)
print('============================================================================')
print('Naive method RMSE:', nb_rms)
print('Simple average RMSE:', avg_forecast_rms)
print('Moving average RMSE:', moving_avg_forecast_rms)
print('Simple exponential smoothing RMSE:', SimpleExpSmoothing_rms)
print("Holt's linear trend RMSE:", Holt_linear_rms)
print('Holt-Winters seasonal model RMSE:', Holt_Winter_rms)
print('ARIMA RMSE:', ARIMA_rms)

Naive method RMSE: 43.91640614391676
Simple average RMSE: 109.88526527082863
Moving average RMSE: 46.72840725106963
Simple exponential smoothing RMSE: 43.357625225228155
Holt's linear trend RMSE: 43.056259611507286
Holt-Winters seasonal model RMSE: 23.961492566159794
ARIMA RMSE: 26.052705330843708
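
The three baseline methods at the top of this list (naive, simple average, moving average) each forecast one constant value for the whole test period. The following is a minimal sketch of that idea using NumPy only. The helper names (`rmse`, `naive_forecast`, `average_forecast`, `moving_average_forecast`) and the toy series are illustrative assumptions and are not part of the original script or dataset.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def naive_forecast(train, horizon):
    """Repeat the last observed value for every future step."""
    return np.full(horizon, train[-1], dtype=float)


def average_forecast(train, horizon):
    """Repeat the mean of all observed values for every future step."""
    return np.full(horizon, np.mean(train), dtype=float)


def moving_average_forecast(train, horizon, p):
    """Repeat the mean of the last p observed values for every future step."""
    return np.full(horizon, np.mean(train[-p:]), dtype=float)


# Hypothetical toy series that jumps to a higher level halfway through
train = np.array([10, 12, 11, 13, 30, 32, 31, 33], dtype=float)
test = np.array([32, 34, 33], dtype=float)

forecasts = {
    "naive": naive_forecast(train, len(test)),
    "simple average": average_forecast(train, len(test)),
    "moving average (p=4)": moving_average_forecast(train, len(test), p=4),
}
for name, forecast in forecasts.items():
    print(name, "RMSE:", round(rmse(test, forecast), 3))
```

On this toy series the simple average is pulled down by the early, lower values, which mirrors why it has the largest error in the results above.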

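Figure: Raw data

Figure: Naive method

Figure: Simple average method

Figure: Moving average method

Figure: Simple exponential smoothing

Figure: Holt's linear trend method

Figure: Holt-Winters seasonal model

Figure: ARIMA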
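
The script passes `smoothing_level=0.6` to `SimpleExpSmoothing` without spelling out what that value controls. As a rough sketch, the standard simple exponential smoothing recursion updates a level as `level = alpha * y + (1 - alpha) * level` and forecasts that final level for every future step. The function name `ses_forecast` and the toy input are hypothetical, and the sketch assumes the level starts at the first observation; statsmodels may initialise it differently, so its numbers need not match exactly.

```python
def ses_forecast(series, alpha, horizon):
    """Hand-rolled simple exponential smoothing (illustrative sketch).

    series  : observed values, oldest first
    alpha   : smoothing level in (0, 1]; larger values weight recent data more
    horizon : number of future steps to forecast
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # Assumption: initialise the level with the first observation
    level = float(series[0])
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    # The forecast is flat: the last smoothed level repeated
    return [level] * horizon


# Hypothetical toy input
print(ses_forecast([10, 20, 15, 25], alpha=0.6, horizon=3))
```

With `alpha=1` the recursion keeps only the latest observation, so the forecast reduces to the naive method; smaller values average over a longer history.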
