时间序列

作者: 馒头and花卷 | 来源:发表于2019-06-13 18:17 被阅读0次

参考博客python实现时间序列分析-ning-ML

时间序列的测试

import numpy as np
import matplotlib.pyplot as plt

output = np.array([
    97, 130, 156.5, 135.2, 137.7,
    180.5, 205.2, 190, 188.6, 196.7,
    180.3, 210.8, 196, 223, 238.2,
    263.5, 292.6, 317, 335.4, 327,
    321.9, 353.5, 397.8, 436.8, 465.7,
    476.7, 462.6, 460.8, 501.8, 501.5, 
    489.5, 542.3, 512.2, 559.8, 542.0, 
    567.0
], dtype=float)
year = np.arange(1964, 2000)

平稳性检验

时序图检验

fig, ax = plt.subplots()
ax.plot(year, output, "+-")
plt.show()

在这里插入图片描述

该序列具有明显的趋势性，所以不是通常的平稳序列

自相关图检验

def acf(data, lag=None):
    n = len(data)
    if lag is None:
        lag = int(n / 2)
    cov = []
    sample_mean = np.mean(data)
    var = np.sum((data - sample_mean) ** 2) / (n-1)
    cov.append(var)
    for i in range(1, lag):
        x = data[:n-i] - sample_mean
        y = data[i:] - sample_mean
        cov.append(np.mean(x * y))
    cov = np.array(cov, dtype=float)
    acf = cov / var
    return acf, np.arange(len(acf))

acf_inf, lags = acf(output, lag=23)
print(acf_inf, lags)
fig, ax = plt.subplots()
ax.scatter(lags, acf_inf)

[ 1.          0.91392186  0.86822948  0.81369058  0.76064724  0.68729172
  0.63440934  0.57485247  0.49429248  0.41601171  0.32584615  0.20512486
  0.08432045 -0.0501945  -0.17245647 -0.28041079 -0.37630504 -0.4783755
 -0.59591061 -0.71243415 -0.83519788 -0.97330822 -1.08197993] [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22]





<matplotlib.collections.PathCollection at 0x274e3b127b8>

在这里插入图片描述

比较奇怪的是，和书上的怎么不一样，而且acf绝对值不应该小于1？哪里算错了？我知道了，原来算法都是用：
$\hat{\rho}_k \approx \frac{\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k}-\bar{x})}{\sum_{t=1}^n (x_t - \bar{x})^2}$
算的，而不是:
$\hat{\rho}_k = \frac{\hat{\gamma}(k)}{\hat{\gamma}(0)}$

def acf(data, lag=None):
    n = len(data)
    if lag is None:
        lag = int(n / 2)
    cov = []
    sample_mean = np.mean(data)
    var = np.sum((data - sample_mean) ** 2)
    cov.append(1)
    for i in range(1, lag):
        x = data[:n-i] - sample_mean
        y = data[i:] - sample_mean
        cov.append(np.sum(x * y) / var)
    cov = np.array(cov, dtype=float)
    acf = cov
    return acf, np.arange(len(acf))

acf_inf, lags = acf(output, lag=23)
print(acf_inf, lags)
fig, ax = plt.subplots()
ax.scatter(lags, acf_inf)

[ 1.          0.91392186  0.84342292  0.76719398  0.69544891  0.6087441
  0.54377943  0.47630634  0.39543399  0.32092332  0.24205714  0.14651776
  0.05781973 -0.03298495 -0.10840121 -0.16824648 -0.21503145 -0.25968956
 -0.30646831 -0.34603944 -0.38180474 -0.41713209 -0.43279197] [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22]





<matplotlib.collections.PathCollection at 0x274e44b0710>

在这里插入图片描述

结论就是，自相关图显示出明显的三角对称性，这时具有单调趋势的非平稳序列的一种典型的自相关图形式.

随机性检验

跳过了

ARMA模型

AR模型

AR模型的自相关系数有俩个显著的性质: 1.拖尾性；2.指数衰减

滞后 $k$ 阶的自相关系数的通解为:
$\rho_k = \sum_{i=1}^p c_i \lambda_i^k$
其中 $|\lambda_i| < 1$ 为差分方程的特征根, $c_1, \ldots, c_p$ 为常数,且不全为0

通过这个通解形式，容易推出 $\rho_k$ 始终有非零取值，不会在 $k$ 大于某个常数之后就恒等于零，这个性质就是拖尾性.

而以指数衰减的性质就是利用自相关图判断平稳序列时所说的"短期相关"性质.

AR(p)模型的偏自相关系数具有 $p$ 阶截尾性，利用线性方程组的理论可以证明.事实上，这也是一种确定阶数的方法.另外偏自相关系数可以通过求解Yule-Walker方程获得:
$\left ( \begin{array}{cccc} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{array} \right ) \left ( \begin{array}{c} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{array} \right )= \left ( \begin{array}{c} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{array} \right )$

def pcf(data, lag=None):
    """计算偏自相关系数"""
    n = len(data)
    if lag == None:
        lag = int(n / 2)
    acf_inf, lags = acf(data, lag)
    M = np.zeros((lag, lag), dtype=float)
    for i in range(lag):
        for j in range(i, lag):
            index = np.abs(i - j)
            M[i, j] = acf_inf[index]
            M[j, i] = acf_inf[index]
    return np.linalg.inv(M) @ acf_inf, lags
pcf_inf, lags = pcf(output, lag=30)
print(pcf_inf)
fig, ax = plt.subplots()
ax.scatter(lags, pcf_inf)

[ 1.00000000e+00  5.32907052e-15 -1.77635684e-15  8.88178420e-16
  0.00000000e+00 -1.33226763e-15 -8.88178420e-16 -1.33226763e-15
  4.44089210e-15 -2.22044605e-15 -1.33226763e-15  2.22044605e-15
 -5.55111512e-16  7.77156117e-16  2.22044605e-16  7.77156117e-16
  8.88178420e-16 -2.22044605e-16 -1.77635684e-15  1.77635684e-15
  0.00000000e+00  2.22044605e-15 -4.44089210e-16  4.44089210e-16
 -8.88178420e-16 -1.33226763e-15  8.88178420e-16  2.66453526e-15
 -8.88178420e-16 -2.22044605e-16]





<matplotlib.collections.PathCollection at 0x274e4518be0>

在这里插入图片描述

是不是又哪里搞错了，和库里的又不一样了.

MA模型

MA(q)模型自相关系数 $q$ 阶截尾，即 $q$ 阶以后自相关系数为0

MA(q)模型偏自相关系数拖尾

ARMA模型

ARMA(p, q)模型自相关系数不截尾，而且偏自相关系数也不截尾

利用已有的库进行时间序列分析

import pandas as pd
from random import randrange
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.api import tsa
import warnings
warnings.filterwarnings("ignore")

data = pd.DataFrame(output, columns=["output"], 
                   index = pd.Index(tsa.datetools.dates_from_range("1964", "1999")))

data

<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>output</th>
</tr>
</thead>
<tbody>
<tr>
<th>1964-12-31</th>
<td>97.0</td>
</tr>
<tr>
<th>1965-12-31</th>
<td>130.0</td>
</tr>
<tr>
<th>1966-12-31</th>
<td>156.5</td>
</tr>
<tr>
<th>1967-12-31</th>
<td>135.2</td>
</tr>
<tr>
<th>1968-12-31</th>
<td>137.7</td>
</tr>
<tr>
<th>1969-12-31</th>
<td>180.5</td>
</tr>
<tr>
<th>1970-12-31</th>
<td>205.2</td>
</tr>
<tr>
<th>1971-12-31</th>
<td>190.0</td>
</tr>
<tr>
<th>1972-12-31</th>
<td>188.6</td>
</tr>
<tr>
<th>1973-12-31</th>
<td>196.7</td>
</tr>
<tr>
<th>1974-12-31</th>
<td>180.3</td>
</tr>
<tr>
<th>1975-12-31</th>
<td>210.8</td>
</tr>
<tr>
<th>1976-12-31</th>
<td>196.0</td>
</tr>
<tr>
<th>1977-12-31</th>
<td>223.0</td>
</tr>
<tr>
<th>1978-12-31</th>
<td>238.2</td>
</tr>
<tr>
<th>1979-12-31</th>
<td>263.5</td>
</tr>
<tr>
<th>1980-12-31</th>
<td>292.6</td>
</tr>
<tr>
<th>1981-12-31</th>
<td>317.0</td>
</tr>
<tr>
<th>1982-12-31</th>
<td>335.4</td>
</tr>
<tr>
<th>1983-12-31</th>
<td>327.0</td>
</tr>
<tr>
<th>1984-12-31</th>
<td>321.9</td>
</tr>
<tr>
<th>1985-12-31</th>
<td>353.5</td>
</tr>
<tr>
<th>1986-12-31</th>
<td>397.8</td>
</tr>
<tr>
<th>1987-12-31</th>
<td>436.8</td>
</tr>
<tr>
<th>1988-12-31</th>
<td>465.7</td>
</tr>
<tr>
<th>1989-12-31</th>
<td>476.7</td>
</tr>
<tr>
<th>1990-12-31</th>
<td>462.6</td>
</tr>
<tr>
<th>1991-12-31</th>
<td>460.8</td>
</tr>
<tr>
<th>1992-12-31</th>
<td>501.8</td>
</tr>
<tr>
<th>1993-12-31</th>
<td>501.5</td>
</tr>
<tr>
<th>1994-12-31</th>
<td>489.5</td>
</tr>
<tr>
<th>1995-12-31</th>
<td>542.3</td>
</tr>
<tr>
<th>1996-12-31</th>
<td>512.2</td>
</tr>
<tr>
<th>1997-12-31</th>
<td>559.8</td>
</tr>
<tr>
<th>1998-12-31</th>
<td>542.0</td>
</tr>
<tr>
<th>1999-12-31</th>
<td>567.0</td>
</tr>
</tbody>
</table>
</div>

data.plot(); #时序图
plot_acf(data).show(); #自相关图
plot_pacf(data).show(); #偏自相关图

在这里插入图片描述

差分运算

data_diff = data.diff(1).dropna()
plot_acf(data_diff).show()
plot_pacf(data_diff).show()

在这里插入图片描述

就用个ARMA(1, 1, 4)吧

result = ARIMA(data, order=(0, 1, 4)).fit(disp=False)
fitvalues = result.fittedvalues
fig, ax = plt.subplots()
ax.plot(data_diff, label="known")
ax.plot(fitvalues, label="fitted")
plt.title('ARIMA RSS: %.4f' % sum(result.fittedvalues - data_diff['output']) ** 2)
ax.legend()
plt.show()

在这里插入图片描述

利用summary查看

result.summary()

<table class="simpletable">
<caption>ARIMA Model Results</caption>
<tr>
<th>Dep. Variable:</th> <td>D.output</td> <th> No. Observations: </th> <td>35</td>
</tr>
<tr>
<th>Model:</th> <td>ARIMA(0, 1, 4)</td> <th> Log Likelihood </th> <td>-156.722</td>
</tr>
<tr>
<th>Method:</th> <td>css-mle</td> <th> S.D. of innovations</th> <td>20.534</td>
</tr>
<tr>
<th>Date:</th> <td>Thu, 13 Jun 2019</td> <th> AIC </th> <td>325.444</td>
</tr>
<tr>
<th>Time:</th> <td>18:06:52</td> <th> BIC </th> <td>334.776</td>
</tr>
<tr>
<th>Sample:</th> <td>12-31-1965</td> <th> HQIC </th> <td>328.666</td>
</tr>
<tr>
<th></th> <td>- 12-31-1999</td> <th> </th> <td> </td>
</tr>
</table>
<table class="simpletable">
<tr>
<td></td> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[0.025</th> <th>0.975]</th>
</tr>
<tr>
<th>const</th> <td> 13.9682</td> <td> 0.726</td> <td> 19.227</td> <td> 0.000</td> <td> 12.544</td> <td> 15.392</td>
</tr>
<tr>
<th>ma.L1.D.output</th> <td> -0.3682</td> <td> 0.200</td> <td> -1.840</td> <td> 0.076</td> <td> -0.761</td> <td> 0.024</td>
</tr>
<tr>
<th>ma.L2.D.output</th> <td> -0.1066</td> <td> 0.182</td> <td> -0.585</td> <td> 0.563</td> <td> -0.463</td> <td> 0.250</td>
</tr>
<tr>
<th>ma.L3.D.output</th> <td> -0.3034</td> <td> 0.196</td> <td> -1.545</td> <td> 0.133</td> <td> -0.688</td> <td> 0.081</td>
</tr>
<tr>
<th>ma.L4.D.output</th> <td> -0.2218</td> <td> 0.176</td> <td> -1.262</td> <td> 0.217</td> <td> -0.566</td> <td> 0.123</td>
</tr>
</table>
<table class="simpletable">
<caption>Roots</caption>
<tr>
<td></td> <th> Real</th> <th> Imaginary</th> <th> Modulus</th> <th> Frequency</th>
</tr>
<tr>
<th>MA.1</th> <td> 1.0000</td> <td> -0.0000j</td> <td> 1.0000</td> <td> -0.0000</td>
</tr>
<tr>
<th>MA.2</th> <td> -0.1585</td> <td> -1.4742j</td> <td> 1.4827</td> <td> -0.2670</td>
</tr>
<tr>
<th>MA.3</th> <td> -0.1585</td> <td> +1.4742j</td> <td> 1.4827</td> <td> 0.2670</td>
</tr>
<tr>
<th>MA.4</th> <td> -2.0510</td> <td> -0.0000j</td> <td> 2.0510</td> <td> -0.5000</td>
</tr>
</table>

其中的ma.L1.D.output 表示模型的MA部分的第一个参数，因为我们的AR部分为0，如果存在的话也有ar.L1.D.output的

$P > |z|$ 表示t检验，这里好像检验没通过，我也不知道咋怎.

LB统计量检验

resid = result.resid
r, q, p = tsa.acf(resid.values.squeeze(), qstat=True)
print(len(r), len(q), len(p))
test_data = np.c_[range(1, len(r)), r[1:], q, p]
table = pd.DataFrame(test_data, columns=['lag', 'AC', 'Q', 'Prob(>Q)'])
print(table.set_index('lag'))

35 34 34
            AC          Q  Prob(>Q)
lag                                
1.0  -0.006969   0.001850  0.965696
2.0  -0.004150   0.002525  0.998738
3.0   0.062144   0.158812  0.983947
4.0   0.143394   1.017765  0.907090
5.0   0.030833   1.058801  0.957685
6.0  -0.012886   1.066216  0.982977
7.0   0.082033   1.377449  0.986252
8.0  -0.040114   1.454627  0.993436
9.0  -0.058027   1.622335  0.996133
10.0 -0.053791   1.772216  0.997807
11.0 -0.124073   2.602859  0.995003
12.0 -0.086263   3.021837  0.995388
13.0 -0.119229   3.858617  0.992604
14.0 -0.190790   6.103343  0.963819
15.0 -0.116808   6.986800  0.958015
16.0 -0.015661   7.003517  0.973193
17.0  0.023369   7.042806  0.982988
18.0 -0.073408   7.453300  0.985714
19.0 -0.081616   7.992434  0.986747
20.0  0.089290   8.680747  0.986319
21.0 -0.209498  12.740500  0.917441
22.0  0.180079  15.970870  0.817329
23.0  0.096448  16.974725  0.810490
24.0 -0.039265  17.156229  0.841923
25.0 -0.083454  18.058138  0.839913
26.0  0.093348  19.311970  0.822992
27.0  0.067374  20.046753  0.828788
28.0 -0.078219  21.178618  0.817823
29.0  0.006681  21.188253  0.852199
30.0  0.013390  21.234691  0.880406
31.0  0.025666  21.447964  0.899588
32.0 -0.009549  21.487322  0.920437
33.0 -0.019575  21.735429  0.933336
34.0  0.009294  21.847286  0.946828

$Prob > (Q)$ 大于0.05，所以模型是显著的

模型预测

pred = result.predict('1999', '2010', typ='levels')
print(pred)
fig, ax = plt.subplots()
ax.plot(data, label="known")
ax.plot(pred, label="pred")
ax.legend()
plt.show()

1999-12-31    561.711101
2000-12-31    581.618193
2001-12-31    597.624198
2002-12-31    615.014458
2003-12-31    627.809643
2004-12-31    641.777833
2005-12-31    655.746023
2006-12-31    669.714214
2007-12-31    683.682404
2008-12-31    697.650595
2009-12-31    711.618785
2010-12-31    725.586976
Freq: A-DEC, dtype: float64

在这里插入图片描述

时间序列

时间序列的测试

平稳性检验

时序图检验

自相关图检验

随机性检验

ARMA模型

AR模型

MA模型

ARMA模型

利用已有的库进行时间序列分析

LB统计量检验

模型预测

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读