Python Data Analysis and Machine Learning 44: Generating Time Series in Python

Author: 只是甲 | Published: 2022-08-03 10:11

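1. Generating time series in Python

Time series

  • Timestamp
  • Fixed period
  • Time interval

date_range

  • You can specify a start time and a number of periods
  • H: hour
  • D: day
  • M: month (month end; pandas 2.2 and later prefer the spellings h and ME for H and M)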
2. Generating time series at different intervals

Code:

import pandas as pd
import numpy as np
import datetime as dt

# Starting from 2022-07-01, generate 10 timestamps at 3-day intervals
rng = pd.date_range('2022-07-01', periods = 10, freq = '3D')
print(rng)
print("#####################")

# Specify a start time, an end time, and a frequency ('M' means month end)
data=pd.date_range('2022-01-01','2023-01-01',freq='M')
print(data)
print("#####################")

# Starting from 2022-01-01, generate 20 daily timestamps and use them as the index
time=pd.Series(np.random.randn(20),
           index=pd.date_range(dt.datetime(2022,1,1),periods=20))
print(time)
print("#####################")

# A period range with a non-standard step of 25 hours
p1 = pd.period_range('2022-01-01 10:10', freq = '25H', periods = 10)
print(p1)
print("######################################")

# Use a date range as the index of a Series
rng = pd.date_range('2022 Jul 1', periods = 10, freq = 'D')
print(pd.Series(range(len(rng)), index = rng))
print("######################################")

Output:

DatetimeIndex(['2022-07-01', '2022-07-04', '2022-07-07', '2022-07-10',
               '2022-07-13', '2022-07-16', '2022-07-19', '2022-07-22',
               '2022-07-25', '2022-07-28'],
              dtype='datetime64[ns]', freq='3D')
#####################
DatetimeIndex(['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
               '2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
               '2022-09-30', '2022-10-31', '2022-11-30', '2022-12-31'],
              dtype='datetime64[ns]', freq='M')
#####################
2022-01-01   -0.957412
2022-01-02   -0.333720
2022-01-03    1.079960
2022-01-04    0.050675
2022-01-05    0.270313
2022-01-06   -0.222715
2022-01-07   -0.560258
2022-01-08    1.009430
2022-01-09   -0.678157
2022-01-10    0.213557
2022-01-11   -0.720791
2022-01-12    0.332096
2022-01-13   -0.986449
2022-01-14   -0.357303
2022-01-15   -0.559618
2022-01-16    0.480281
2022-01-17   -0.443998
2022-01-18    1.541631
2022-01-19   -0.094559
2022-01-20    1.875012
Freq: D, dtype: float64
#####################
PeriodIndex(['2022-01-01 10:00', '2022-01-02 11:00', '2022-01-03 12:00',
             '2022-01-04 13:00', '2022-01-05 14:00', '2022-01-06 15:00',
             '2022-01-07 16:00', '2022-01-08 17:00', '2022-01-09 18:00',
             '2022-01-10 19:00'],
            dtype='period[25H]', freq='25H')
######################################
2022-07-01    0
2022-07-02    1
2022-07-03    2
2022-07-04    3
2022-07-05    4
2022-07-06    5
2022-07-07    6
2022-07-08    7
2022-07-09    8
2022-07-10    9
Freq: D, dtype: int64
######################################
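
The `25H` example above is unusual but still evenly spaced. For observations that really are irregular, one option is to build the index from an explicit list of dates with `pd.to_datetime`. The sketch below assumes a handful of made-up dates and values purely for illustration.

```python
import pandas as pd

# Hypothetical, unevenly spaced observation dates (illustration only).
dates = ["2022-01-01", "2022-01-02", "2022-01-05", "2022-01-11"]
idx = pd.to_datetime(dates)

s = pd.Series([10, 20, 30, 40], index=idx)
print(s)

# No single frequency describes these dates, so the index carries none.
print(idx.freq)

# Gaps between consecutive observations, as Timedelta values.
gaps = s.index.to_series().diff()
print(gaps)
```

Because the index has no frequency, operations that rely on one (such as upsampling) generally need an explicit target frequency.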

3. Truncating a time range

Code:

import pandas as pd
import numpy as np
import datetime as dt

# Starting from 2022-01-01, generate 20 daily timestamps and use them as the index
time=pd.Series(np.random.randn(20),
           index=pd.date_range(dt.datetime(2022,1,1),periods=20))
print(time)
print("#####################")

# Keep only the data on or after 2022-01-10 (drop everything before it)
print(time.truncate(before='2022-1-10'))
print("#####################")

# Keep only the data on or before 2022-01-10 (drop everything after it)
print(time.truncate(after='2022-1-10'))
print("#####################")

# Select a date range by slicing; both end points are included
print(time['2022-01-15':'2022-01-20'])
print("#####################")

Output:

2022-01-01   -0.203552
2022-01-02   -1.035483
2022-01-03    0.252587
2022-01-04   -1.046993
2022-01-05    0.152435
2022-01-06   -0.534518
2022-01-07    0.770170
2022-01-08   -0.038129
2022-01-09    0.531485
2022-01-10    0.499937
2022-01-11    0.815295
2022-01-12    2.315740
2022-01-13   -0.443379
2022-01-14   -0.689247
2022-01-15    0.667250
2022-01-16   -2.067246
2022-01-17   -0.105151
2022-01-18   -0.420562
2022-01-19    1.012943
2022-01-20    0.509710
Freq: D, dtype: float64
#####################
2022-01-10    0.499937
2022-01-11    0.815295
2022-01-12    2.315740
2022-01-13   -0.443379
2022-01-14   -0.689247
2022-01-15    0.667250
2022-01-16   -2.067246
2022-01-17   -0.105151
2022-01-18   -0.420562
2022-01-19    1.012943
2022-01-20    0.509710
Freq: D, dtype: float64
#####################
2022-01-01   -0.203552
2022-01-02   -1.035483
2022-01-03    0.252587
2022-01-04   -1.046993
2022-01-05    0.152435
2022-01-06   -0.534518
2022-01-07    0.770170
2022-01-08   -0.038129
2022-01-09    0.531485
2022-01-10    0.499937
Freq: D, dtype: float64
#####################
2022-01-15    0.667250
2022-01-16   -2.067246
2022-01-17   -0.105151
2022-01-18   -0.420562
2022-01-19    1.012943
2022-01-20    0.509710
Freq: D, dtype: float64
#####################
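
The output above suggests that `truncate` and label slicing both keep their boundary dates. A minimal sketch to check that, assuming deterministic values (0 to 19) instead of random ones so the result can be verified by eye:

```python
import numpy as np
import pandas as pd

# Deterministic values so the result is easy to check by eye.
ts = pd.Series(np.arange(20), index=pd.date_range("2022-01-01", periods=20))

# truncate keeps both boundary dates.
cut = ts.truncate(before="2022-01-15", after="2022-01-20")

# Label-based slicing on a DatetimeIndex is also inclusive at both ends.
sliced = ts.loc["2022-01-15":"2022-01-20"]

print(cut.equals(sliced))
print(cut)
```

Both selections contain the six rows from 2022-01-15 through 2022-01-20.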

4. Timestamps and time arithmetic

Code:

import pandas as pd
import numpy as np
import datetime as dt

# Timestamps: a single point in time
print(pd.Timestamp('2022-07-25'))
print(pd.Timestamp('2022-07-25 10'))
print(pd.Timestamp('2022-07-25 10:15'))
print("######################################")

# Periods: a span of time (a month, a day, ...)
print(pd.Period('2022-01'))
print(pd.Period('2022-01-01'))
print("######################################")

# Time arithmetic with Timedelta
#help(pd.Timedelta)
print(pd.Period('2022-01-01 10:10') + pd.Timedelta('1 day'))
print(pd.Period('2022-01-01 10:10:10') + pd.Timedelta('1 s'))
print("######################################")

Output:

2022-07-25 00:00:00
2022-07-25 10:00:00
2022-07-25 10:15:00
######################################
2022-01
2022-01-01
######################################
2022-01-02 10:10
2022-01-01 10:10:11
######################################
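
The same arithmetic also works on `Timestamp` objects, and a `Period` can be shifted by an integer number of periods. The following is a small sketch of those cases; the specific dates are arbitrary choices for illustration.

```python
import pandas as pd

start = pd.Timestamp("2022-07-25")
end = pd.Timestamp("2022-07-25 10:15")

# Subtracting two timestamps yields a Timedelta.
elapsed = end - start
print(elapsed)

# Adding a Timedelta to a Timestamp yields a new Timestamp.
later = start + pd.Timedelta(days=1, hours=2)
print(later)

# Adding an integer to a Period moves it by whole periods (months here).
month = pd.Period("2022-01")
next_month = month + 1
print(next_month)
```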

5. Resampling

Resampling

  • Converts time data from one frequency to another
  • Downsampling
  • Upsampling

Code:

import pandas as pd
import numpy as np
import datetime as dt

# Generate a daily time series
rng = pd.date_range('1/1/2022', periods=90, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
#print(ts.head())

# Sum by month
print(ts.resample('M').sum())
print("######################################")
# Sum over 3-day bins
print(ts.resample('3D').sum())
print("######################################")
# Mean over 3-day bins
day3Ts = ts.resample('3D').mean()
print(day3Ts)
print("######################################")
# Upsample the 3-day series to daily; this introduces many NaN values.
# Ways to fill them:
# 1. ffill: fill with the previous value
# 2. bfill: fill with the next value
# 3. interpolate: fill by linear interpolation
print(day3Ts.resample('D').asfreq())
print("######################################")
# limit=1 fills at most one consecutive NaN, so one gap per bin remains
print(day3Ts.resample('D').ffill(limit=1))
print("######################################")
print(day3Ts.resample('D').bfill(limit=1))
print("######################################")
print(day3Ts.resample('D').interpolate('linear'))
print("######################################")

Output:

2022-01-31    0.904974
2022-02-28   -1.930083
2022-03-31    7.617911
Freq: M, dtype: float64
######################################
2022-01-01    0.104413
2022-01-04    2.255400
2022-01-07   -0.993552
2022-01-10    1.234344
2022-01-13   -0.621381
2022-01-16   -0.072830
2022-01-19   -0.215890
2022-01-22    0.050444
2022-01-25   -1.794619
2022-01-28    0.030952
2022-01-31   -1.022843
2022-02-03   -1.035522
2022-02-06   -1.124857
2022-02-09    1.915781
2022-02-12    0.263875
2022-02-15    0.927552
2022-02-18    0.760483
2022-02-21   -2.771669
2022-02-24    2.157336
2022-02-27    0.107964
2022-03-02   -0.852413
2022-03-05    1.252628
2022-03-08   -0.529793
2022-03-11    2.110139
2022-03-14    1.624062
2022-03-17   -0.241604
2022-03-20   -2.165326
2022-03-23    2.975993
2022-03-26    1.389412
2022-03-29    0.874324
dtype: float64
######################################
2022-01-01    0.034804
2022-01-04    0.751800
2022-01-07   -0.331184
2022-01-10    0.411448
2022-01-13   -0.207127
2022-01-16   -0.024277
2022-01-19   -0.071963
2022-01-22    0.016815
2022-01-25   -0.598206
2022-01-28    0.010317
2022-01-31   -0.340948
2022-02-03   -0.345174
2022-02-06   -0.374952
2022-02-09    0.638594
2022-02-12    0.087958
2022-02-15    0.309184
2022-02-18    0.253494
2022-02-21   -0.923890
2022-02-24    0.719112
2022-02-27    0.035988
2022-03-02   -0.284138
2022-03-05    0.417543
2022-03-08   -0.176598
2022-03-11    0.703380
2022-03-14    0.541354
2022-03-17   -0.080535
2022-03-20   -0.721775
2022-03-23    0.991998
2022-03-26    0.463137
2022-03-29    0.291441
dtype: float64
######################################
2022-01-01    0.034804
2022-01-02         NaN
2022-01-03         NaN
2022-01-04    0.751800
2022-01-05         NaN
2022-01-06         NaN
2022-01-07   -0.331184
2022-01-08         NaN
2022-01-09         NaN
2022-01-10    0.411448
2022-01-11         NaN
2022-01-12         NaN
2022-01-13   -0.207127
2022-01-14         NaN
2022-01-15         NaN
2022-01-16   -0.024277
2022-01-17         NaN
2022-01-18         NaN
2022-01-19   -0.071963
2022-01-20         NaN
2022-01-21         NaN
2022-01-22    0.016815
2022-01-23         NaN
2022-01-24         NaN
2022-01-25   -0.598206
2022-01-26         NaN
2022-01-27         NaN
2022-01-28    0.010317
2022-01-29         NaN
2022-01-30         NaN
                ...   
2022-02-28         NaN
2022-03-01         NaN
2022-03-02   -0.284138
2022-03-03         NaN
2022-03-04         NaN
2022-03-05    0.417543
2022-03-06         NaN
2022-03-07         NaN
2022-03-08   -0.176598
2022-03-09         NaN
2022-03-10         NaN
2022-03-11    0.703380
2022-03-12         NaN
2022-03-13         NaN
2022-03-14    0.541354
2022-03-15         NaN
2022-03-16         NaN
2022-03-17   -0.080535
2022-03-18         NaN
2022-03-19         NaN
2022-03-20   -0.721775
2022-03-21         NaN
2022-03-22         NaN
2022-03-23    0.991998
2022-03-24         NaN
2022-03-25         NaN
2022-03-26    0.463137
2022-03-27         NaN
2022-03-28         NaN
2022-03-29    0.291441
Freq: D, Length: 88, dtype: float64
######################################
2022-01-01    0.034804
2022-01-02    0.034804
2022-01-03         NaN
2022-01-04    0.751800
2022-01-05    0.751800
2022-01-06         NaN
2022-01-07   -0.331184
2022-01-08   -0.331184
2022-01-09         NaN
2022-01-10    0.411448
2022-01-11    0.411448
2022-01-12         NaN
2022-01-13   -0.207127
2022-01-14   -0.207127
2022-01-15         NaN
2022-01-16   -0.024277
2022-01-17   -0.024277
2022-01-18         NaN
2022-01-19   -0.071963
2022-01-20   -0.071963
2022-01-21         NaN
2022-01-22    0.016815
2022-01-23    0.016815
2022-01-24         NaN
2022-01-25   -0.598206
2022-01-26   -0.598206
2022-01-27         NaN
2022-01-28    0.010317
2022-01-29    0.010317
2022-01-30         NaN
                ...   
2022-02-28    0.035988
2022-03-01         NaN
2022-03-02   -0.284138
2022-03-03   -0.284138
2022-03-04         NaN
2022-03-05    0.417543
2022-03-06    0.417543
2022-03-07         NaN
2022-03-08   -0.176598
2022-03-09   -0.176598
2022-03-10         NaN
2022-03-11    0.703380
2022-03-12    0.703380
2022-03-13         NaN
2022-03-14    0.541354
2022-03-15    0.541354
2022-03-16         NaN
2022-03-17   -0.080535
2022-03-18   -0.080535
2022-03-19         NaN
2022-03-20   -0.721775
2022-03-21   -0.721775
2022-03-22         NaN
2022-03-23    0.991998
2022-03-24    0.991998
2022-03-25         NaN
2022-03-26    0.463137
2022-03-27    0.463137
2022-03-28         NaN
2022-03-29    0.291441
Freq: D, Length: 88, dtype: float64
######################################
2022-01-01    0.034804
2022-01-02         NaN
2022-01-03    0.751800
2022-01-04    0.751800
2022-01-05         NaN
2022-01-06   -0.331184
2022-01-07   -0.331184
2022-01-08         NaN
2022-01-09    0.411448
2022-01-10    0.411448
2022-01-11         NaN
2022-01-12   -0.207127
2022-01-13   -0.207127
2022-01-14         NaN
2022-01-15   -0.024277
2022-01-16   -0.024277
2022-01-17         NaN
2022-01-18   -0.071963
2022-01-19   -0.071963
2022-01-20         NaN
2022-01-21    0.016815
2022-01-22    0.016815
2022-01-23         NaN
2022-01-24   -0.598206
2022-01-25   -0.598206
2022-01-26         NaN
2022-01-27    0.010317
2022-01-28    0.010317
2022-01-29         NaN
2022-01-30   -0.340948
                ...   
2022-02-28         NaN
2022-03-01   -0.284138
2022-03-02   -0.284138
2022-03-03         NaN
2022-03-04    0.417543
2022-03-05    0.417543
2022-03-06         NaN
2022-03-07   -0.176598
2022-03-08   -0.176598
2022-03-09         NaN
2022-03-10    0.703380
2022-03-11    0.703380
2022-03-12         NaN
2022-03-13    0.541354
2022-03-14    0.541354
2022-03-15         NaN
2022-03-16   -0.080535
2022-03-17   -0.080535
2022-03-18         NaN
2022-03-19   -0.721775
2022-03-20   -0.721775
2022-03-21         NaN
2022-03-22    0.991998
2022-03-23    0.991998
2022-03-24         NaN
2022-03-25    0.463137
2022-03-26    0.463137
2022-03-27         NaN
2022-03-28    0.291441
2022-03-29    0.291441
Freq: D, Length: 88, dtype: float64
######################################
2022-01-01    0.034804
2022-01-02    0.273803
2022-01-03    0.512801
2022-01-04    0.751800
2022-01-05    0.390805
2022-01-06    0.029811
2022-01-07   -0.331184
2022-01-08   -0.083640
2022-01-09    0.163904
2022-01-10    0.411448
2022-01-11    0.205256
2022-01-12   -0.000935
2022-01-13   -0.207127
2022-01-14   -0.146177
2022-01-15   -0.085227
2022-01-16   -0.024277
2022-01-17   -0.040172
2022-01-18   -0.056068
2022-01-19   -0.071963
2022-01-20   -0.042371
2022-01-21   -0.012778
2022-01-22    0.016815
2022-01-23   -0.188192
2022-01-24   -0.393199
2022-01-25   -0.598206
2022-01-26   -0.395365
2022-01-27   -0.192524
2022-01-28    0.010317
2022-01-29   -0.106771
2022-01-30   -0.223859
                ...   
2022-02-28   -0.070721
2022-03-01   -0.177429
2022-03-02   -0.284138
2022-03-03   -0.050244
2022-03-04    0.183649
2022-03-05    0.417543
2022-03-06    0.219496
2022-03-07    0.021449
2022-03-08   -0.176598
2022-03-09    0.116728
2022-03-10    0.410054
2022-03-11    0.703380
2022-03-12    0.649371
2022-03-13    0.595363
2022-03-14    0.541354
2022-03-15    0.334058
2022-03-16    0.126762
2022-03-17   -0.080535
2022-03-18   -0.294281
2022-03-19   -0.508028
2022-03-20   -0.721775
2022-03-21   -0.150518
2022-03-22    0.420740
2022-03-23    0.991998
2022-03-24    0.815711
2022-03-25    0.639424
2022-03-26    0.463137
2022-03-27    0.405905
2022-03-28    0.348673
2022-03-29    0.291441
Freq: D, Length: 88, dtype: float64
######################################
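
Because the series above is random, the numbers are hard to verify by hand. The sketch below repeats the downsample, upsample, and interpolate steps on six known values (an assumption made here for illustration), so each result can be checked with simple arithmetic.

```python
import numpy as np
import pandas as pd

# Six daily values 0..5 starting on 2022-01-01.
ts = pd.Series(np.arange(6.0),
               index=pd.date_range("2022-01-01", periods=6, freq="D"))

# Downsample: sum each 3-day bin, i.e. 0+1+2 and 3+4+5.
down = ts.resample("3D").sum()
print(down)

# Upsample back to daily: the newly created rows are NaN until filled.
up = down.resample("D").asfreq()
print(up)

# Fill the gaps by linear interpolation between the known points.
filled = up.interpolate(method="linear")
print(filled)
```

The upsampled index only runs from the first bin label to the last one (2022-01-01 to 2022-01-04), which is also why the 90-day series above comes back with 88 rows rather than 90.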

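6. Moving window functions

Code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Generate a daily time series with 600 random values
df = pd.Series(np.random.randn(600), index = pd.date_range('7/1/2022', freq = 'D', periods = 600))

# Create a rolling window of 10 observations
r = df.rolling(window = 10)
# Print the mean of the most recent 10 values at each point
print(r.mean())


# Plot the raw series (red) and its 10-day rolling mean (blue)
plt.figure(figsize=(15, 5))

df.plot(style='r')
df.rolling(window=10).mean().plot(style='b')

plt.show()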

Output:

[Figure image.png: the original series in red with its 10-day rolling mean in blue]
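
To make the window mechanics concrete, here is a minimal sketch on five known values with a window of 3. The `min_periods` argument is an addition not used in the article; it controls how many observations are required before a result is produced.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# The first window-1 results are NaN because the window is not yet full.
full = s.rolling(window=3).mean()
print(full.tolist())

# min_periods=1 emits a value as soon as one observation is available.
partial = s.rolling(window=3, min_periods=1).mean()
print(partial.tolist())
```

With the default setting, the 10-day rolling mean plotted above therefore has no values for its first nine days.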

References:

  1. https://study.163.com/course/introduction.htm?courseId=1003590004#/courseDetail?tab=1
