Summary of Pandas Time Series Essentials
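I. Pandas point-in-time data: Timestamp
(1) Basic concept of Timestamp
A Timestamp represents a single point in time. It is a pandas data type and the most basic kind of time series data, associating a value with a moment in time.
# Imports used by the snippets in this article
import datetime
import numpy as np
import pandas as pd
# pandas.Timestamp()
# Directly creates a pandas point-in-time value (a timestamp)
# The resulting type is the pandas Timestamp
t1 = datetime.datetime(2019,2,13) # create a datetime.datetime
t2 = pd.Timestamp(t1)
print(t1,type(t1))
print(t2,type(t2))
print(pd.Timestamp('2018-9-7 12:30:23'))
# t1 is an instance of the datetime.datetime class
# t2 is a pandas Timestamp
# t1 and t2 print the same value; only their types differ
Output: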
2019-02-13 00:00:00 <class 'datetime.datetime'>
2019-02-13 00:00:00 <class 'pandas.tslib.Timestamp'>
2018-09-07 12:30:23
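Because a Timestamp prints exactly like a datetime.datetime, it can help to see how the two relate. The following is a minimal sketch; the variable names are illustrative and not part of the original article.

```python
import datetime
import pandas as pd

dt = datetime.datetime(2019, 2, 13)
ts = pd.Timestamp(dt)

# Timestamp is a subclass of datetime.datetime, so the two compare equal
is_datetime = isinstance(ts, datetime.datetime)
same_moment = ts == dt

# Calendar fields are available as attributes
fields = (ts.year, ts.month, ts.day)

# Convert back to a plain datetime when another library expects one
back = ts.to_pydatetime()
print(is_datetime, same_moment, fields, type(back))
```

In practice this means a Timestamp can usually be passed wherever a datetime.datetime is accepted.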
(2) Converting time data with pd.to_datetime
pd.to_datetime converts a single time value into a pandas Timestamp; given several values (for example a list), it returns a pandas DatetimeIndex.
# pd.to_datetime
t1 = datetime.datetime(2019,1,2,3,45,56)
t2 = '2018-6-7'
print(t1,type(t1))
print(t2,type(t2))
# Convert a single time value with pd.to_datetime
# A single value becomes a pandas Timestamp
t3 = pd.to_datetime(t1)
t4 = pd.to_datetime(t2)
print(t3,type(t3))
print(t4,type(t4))
# Convert several time values with pd.to_datetime
# Several values become a pandas DatetimeIndex
lst_date = ['2017-2-3','2018-9-7','2010-7-6']
t5 = pd.to_datetime(lst_date)
print(t5,type(t5))
print('----------')
# When a sequence of dates contains entries that cannot be parsed, the errors parameter controls the result
# errors = 'ignore': return the original input when parsing fails; here the result is a plain ndarray
# (note: errors = 'ignore' is deprecated in recent pandas releases, 2.2 and later)
date3 = ['2017-2-1','2017-2-2','2017-2-3','hello world!','2017-2-5','2017-2-6']
t6 = pd.to_datetime(date3, errors = 'ignore')
print(t6,type(t6))
print('----------')
# errors = 'coerce': unparseable entries become NaT (Not a Time), and the result is still a DatetimeIndex
t7 = pd.to_datetime(date3, errors = 'coerce')
print(t7,type(t7))
Output:
2019-01-02 03:45:56 <class 'datetime.datetime'>
2018-6-7 <class 'str'>
2019-01-02 03:45:56 <class 'pandas.tslib.Timestamp'>
2018-06-07 00:00:00 <class 'pandas.tslib.Timestamp'>
DatetimeIndex(['2017-02-03', '2018-09-07', '2010-07-06'], dtype='datetime64[ns]', freq=None) <class 'pandas.tseries.index.DatetimeIndex'>
----------
['2017-2-1' '2017-2-2' '2017-2-3' 'hello world!' '2017-2-5' '2017-2-6'] <class 'numpy.ndarray'>
----------
DatetimeIndex(['2017-02-01', '2017-02-02', '2017-02-03', 'NaT', '2017-02-05',
'2017-02-06'],
dtype='datetime64[ns]', freq=None) <class 'pandas.tseries.index.DatetimeIndex'>
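The errors = 'coerce' behaviour is handy for locating bad rows in raw data. The sketch below assumes the input strings follow a single known format, which is passed explicitly; the names raw, parsed and bad_values are made up for illustration.

```python
import pandas as pd

raw = ['2017-02-01', '2017-02-02', 'hello world!', '2017-02-05']

# errors='coerce' turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(pd.Series(raw), format='%Y-%m-%d', errors='coerce')

# NaT counts as a missing value, so isna() flags the bad rows
bad_mask = parsed.isna()
bad_values = [value for value, bad in zip(raw, bad_mask) if bad]

# dropna() keeps only the rows that parsed successfully
clean = parsed.dropna()
print(bad_values)
print(clean)
```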
II. Pandas timestamp index: DatetimeIndex
(1) Basic concept of DatetimeIndex
pd.DatetimeIndex() builds a timestamp index (DatetimeIndex) directly from date strings or datetime objects.
A Series whose index is a DatetimeIndex is called a TimeSeries, that is, a time series.
# Pandas timestamp index: DatetimeIndex
# pd.DatetimeIndex() creates a DatetimeIndex directly
# Whatever the input, the result of pd.DatetimeIndex() is a DatetimeIndex
rng1 = pd.DatetimeIndex([datetime.datetime(2017,3,4),datetime.datetime(2015,3,5),datetime.datetime(2010,4,16)])
print(rng1,type(rng1))
print(rng1[1],type(rng1[1]))
print('------------')
# pd.DatetimeIndex() accepts both str and datetime.datetime values
rng2 = pd.DatetimeIndex(['2009-10-19','2008-12-22'])
print(rng2,type(rng2))
print('----------')
# A Series indexed by a DatetimeIndex is a TimeSeries (time series)
s = pd.Series(np.random.rand(len(rng1)),index=rng1)
print(s,type(s))
Output:
DatetimeIndex(['2017-03-04', '2015-03-05', '2010-04-16'], dtype='datetime64[ns]', freq=None) <class 'pandas.tseries.index.DatetimeIndex'>
2015-03-05 00:00:00 <class 'pandas.tslib.Timestamp'>
------------
DatetimeIndex(['2009-10-19', '2008-12-22'], dtype='datetime64[ns]', freq=None) <class 'pandas.tseries.index.DatetimeIndex'>
----------
2017-03-04 0.236450
2015-03-05 0.708165
2010-04-16 0.225343
dtype: float64 <class 'pandas.core.series.Series'>
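Note that the index above keeps the dates in the order they were supplied, which is not chronological. A small sketch of checking and fixing that, using made-up values rather than the random ones above:

```python
import datetime
import pandas as pd

idx = pd.DatetimeIndex([datetime.datetime(2017, 3, 4),
                        datetime.datetime(2015, 3, 5),
                        datetime.datetime(2010, 4, 16)])
s = pd.Series([1.0, 2.0, 3.0], index=idx)

# The index preserves insertion order, so it is not sorted here
was_sorted = s.index.is_monotonic_increasing

# sort_index() returns a new Series ordered by timestamp
s_sorted = s.sort_index()
years = s_sorted.index.year.tolist()
print(was_sorted)
print(s_sorted)
print(years)
```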
(2) Generating a date range: pd.date_range()
[Key point] Use pd.date_range() to generate a date range
# pd.date_range(): generates a range of dates
# Two ways to specify it: (1) start + end; (2) start or end + periods
# pd.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None)
# Default frequency: calendar day
# start: start time
# end: end time
# periods: number of periods (timestamps) to generate
# freq: frequency; pd.date_range() defaults to calendar days, pd.bdate_range() defaults to business days
# tz: time zone
# normalize=True: resets the time of day to midnight (00:00:00)
# Method 1: start + end; returns a DatetimeIndex
rng1 = pd.date_range(start = '2015-12-1',end = '2015-12-10')
print(rng1,type(rng1))
print('-----------------')
# Method 2: start or end + periods; returns a DatetimeIndex
rng2 = pd.date_range(start = '2018-12-23',periods = 10)
print(rng2,type(rng2))
rng3 = pd.date_range(end = '2019-10-1',periods= 12)
print(rng3,type(rng3))
print('-----------------')
# Parameter defaults: name=None and normalize=False
# normalize=True: normalizes the timestamps to midnight (they become 00:00:00, which is not displayed, instead of 15:30:00)
# name: name of the resulting index object
rng4 = pd.date_range(start = '1/1/2017 15:30', periods = 10, name = 'hello world!', normalize = True)
print(rng4)
rng5 = pd.date_range(start = '1/1/2017 15:30', periods = 10, name = 'hello world!')
print(rng5)
print('-------')
# Parameter closed: None (the default) includes both endpoints; 'left' includes only the start; 'right' includes only the end
# (note: pandas 1.4 introduced inclusive= as the replacement for closed=, and closed= was removed in pandas 2.0)
print(pd.date_range('20170101','20170104')) # compact strings such as 20170101 are parsed too
print(pd.date_range('20170101','20170104',closed = 'right'))
print(pd.date_range('20170101','20170104',closed = 'left'))
print('-------')
# pd.date_range() defaults to calendar days
# pd.bdate_range() defaults to business days
print(pd.bdate_range('20170101','20170107'))
print('-------------')
# The result skips 2017-01-01 (a Sunday) and 2017-01-07 (a Saturday) because they are not business days
# list() converts the result of pd.date_range() into a list whose elements are Timestamps
lst=list(pd.date_range(start = '1/1/2017', periods = 10))
print(lst)
print(lst[0],type(lst[0]))
Output:
DatetimeIndex(['2015-12-01', '2015-12-02', '2015-12-03', '2015-12-04',
'2015-12-05', '2015-12-06', '2015-12-07', '2015-12-08',
'2015-12-09', '2015-12-10'],
dtype='datetime64[ns]', freq='D') <class 'pandas.tseries.index.DatetimeIndex'>
-----------------
DatetimeIndex(['2018-12-23', '2018-12-24', '2018-12-25', '2018-12-26',
'2018-12-27', '2018-12-28', '2018-12-29', '2018-12-30',
'2018-12-31', '2019-01-01'],
dtype='datetime64[ns]', freq='D') <class 'pandas.tseries.index.DatetimeIndex'>
DatetimeIndex(['2019-09-20', '2019-09-21', '2019-09-22', '2019-09-23',
'2019-09-24', '2019-09-25', '2019-09-26', '2019-09-27',
'2019-09-28', '2019-09-29', '2019-09-30', '2019-10-01'],
dtype='datetime64[ns]', freq='D') <class 'pandas.tseries.index.DatetimeIndex'>
-----------------
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
'2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
'2017-01-09', '2017-01-10'],
dtype='datetime64[ns]', name='hello world!', freq='D')
DatetimeIndex(['2017-01-01 15:30:00', '2017-01-02 15:30:00',
'2017-01-03 15:30:00', '2017-01-04 15:30:00',
'2017-01-05 15:30:00', '2017-01-06 15:30:00',
'2017-01-07 15:30:00', '2017-01-08 15:30:00',
'2017-01-09 15:30:00', '2017-01-10 15:30:00'],
dtype='datetime64[ns]', name='hello world!', freq='D')
-------
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq='D')
-------
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
'2017-01-06'],
dtype='datetime64[ns]', freq='B')
-------------
[Timestamp('2017-01-01 00:00:00', offset='D'), Timestamp('2017-01-02 00:00:00', offset='D'), Timestamp('2017-01-03 00:00:00', offset='D'), Timestamp('2017-01-04 00:00:00', offset='D'), Timestamp('2017-01-05 00:00:00', offset='D'), Timestamp('2017-01-06 00:00:00', offset='D'), Timestamp('2017-01-07 00:00:00', offset='D'), Timestamp('2017-01-08 00:00:00', offset='D'), Timestamp('2017-01-09 00:00:00', offset='D'), Timestamp('2017-01-10 00:00:00', offset='D')]
2017-01-01 00:00:00 <class 'pandas.tslib.Timestamp'>
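The different ways of specifying a range can be checked against each other. This is only a sketch with illustrative variable names; it also confirms that a business-day range contains weekdays only.

```python
import pandas as pd

# Three ways to describe the same ten calendar days
by_start_end = pd.date_range(start='2015-12-01', end='2015-12-10')
by_start_periods = pd.date_range(start='2015-12-01', periods=10)
by_end_periods = pd.date_range(end='2015-12-10', periods=10)

# Element-wise comparison of the generated timestamps
all_equal = bool((by_start_end == by_start_periods).all()
                 and (by_start_end == by_end_periods).all())

# Business-day range: every entry has dayofweek 0-4 (Monday to Friday)
bdays = pd.bdate_range('2017-01-01', '2017-01-07')
weekdays_only = bool((bdays.dayofweek < 5).all())
print(all_equal, len(bdays), weekdays_only)
```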
--->>> pd.date_range() parameters: frequency settings (1)
print(pd.date_range('2017/1/1','2017/1/4')) # default freq = 'D': every calendar day
print(pd.date_range('2017/1/1','2017/1/4', freq = 'B')) # B: every business day
print(pd.date_range('2017/1/1','2017/1/2', freq = 'H')) # H: every hour
print(pd.date_range('2017/1/1 12:00','2017/1/1 12:10', freq = 'T')) # T/MIN: every minute
print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10', freq = 'S')) # S: every second
print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10', freq = 'L')) # L: every millisecond (one thousandth of a second)
print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10', freq = 'U')) # U: every microsecond (one millionth of a second)
# Note: pandas 2.2 deprecates the aliases H, T, S, L and U in favor of h, min, s, ms and us
print(pd.date_range('2017/1/1','2017/2/1', freq = 'W-MON'))
# W-MON: weekly, anchored on the given day of the week (here every Monday)
# Day-of-week abbreviations: MON/TUE/WED/THU/FRI/SAT/SUN
print(pd.date_range('2017/1/1','2017/5/1', freq = 'WOM-2MON'))
# WOM-2MON: the n-th given weekday of each month; here the second Monday of every month
Output:
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='B')
DatetimeIndex(['2017-01-01 00:00:00', '2017-01-01 01:00:00',
'2017-01-01 02:00:00', '2017-01-01 03:00:00',
'2017-01-01 04:00:00', '2017-01-01 05:00:00',
'2017-01-01 06:00:00', '2017-01-01 07:00:00',
'2017-01-01 08:00:00', '2017-01-01 09:00:00',
'2017-01-01 10:00:00', '2017-01-01 11:00:00',
'2017-01-01 12:00:00', '2017-01-01 13:00:00',
'2017-01-01 14:00:00', '2017-01-01 15:00:00',
'2017-01-01 16:00:00', '2017-01-01 17:00:00',
'2017-01-01 18:00:00', '2017-01-01 19:00:00',
'2017-01-01 20:00:00', '2017-01-01 21:00:00',
'2017-01-01 22:00:00', '2017-01-01 23:00:00',
'2017-01-02 00:00:00'],
dtype='datetime64[ns]', freq='H')
DatetimeIndex(['2017-01-01 12:00:00', '2017-01-01 12:01:00',
'2017-01-01 12:02:00', '2017-01-01 12:03:00',
'2017-01-01 12:04:00', '2017-01-01 12:05:00',
'2017-01-01 12:06:00', '2017-01-01 12:07:00',
'2017-01-01 12:08:00', '2017-01-01 12:09:00',
'2017-01-01 12:10:00'],
dtype='datetime64[ns]', freq='T')
DatetimeIndex(['2017-01-01 12:00:00', '2017-01-01 12:00:01',
'2017-01-01 12:00:02', '2017-01-01 12:00:03',
'2017-01-01 12:00:04', '2017-01-01 12:00:05',
'2017-01-01 12:00:06', '2017-01-01 12:00:07',
'2017-01-01 12:00:08', '2017-01-01 12:00:09',
'2017-01-01 12:00:10'],
dtype='datetime64[ns]', freq='S')
DatetimeIndex([ '2017-01-01 12:00:00', '2017-01-01 12:00:00.001000',
'2017-01-01 12:00:00.002000', '2017-01-01 12:00:00.003000',
'2017-01-01 12:00:00.004000', '2017-01-01 12:00:00.005000',
'2017-01-01 12:00:00.006000', '2017-01-01 12:00:00.007000',
'2017-01-01 12:00:00.008000', '2017-01-01 12:00:00.009000',
...
'2017-01-01 12:00:09.991000', '2017-01-01 12:00:09.992000',
'2017-01-01 12:00:09.993000', '2017-01-01 12:00:09.994000',
'2017-01-01 12:00:09.995000', '2017-01-01 12:00:09.996000',
'2017-01-01 12:00:09.997000', '2017-01-01 12:00:09.998000',
'2017-01-01 12:00:09.999000', '2017-01-01 12:00:10'],
dtype='datetime64[ns]', length=10001, freq='L')
DatetimeIndex([ '2017-01-01 12:00:00', '2017-01-01 12:00:00.000001',
'2017-01-01 12:00:00.000002', '2017-01-01 12:00:00.000003',
'2017-01-01 12:00:00.000004', '2017-01-01 12:00:00.000005',
'2017-01-01 12:00:00.000006', '2017-01-01 12:00:00.000007',
'2017-01-01 12:00:00.000008', '2017-01-01 12:00:00.000009',
...
'2017-01-01 12:00:09.999991', '2017-01-01 12:00:09.999992',
'2017-01-01 12:00:09.999993', '2017-01-01 12:00:09.999994',
'2017-01-01 12:00:09.999995', '2017-01-01 12:00:09.999996',
'2017-01-01 12:00:09.999997', '2017-01-01 12:00:09.999998',
'2017-01-01 12:00:09.999999', '2017-01-01 12:00:10'],
dtype='datetime64[ns]', length=10000001, freq='U')
DatetimeIndex(['2017-01-02', '2017-01-09', '2017-01-16', '2017-01-23',
'2017-01-30'],
dtype='datetime64[ns]', freq='W-MON')
DatetimeIndex(['2017-01-09', '2017-02-13', '2017-03-13', '2017-04-10'], dtype='datetime64[ns]', freq='WOM-2MON')
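The anchored weekly aliases are easy to misread, so here is a small sanity check. It is a sketch with illustrative names, showing that both W-MON and WOM-2MON produce Mondays and that a second Monday always falls on day 8 through 14 of its month.

```python
import pandas as pd

weekly = pd.date_range('2017-01-01', '2017-02-01', freq='W-MON')
second_mondays = pd.date_range('2017-01-01', '2017-05-01', freq='WOM-2MON')

# dayofweek == 0 means Monday
weekly_all_monday = bool((weekly.dayofweek == 0).all())
wom_all_monday = bool((second_mondays.dayofweek == 0).all())

# Day of the month for each second Monday
days = second_mondays.day.tolist()
print(weekly_all_monday, wom_all_monday, days)
```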
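--->>> pd.date_range() parameters: frequency settings (2)
# pd.date_range(): frequencies (2)
# Note: pandas 2.2 deprecates the offset aliases M, Q and A in favor of ME, QE and YE
print(pd.date_range('2017','2018', freq = 'M'))
print(pd.date_range('2017','2020', freq = 'Q-DEC'))
print(pd.date_range('2017','2020', freq = 'A-DEC'))
print('------')
# M: last calendar day of each month
# Q-<month>: quarters ending in the given month; last calendar day of the last month of each quarter
# A-<month>: last calendar day of the given month, once per year
# Month abbreviations: JAN/FEB/MAR/APR/MAY/JUN/JUL/AUG/SEP/OCT/NOV/DEC
# So Q-<month> has only three distinct cases: 1-4-7-10, 2-5-8-11, 3-6-9-12
print(pd.date_range('2017','2018', freq = 'BM'))
print(pd.date_range('2017','2020', freq = 'BQ-DEC'))
print(pd.date_range('2017','2020', freq = 'BA-DEC'))
print('------')
# BM: last business day of each month
# BQ-<month>: quarters ending in the given month; last business day of the last month of each quarter
# BA-<month>: last business day of the given month, once per year
print(pd.date_range('2017','2018', freq = 'MS'))
print(pd.date_range('2017','2020', freq = 'QS-DEC'))
print(pd.date_range('2017','2020', freq = 'AS-DEC'))
print('------')
# MS: first calendar day of each month
# QS-<month>: quarters ending in the given month; first calendar day of the last month of each quarter
# AS-<month>: first calendar day of the given month, once per year
print(pd.date_range('2017','2018', freq = 'BMS'))
print(pd.date_range('2017','2020', freq = 'BQS-DEC'))
print(pd.date_range('2017','2020', freq = 'BAS-DEC'))
print('------')
# BMS: first business day of each month
# BQS-<month>: quarters ending in the given month; first business day of the last month of each quarter
# BAS-<month>: first business day of the given month, once per year
Output: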
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
'2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
'2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
dtype='datetime64[ns]', freq='M')
DatetimeIndex(['2017-03-31', '2017-06-30', '2017-09-30', '2017-12-31',
'2018-03-31', '2018-06-30', '2018-09-30', '2018-12-31',
'2019-03-31', '2019-06-30', '2019-09-30', '2019-12-31'],
dtype='datetime64[ns]', freq='Q-DEC')
DatetimeIndex(['2017-12-31', '2018-12-31', '2019-12-31'], dtype='datetime64[ns]', freq='A-DEC')
------
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-28',
'2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
'2017-09-29', '2017-10-31', '2017-11-30', '2017-12-29'],
dtype='datetime64[ns]', freq='BM')
DatetimeIndex(['2017-03-31', '2017-06-30', '2017-09-29', '2017-12-29',
'2018-03-30', '2018-06-29', '2018-09-28', '2018-12-31',
'2019-03-29', '2019-06-28', '2019-09-30', '2019-12-31'],
dtype='datetime64[ns]', freq='BQ-DEC')
DatetimeIndex(['2017-12-29', '2018-12-31', '2019-12-31'], dtype='datetime64[ns]', freq='BA-DEC')
------
DatetimeIndex(['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
'2017-05-01', '2017-06-01', '2017-07-01', '2017-08-01',
'2017-09-01', '2017-10-01', '2017-11-01', '2017-12-01',
'2018-01-01'],
dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2017-03-01', '2017-06-01', '2017-09-01', '2017-12-01',
'2018-03-01', '2018-06-01', '2018-09-01', '2018-12-01',
'2019-03-01', '2019-06-01', '2019-09-01', '2019-12-01'],
dtype='datetime64[ns]', freq='QS-DEC')
DatetimeIndex(['2017-12-01', '2018-12-01', '2019-12-01'], dtype='datetime64[ns]', freq='AS-DEC')
------
DatetimeIndex(['2017-01-02', '2017-02-01', '2017-03-01', '2017-04-03',
'2017-05-01', '2017-06-01', '2017-07-03', '2017-08-01',
'2017-09-01', '2017-10-02', '2017-11-01', '2017-12-01',
'2018-01-01'],
dtype='datetime64[ns]', freq='BMS')
DatetimeIndex(['2017-03-01', '2017-06-01', '2017-09-01', '2017-12-01',
'2018-03-01', '2018-06-01', '2018-09-03', '2018-12-03',
'2019-03-01', '2019-06-03', '2019-09-02', '2019-12-02'],
dtype='datetime64[ns]', freq='BQS-DEC')
DatetimeIndex(['2017-12-01', '2018-12-03', '2019-12-02'], dtype='datetime64[ns]', freq='BAS-DEC')
------
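Since string aliases for month-based frequencies have changed between pandas versions, one option is to pass offset objects from pd.offsets as freq instead. The sketch below is an alternative spelling, not the approach used in the original snippets; the variable names are illustrative.

```python
import pandas as pd

# Offset objects can be passed as freq in place of string aliases
month_ends = pd.date_range('2017-01-01', '2017-12-31', freq=pd.offsets.MonthEnd())
month_starts = pd.date_range('2017-01-01', '2017-12-31', freq=pd.offsets.MonthBegin())
business_month_ends = pd.date_range('2017-01-01', '2017-12-31', freq=pd.offsets.BMonthEnd())

# 2017-04-30 is a Sunday, so the last business day of April is 2017-04-28
print(month_ends[3], business_month_ends[3], month_starts[3])
```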
--->>> pd.date_range() parameters: compound frequencies
# pd.date_range(): compound frequencies
print(pd.date_range('2017/1/1','2017/2/1', freq = '7D')) # every 7 days
print(pd.date_range('2017/1/1','2017/1/2', freq = '2h30min')) # every 2 hours 30 minutes
print(pd.date_range('2017','2018', freq = '2M')) # every 2 months, on the last calendar day of the month
Output:
DatetimeIndex(['2017-01-01', '2017-01-08', '2017-01-15', '2017-01-22',
'2017-01-29'],
dtype='datetime64[ns]', freq='7D')
DatetimeIndex(['2017-01-01 00:00:00', '2017-01-01 02:30:00',
'2017-01-01 05:00:00', '2017-01-01 07:30:00',
'2017-01-01 10:00:00', '2017-01-01 12:30:00',
'2017-01-01 15:00:00', '2017-01-01 17:30:00',
'2017-01-01 20:00:00', '2017-01-01 22:30:00'],
dtype='datetime64[ns]', freq='150T')
DatetimeIndex(['2017-01-31', '2017-03-31', '2017-05-31', '2017-07-31',
'2017-09-30', '2017-11-30'],
dtype='datetime64[ns]', freq='2M')
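As the freq='150T' in the output suggests, a compound frequency such as 2h30min is just a fixed step of 150 minutes. A brief sketch, assuming a Timedelta is an acceptable freq value (the pandas documentation lists it among the accepted types):

```python
import pandas as pd

by_alias = pd.date_range('2017-01-01', '2017-01-02', freq='150min')
by_timedelta = pd.date_range('2017-01-01', '2017-01-02',
                             freq=pd.Timedelta(hours=2, minutes=30))

# Both spellings generate the same timestamps
same_index = list(by_alias) == list(by_timedelta)

# The spacing between consecutive entries is the step itself
step = by_alias[1] - by_alias[0]
print(same_index, step, len(by_alias))
```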
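--->>> pd.date_range() date ranges: .shift(positive/negative) for leading/lagging data
# pd.date_range() date ranges: leading/lagging data
# By default .shift(n) moves the values, not the index
# Positive n: values move later (lag); negative n: values move earlier (lead)
s = pd.Series(np.random.rand(5),index=pd.date_range('2018-1-1','2018-1-5'))
print(s)
print('--------------')
print(s.shift(1))
print('--------------')
# The values of s are moved one position later (lag)
print(s.shift(-1))
print('--------------')
# The values of s are moved one position earlier (lead)
# With the freq parameter, the timestamps are shifted instead of the values
print(s.shift(2, freq = 'D'))
print(s.shift(2, freq = 'T'))
Output: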
2018-01-01 0.428402
2018-01-02 0.021449
2018-01-03 0.844372
2018-01-04 0.803666
2018-01-05 0.564074
Freq: D, dtype: float64
--------------
2018-01-01 NaN
2018-01-02 0.428402
2018-01-03 0.021449
2018-01-04 0.844372
2018-01-05 0.803666
Freq: D, dtype: float64
--------------
2018-01-01 0.021449
2018-01-02 0.844372
2018-01-03 0.803666
2018-01-04 0.564074
2018-01-05 NaN
Freq: D, dtype: float64
--------------
2018-01-03 0.428402
2018-01-04 0.021449
2018-01-05 0.844372
2018-01-06 0.803666
2018-01-07 0.564074
Freq: D, dtype: float64
2018-01-01 00:02:00 0.428402
2018-01-02 00:02:00 0.021449
2018-01-03 00:02:00 0.844372
2018-01-04 00:02:00 0.803666
2018-01-05 00:02:00 0.564074
Freq: D, dtype: float64
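The two kinds of shift are easier to compare on fixed values. The sketch below uses made-up numbers and also shows a typical use of the value shift, a day-over-day difference.

```python
import pandas as pd

s = pd.Series([10.0, 11.0, 12.1], index=pd.date_range('2018-01-01', periods=3))

# Value shift: the index stays put, values move down one row, the first row becomes NaN
lagged = s.shift(1)

# Index shift: values stay put, every timestamp moves forward two days
moved = s.shift(2, freq='D')

# Day-over-day change built from the value shift
change = s - s.shift(1)
print(lagged, moved, change, sep='\n')
```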
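III. Pandas periods: Period
(1) Basic concept of Period
A Period represents a standard span of time, such as a particular year, month, day or hour. The length of the span is determined by freq.
# pd.Period() creates a period
p = pd.Period('2017', freq = 'M')
print(p, type(p))
# Creates the monthly period 2017-01 (the string '2017' resolves to the first month of that year)
# pd.Period() arguments: a timestamp-like value + freq → freq sets the length of the period, and the timestamp sets its position on the time axis
print(p + 1)
print(p - 2)
print(pd.Period('2012', freq = 'A-DEC') - 1)
# Adding or subtracting an integer shifts the whole period
# Here the shifts are in units of months and years respectively
Output: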
2017-01 <class 'pandas._period.Period'>
2017-02
2016-11
2011
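To make the "span of time" idea concrete, a Period exposes its boundaries through start_time and end_time. A minimal sketch with illustrative names:

```python
import pandas as pd

p = pd.Period('2017-01', freq='M')

# A Period covers a span bounded by start_time and end_time
start, end = p.start_time, p.end_time

# Any timestamp inside the span belongs to the period
inside = start <= pd.Timestamp('2017-01-15 08:00') <= end

# Integer arithmetic moves the period by whole units of its frequency
next_period = p + 1
print(start, end, inside, next_period)
```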
(2) Generating a period range: pd.period_range()
Create a period range: pd.period_range(start=None, end=None, periods=None, freq='D', name=None)
# pd.period_range() generates a range of periods, much as pd.date_range() generates a range of dates
# pd.period_range() returns a PeriodIndex; each of its elements is a Period
# Exactly two of start, end and periods must be given; freq defaults to 'D' in the signature above
prng1 = pd.period_range('1/1/2011', '1/1/2012', freq='M')
print(prng1,type(prng1)) # type of the whole range
print(prng1[0],type(prng1[0])) # type of a single element
Output:
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
'2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
'2012-01'],
dtype='period[M]', freq='M') <class 'pandas.core.indexes.period.PeriodIndex'>
2011-01 <class 'pandas._libs.tslibs.period.Period'>
--->>> Difference between a period range (pd.period_range()) and a date range (pd.date_range()):
# Create a date range with pd.date_range()
rng = pd.date_range('2019-12-1','2019-12-8',freq='2D')
print(rng,'\n','type of the whole range:',type(rng)) # type of the date range
print(rng[0],'\n','type of a single element:',type(rng[0])) # type of one element of the date range
print('-------------')
# Create a period range with pd.period_range()
prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')
print(prng,'\n','type of the whole range:',type(prng)) # type of the period range
print(prng[0],'\n','type of a single element:',type(prng[0])) # type of one element of the period range
# Summary of the differences:
# 1. pd.date_range() returns a DatetimeIndex, and each element is a Timestamp
# 2. pd.period_range() returns a PeriodIndex, and each element is a Period
# 3. A Period is a span of time rather than a single instant, although the two behave much alike when used as an index
Output:
DatetimeIndex(['2019-12-01', '2019-12-03', '2019-12-05', '2019-12-07'], dtype='datetime64[ns]', freq='2D')
type of the whole range: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
2019-12-01 00:00:00
type of a single element: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
-------------
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
'2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
'2012-01'],
dtype='period[M]', freq='M')
type of the whole range: <class 'pandas.core.indexes.period.PeriodIndex'>
2011-01
type of a single element: <class 'pandas._libs.tslibs.period.Period'>
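(3) Frequency conversion: asfreq
# asfreq: frequency conversion
# A Period is converted to another frequency with .asfreq(freq, how=...)
# (the method parameter, with values None / ffill / bfill, belongs to Series.asfreq and DataFrame.asfreq, where it controls how gaps are filled)
# how: whether to take the start or the end of the original period
p = pd.Period('2017','A-DEC')
print(p)
print(p.asfreq('M', how = 'start')) # how = 's' also works: the first month of 2017
print(p.asfreq('D', how = 'end')) # how = 'e' also works: the last day of 2017
Output: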
2017
2017-01
2017-12-31
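The how argument can be illustrated on a single month as well. This sketch, with illustrative names, converts a monthly period to its first and last day and then back to the enclosing month.

```python
import pandas as pd

march = pd.Period('2017-03', freq='M')

# Finer frequency: pick the first or the last day of the month
first_day = march.asfreq('D', how='start')
last_day = march.asfreq('D', how='end')

# Coarser frequency: a day maps back to the month that contains it
round_trip = last_day.asfreq('M')
print(first_day, last_day, round_trip)
```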
(4) Converting between timestamps and periods: .to_period() and .to_timestamp()
rng = pd.date_range('2018/1/1', periods = 10, freq = 'M') # create a date range with month-end frequency
prng = pd.period_range('2017','2018', freq = 'M') # create a period range with monthly frequency
print(rng)
print(prng)
print('--------------------')
t1 = rng.to_period()
print(t1,type(t1)) # date range → period range: each month-end date becomes its month
print('--------------------')
t2 = prng.to_timestamp()
print(t2,type(t2)) # period range → date range: each month becomes its first day
Output:
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
'2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31',
'2018-09-30', '2018-10-31'],
dtype='datetime64[ns]', freq='M')
PeriodIndex(['2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06',
'2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12',
'2018-01'],
dtype='period[M]', freq='M')
--------------------
PeriodIndex(['2018-01', '2018-02', '2018-03', '2018-04', '2018-05', '2018-06',
'2018-07', '2018-08', '2018-09', '2018-10'],
dtype='period[M]', freq='M') <class 'pandas.core.indexes.period.PeriodIndex'>
--------------------
DatetimeIndex(['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
'2017-05-01', '2017-06-01', '2017-07-01', '2017-08-01',
'2017-09-01', '2017-10-01', '2017-11-01', '2017-12-01',
'2018-01-01'],
dtype='datetime64[ns]', freq='MS') <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
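Converting to periods and back does not have to land on the first day of the month; to_timestamp accepts how='end' as well. The sketch below passes the target frequency 'M' to to_period explicitly and normalizes the end timestamps to midnight so they can be compared with the original dates. The names are illustrative.

```python
import pandas as pd

month_ends = pd.date_range('2018-01-31', periods=3, freq=pd.offsets.MonthEnd())

# Dates → monthly periods
periods = month_ends.to_period('M')

# Periods → dates at the start or at the end of each month
starts = periods.to_timestamp(how='start')
ends = periods.to_timestamp(how='end').normalize()
print([str(p) for p in periods])
print(starts)
print(ends)
```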
IV. Time series (TimeSeries): indexing and slicing
(1) Basic concept of a time series
A Series whose index is a DatetimeIndex is a TimeSeries, that is, a time series. As the outputs below show, its type is simply pandas.core.series.Series, so the usual Series indexing and selection methods apply; in addition, the time-based index offers more convenient ways to index and slice.
# Create a time series
rng = pd.date_range('2018-01-01',periods=5) # create a date range
print(rng,type(rng))
ts = pd.Series(np.random.rand(5),index=rng) # build the time series with pd.Series(), using rng (a DatetimeIndex) as the index
print(ts,type(ts))
print('----------------------')
prng = pd.period_range('2017','2018',freq='M') # create a period range
print(prng,type(prng))
ts =pd.Series(np.random.rand(13),index=prng) # build the time series with pd.Series(), using prng (a PeriodIndex) as the index
print(ts,type(ts))
print('----------------------')
# Frequency conversion: asfreq can also convert the index of a TimeSeries
prng2 = pd.period_range('2017','2018',freq = 'M') # period range
print(prng2,type(prng2))
ts2 = pd.Series(np.random.rand(len(prng2)), index = prng2.asfreq('D', how = 'start')) # use the period range as the index, converted to daily periods (first day of each month)
print(ts2.head())
print(ts2[0:2],type(ts2[0:2]))
Output:
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
'2018-01-05'],
dtype='datetime64[ns]', freq='D') <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
2018-01-01 0.788017
2018-01-02 0.215820
2018-01-03 0.655670
2018-01-04 0.757808
2018-01-05 0.870577
Freq: D, dtype: float64 <class 'pandas.core.series.Series'>
----------------------
PeriodIndex(['2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06',
'2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12',
'2018-01'],
dtype='period[M]', freq='M') <class 'pandas.core.indexes.period.PeriodIndex'>
2017-01 0.382242
2017-02 0.275827
2017-03 0.554054
2017-04 0.657547
2017-05 0.111228
2017-06 0.936175
2017-07 0.956504
2017-08 0.144461
2017-09 0.269050
2017-10 0.187785
2017-11 0.504171
2017-12 0.821249
2018-01 0.646382
Freq: M, dtype: float64 <class 'pandas.core.series.Series'>
----------------------
PeriodIndex(['2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06',
'2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12',
'2018-01'],
dtype='period[M]', freq='M') <class 'pandas.core.indexes.period.PeriodIndex'>
2017-01-01 0.468054
2017-02-01 0.047046
2017-03-01 0.693999
2017-04-01 0.446583
2017-05-01 0.457438
Freq: D, dtype: float64
2017-01-01 0.468054
2017-02-01 0.047046
Freq: D, dtype: float64 <class 'pandas.core.series.Series'>
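(2) Indexing a time series
# Indexing a time series
# Create a time series
ts = pd.Series(np.random.rand(10),index=pd.date_range('2018-02-11','2018-02-20'))
print(ts,type(ts))
# Positional indexing (.iloc is used because recent pandas versions deprecate ts[0] as a positional lookup on a date-indexed Series)
print(ts.iloc[0],type(ts.iloc[0])) # first row (position 0)
print(ts.iloc[-1],type(ts.iloc[-1])) # last row
print('---------------------------')
# Label indexing: accepts time strings in various formats as well as datetime.datetime
print(ts['2018-2-12'],type(ts['2018-2-12']))
print(ts['2018/02/20'])
print(ts[datetime.datetime(2018,2,15)])
# The index here is already in chronological order, so label lookups need no sorting first
# The same label-based indexing also applies to a DataFrame
Output: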
2018-02-11 0.025897
2018-02-12 0.140325
2018-02-13 0.850016
2018-02-14 0.361940
2018-02-15 0.551185
2018-02-16 0.169641
2018-02-17 0.730667
2018-02-18 0.349231
2018-02-19 0.279436
2018-02-20 0.201236
Freq: D, dtype: float64 <class 'pandas.core.series.Series'>
0.025896682519807368 <class 'numpy.float64'>
0.20123590019367488 <class 'numpy.float64'>
---------------------------
0.14032497534971577 <class 'numpy.float64'>
0.20123590019367488
0.5511852741728931
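For code that should read unambiguously, positional and label lookups can be spelled with .iloc and .loc. A short sketch using fixed values instead of random ones; the names are illustrative.

```python
import datetime
import pandas as pd

ts = pd.Series(range(10), index=pd.date_range('2018-02-11', '2018-02-20'))

# Position-based access
first = ts.iloc[0]
last = ts.iloc[-1]

# Label-based access with differently formatted date strings and a datetime
by_string = ts.loc['2018-02-12']
by_other_format = ts.loc['2018/02/20']
by_datetime = ts.loc[datetime.datetime(2018, 2, 15)]
print(first, last, by_string, by_other_format, by_datetime)
```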
(3) Slicing a time series
# Slicing a time series
# Create a time series
rng = pd.date_range('2017-1','2017-3',freq='12H')
ts = pd.Series(np.random.rand(len(rng)),index=rng)
print(ts.head(),type(ts))
print('---------------------------')
# Positional slicing works like a list: the end position is excluded
print(ts[3:6]) # rows at positions 3 to 5
print(ts[1:10:3]) # rows at positions 1 to 9, taking every third row
print('---------------------------')
# Label slicing follows the same rule as label slicing on any Series: the end label is included
print(ts['2017/1/5':'2017/1/10'])
print('---------------------------')
# Passing just a month returns the slice for that whole month
print(ts['2017-1'].head())
Output:
2017-01-01 00:00:00 0.796282
2017-01-01 12:00:00 0.365246
2017-01-02 00:00:00 0.846448
2017-01-02 12:00:00 0.042210
2017-01-03 00:00:00 0.726811
Freq: 12H, dtype: float64 <class 'pandas.core.series.Series'>
---------------------------
2017-01-02 12:00:00 0.042210
2017-01-03 00:00:00 0.726811
2017-01-03 12:00:00 0.952443
Freq: 12H, dtype: float64
2017-01-01 12:00:00 0.365246
2017-01-03 00:00:00 0.726811
2017-01-04 12:00:00 0.668528
Freq: 36H, dtype: float64
---------------------------
2017-01-05 00:00:00 0.658508
2017-01-05 12:00:00 0.690507
2017-01-06 00:00:00 0.160211
2017-01-06 12:00:00 0.676604
2017-01-07 00:00:00 0.851656
2017-01-07 12:00:00 0.327676
2017-01-08 00:00:00 0.469967
2017-01-08 12:00:00 0.192259
2017-01-09 00:00:00 0.771091
2017-01-09 12:00:00 0.840505
2017-01-10 00:00:00 0.373256
2017-01-10 12:00:00 0.176877
Freq: 12H, dtype: float64
---------------------------
2017-01-01 00:00:00 0.796282
2017-01-01 12:00:00 0.365246
2017-01-02 00:00:00 0.846448
2017-01-02 12:00:00 0.042210
2017-01-03 00:00:00 0.726811
Freq: 12H, dtype: float64
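The difference between end-exclusive positional slices and end-inclusive label slices shows up clearly in the row counts. The sketch below rebuilds a 12-hourly series with fixed values, using a Timedelta as the frequency to avoid version-specific aliases.

```python
import pandas as pd

rng = pd.date_range('2017-01-01', '2017-03-01', freq=pd.Timedelta(hours=12))
ts = pd.Series(range(len(rng)), index=rng)

# Positional slice: end position excluded, so 3 rows
by_position = ts.iloc[3:6]

# Label slice: the whole of 2017-01-10 is included, so 6 days x 2 rows
by_label = ts.loc['2017-01-05':'2017-01-10']

# Partial string: every row that falls in January 2017
january = ts.loc['2017-01']
print(len(by_position), len(by_label), len(january))
```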
(4) Time series with duplicate index entries
# A time series with duplicate index entries
dates = pd.DatetimeIndex(['1/1/2015','1/2/2015','1/3/2015','1/4/2015','1/1/2015','1/2/2015'])
ts = pd.Series(np.random.rand(6), index = dates)
print(ts) # the index of ts contains duplicates: 2015-01-01 and 2015-01-02
# Indexing with a duplicated label returns several values, so the result is a Series
print(ts['20150101'],type(ts['20150101']))
print(ts['20150102'],type(ts['20150102']))
print('-------------------------------------')
# Detecting duplicates
# .is_unique checks for uniqueness
print(ts.is_unique) # are the values unique?
print(ts.index.is_unique) # are the index labels unique?
# Handling duplicates
# Group by the index with groupby; here duplicated labels are replaced by their mean
print(ts.groupby(level=0).mean())
# For example, the two values at index 2015-01-01 are averaged into one
Output:
2015-01-01 0.966322
2015-01-02 0.020092
2015-01-03 0.178119
2015-01-04 0.537812
2015-01-01 0.717608
2015-01-02 0.503463
dtype: float64
2015-01-01 0.966322
2015-01-01 0.717608
dtype: float64 <class 'pandas.core.series.Series'>
2015-01-02 0.020092
2015-01-02 0.503463
dtype: float64 <class 'pandas.core.series.Series'>
-------------------------------------
True
False
2015-01-01 0.841965
2015-01-02 0.261777
2015-01-03 0.178119
2015-01-04 0.537812
dtype: float64
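Averaging is only one way to resolve duplicate timestamps. Another common choice is to keep the first occurrence, sketched here with Index.duplicated and made-up values:

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-01'])
ts = pd.Series([1.0, 2.0, 3.0, 5.0], index=dates)

# True for every repeat of a label that has already appeared
dup_mask = ts.index.duplicated(keep='first')

# Keep only the first occurrence of each timestamp
first_only = ts[~dup_mask]

# For comparison: average the duplicates instead
averaged = ts.groupby(level=0).mean()
print(first_only)
print(averaged)
```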
V. Time series: resampling
Resampling converts a time series from one frequency to another, usually aggregating or filling data along the way. It comes in two forms: downsampling and upsampling.
(1) Downsampling
Downsampling converts high-frequency data to low-frequency data, for example daily data to monthly data (365 days become 12 months), so the number of data points shrinks.
# Resampling: .resample()
# Create a daily TimeSeries, then downsample it from daily frequency to 5-day frequency
rng = pd.date_range('20170101', periods = 12)
ts = pd.Series(np.arange(12), index = rng)
print(ts)
print('---------------------')
# Downsample ts
ts_re = ts.resample('5D')
print(ts_re,type(ts_re))
# ts.resample('5D') changes the frequency to 5 days and returns a resampler object of type DatetimeIndexResampler
ts_re2 = ts.resample('5D').sum()
print(ts_re2,type(ts_re2))
# Using .sum() as the aggregation returns a new, aggregated Series
# Other aggregation methods
print(ts.resample('5D').mean(),'→ mean\n')
print(ts.resample('5D').max(),'→ maximum\n')
print(ts.resample('5D').min(),'→ minimum\n')
print(ts.resample('5D').median(),'→ median\n')
print(ts.resample('5D').first(),'→ first value\n')
print(ts.resample('5D').last(),'→ last value\n')
print(ts.resample('5D').ohlc(),'→ OHLC resampling\n')
# OHLC: an aggregation used for financial time series → open, high (maximum), low (minimum), close
# The closed parameter
# closed: which end of each interval is closed (inclusive); for the '5D' frequency used here the default is 'left'
# Details: the values are 0-11, so 5D resampling by default gives → [0,1,2,3,4],[5,6,7,8,9],[10,11]
# left: the left edge of each interval is inclusive → [0,1,2,3,4],[5,6,7,8,9],[10,11]
# right: the right edge of each interval is inclusive → [0],[1,2,3,4,5],[6,7,8,9,10],[11]
print(ts.resample('5D', closed = 'left').sum(),'→ left\n')
print(ts.resample('5D', closed = 'right').sum(),'→ right\n')
print('------------------------------')
# The label parameter
# label: whether each aggregated value is labelled with the left or the right edge of its interval; for the '5D' frequency used here the default is 'left'
print(ts.resample('5D', label = 'left').sum(),'→ leftlabel\n')
print(ts.resample('5D', label = 'right').sum(),'→ rightlabel\n')
Output:
2017-01-01 0
2017-01-02 1
2017-01-03 2
2017-01-04 3
2017-01-05 4
2017-01-06 5
2017-01-07 6
2017-01-08 7
2017-01-09 8
2017-01-10 9
2017-01-11 10
2017-01-12 11
Freq: D, dtype: int32
---------------------
DatetimeIndexResampler [freq=<5 * Days>, axis=0, closed=left, label=left, convention=start, base=0] <class 'pandas.core.resample.DatetimeIndexResampler'>
2017-01-01 10
2017-01-06 35
2017-01-11 21
dtype: int32 <class 'pandas.core.series.Series'>
2017-01-01 2.0
2017-01-06 7.0
2017-01-11 10.5
dtype: float64 → mean
2017-01-01 4
2017-01-06 9
2017-01-11 11
dtype: int32 → maximum
2017-01-01 0
2017-01-06 5
2017-01-11 10
dtype: int32 → minimum
2017-01-01 2.0
2017-01-06 7.0
2017-01-11 10.5
dtype: float64 → median
2017-01-01 0
2017-01-06 5
2017-01-11 10
dtype: int32 → first value
2017-01-01 4
2017-01-06 9
2017-01-11 11
dtype: int32 → last value
open high low close
2017-01-01 0 4 0 4
2017-01-06 5 9 5 9
2017-01-11 10 11 10 11 → OHLC resampling
2017-01-01 10
2017-01-06 35
2017-01-11 21
dtype: int32 → left
2016-12-27 0
2017-01-01 15
2017-01-06 40
2017-01-11 11
dtype: int32 → right
------------------------------
2017-01-01 10
2017-01-06 35
2017-01-11 21
dtype: int32 → leftlabel
2017-01-06 10
2017-01-11 35
2017-01-16 21
dtype: int32 → rightlabel
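The effect of closed and label can be verified by comparing the sums directly. This sketch reuses the same 0-11 daily series; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(12), index=pd.date_range('2017-01-01', periods=12))

# Default bins for '5D': left edge included, labelled by the left edge
default_sum = ts.resample('5D').sum()

# closed='right' moves each bin boundary value into the earlier bin
right_closed = ts.resample('5D', closed='right').sum()

# label='right' keeps the same bins but names them by their right edge
right_label = ts.resample('5D', label='right').sum()
print(default_sum, right_closed, right_label, sep='\n')
```

Note that closed changes which rows fall into each bin, while label changes only the timestamps attached to the results.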
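(2) Upsampling
Upsampling converts low-frequency data to high-frequency data, for example yearly data to monthly data (1 year becomes 12 months), so the number of data points grows.
# Upsampling and filling
# Upsampling goes from low to high frequency, so the main question is how to fill the new rows
# Create a DataFrame indexed by a date range of 5 hourly periods
rng = pd.date_range('2017/1/1 0:0:0', periods = 5, freq = 'H')
ts = pd.DataFrame(np.arange(15).reshape(5,3),
                  index = rng,
                  columns = ['a','b','c'])
print(ts)
print('--------------------------------')
# Upsample ts from hourly frequency to 15-minute frequency
# Upsampling produces many NaN rows that need to be filled
print(ts.resample('15T').asfreq())
print(ts.resample('15T').ffill())
print(ts.resample('15T').bfill())
# .asfreq(): no filling, new rows are NaN
# .ffill(): fill forward with the previous value
# .bfill(): fill backward with the next value
Output: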
a b c
2017-01-01 00:00:00 0 1 2
2017-01-01 01:00:00 3 4 5
2017-01-01 02:00:00 6 7 8
2017-01-01 03:00:00 9 10 11
2017-01-01 04:00:00 12 13 14
--------------------------------
a b c
2017-01-01 00:00:00 0.0 1.0 2.0
2017-01-01 00:15:00 NaN NaN NaN
2017-01-01 00:30:00 NaN NaN NaN
2017-01-01 00:45:00 NaN NaN NaN
2017-01-01 01:00:00 3.0 4.0 5.0
2017-01-01 01:15:00 NaN NaN NaN
2017-01-01 01:30:00 NaN NaN NaN
2017-01-01 01:45:00 NaN NaN NaN
2017-01-01 02:00:00 6.0 7.0 8.0
2017-01-01 02:15:00 NaN NaN NaN
2017-01-01 02:30:00 NaN NaN NaN
2017-01-01 02:45:00 NaN NaN NaN
2017-01-01 03:00:00 9.0 10.0 11.0
2017-01-01 03:15:00 NaN NaN NaN
2017-01-01 03:30:00 NaN NaN NaN
2017-01-01 03:45:00 NaN NaN NaN
2017-01-01 04:00:00 12.0 13.0 14.0
a b c
2017-01-01 00:00:00 0 1 2
2017-01-01 00:15:00 0 1 2
2017-01-01 00:30:00 0 1 2
2017-01-01 00:45:00 0 1 2
2017-01-01 01:00:00 3 4 5
2017-01-01 01:15:00 3 4 5
2017-01-01 01:30:00 3 4 5
2017-01-01 01:45:00 3 4 5
2017-01-01 02:00:00 6 7 8
2017-01-01 02:15:00 6 7 8
2017-01-01 02:30:00 6 7 8
2017-01-01 02:45:00 6 7 8
2017-01-01 03:00:00 9 10 11
2017-01-01 03:15:00 9 10 11
2017-01-01 03:30:00 9 10 11
2017-01-01 03:45:00 9 10 11
2017-01-01 04:00:00 12 13 14
a b c
2017-01-01 00:00:00 0 1 2
2017-01-01 00:15:00 3 4 5
2017-01-01 00:30:00 3 4 5
2017-01-01 00:45:00 3 4 5
2017-01-01 01:00:00 3 4 5
2017-01-01 01:15:00 6 7 8
2017-01-01 01:30:00 6 7 8
2017-01-01 01:45:00 6 7 8
2017-01-01 02:00:00 6 7 8
2017-01-01 02:15:00 9 10 11
2017-01-01 02:30:00 9 10 11
2017-01-01 02:45:00 9 10 11
2017-01-01 03:00:00 9 10 11
2017-01-01 03:15:00 12 13 14
2017-01-01 03:30:00 12 13 14
2017-01-01 03:45:00 12 13 14
2017-01-01 04:00:00 12 13 14
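Forward and backward fill repeat existing values. When the data vary smoothly, linear interpolation is another option, and a limit can cap how far a fill propagates. Neither is used in the original article; the following is a sketch with made-up values.

```python
import pandas as pd

hourly = pd.Series([0.0, 4.0],
                   index=pd.date_range('2017-01-01', periods=2, freq='60min'))

# Upsample to 15 minutes without filling: three NaN rows appear
with_gaps = hourly.resample('15min').asfreq()

# Linear interpolation spreads the change evenly over the gap
linear = with_gaps.interpolate(method='linear')

# Forward fill, but only one step past each known value
capped = hourly.resample('15min').ffill(limit=1)
print(with_gaps, linear, capped, sep='\n')
```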
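(3) Resampling periods
# Resampling a Period-indexed series
import pandas as pd
import numpy as np
prng = pd.period_range('2016','2017',freq = 'M')
ts = pd.Series(np.arange(len(prng)), index = prng)
print(ts)
# Note: recent pandas versions (2.2 and later) deprecate resampling on a PeriodIndex; ts.to_timestamp() can be applied first
print(ts.resample('3M').sum()) # downsampling
print(ts.resample('15D').ffill()) # upsampling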
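One way to downsample period data without calling resample on a PeriodIndex is to group by a coarser period. This is a swapped-in technique rather than the article's method; the sketch aggregates months into calendar quarters, and the names are illustrative.

```python
import numpy as np
import pandas as pd

prng = pd.period_range('2016-01', '2017-01', freq='M')
ts = pd.Series(np.arange(len(prng)), index=prng)

# Map every month to the quarter that contains it, then aggregate per quarter
quarterly = ts.groupby(ts.index.asfreq('Q')).sum()

# Alternatively, switch to a DatetimeIndex before using time-based tools
monthly_ts = ts.to_timestamp(how='start')
print(quarterly)
print(monthly_ts.head())
```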