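Time Zone Handling
Time zone information comes from the third-party library pytz.
- Get a time zone object: pytz.timezone
- Localize (attach a time zone to naive timestamps): tz_localize
- Convert from one time zone to another: tz_convert
- tz_localize and tz_convert are also instance methods of DatetimeIndex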
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# Import pytz and look at the last five entries in its list of common time zones
import pytz
pytz.common_timezones[-5:]
['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']
tz = pytz.timezone("America/New_York")
tz
<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>
# By default, pandas time series are time zone naive
rng = pd.date_range("3/9/2012 9:20", periods=6, freq="D")
rng
DatetimeIndex(['2012-03-09 09:20:00', '2012-03-10 09:20:00',
'2012-03-11 09:20:00', '2012-03-12 09:20:00',
'2012-03-13 09:20:00', '2012-03-14 09:20:00'],
dtype='datetime64[ns]', freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
2012-03-09 09:20:00 1.108616
2012-03-10 09:20:00 -0.476701
2012-03-11 09:20:00 0.224237
2012-03-12 09:20:00 0.131083
2012-03-13 09:20:00 1.906484
2012-03-14 09:20:00 -0.632415
Freq: D, dtype: float64
# Generate a date range with a time zone attached
pd.date_range('3/9/2012 9:30', periods=10, freq='D', tz='UTC')
DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
'2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
'2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
'2012-03-15 09:30:00+00:00', '2012-03-16 09:30:00+00:00',
'2012-03-17 09:30:00+00:00', '2012-03-18 09:30:00+00:00'],
dtype='datetime64[ns, UTC]', freq='D')
ts_utc = ts.tz_localize("UTC")
ts_utc
2012-03-09 09:20:00+00:00 1.108616
2012-03-10 09:20:00+00:00 -0.476701
2012-03-11 09:20:00+00:00 0.224237
2012-03-12 09:20:00+00:00 0.131083
2012-03-13 09:20:00+00:00 1.906484
2012-03-14 09:20:00+00:00 -0.632415
Freq: D, dtype: float64
ts_utc.index
DatetimeIndex(['2012-03-09 09:20:00+00:00', '2012-03-10 09:20:00+00:00',
'2012-03-11 09:20:00+00:00', '2012-03-12 09:20:00+00:00',
'2012-03-13 09:20:00+00:00', '2012-03-14 09:20:00+00:00'],
dtype='datetime64[ns, UTC]', freq='D')
# Once a series has been localized, convert it to another time zone with tz_convert
ts_utc.tz_convert("America/New_York")
2012-03-09 04:20:00-05:00 1.108616
2012-03-10 04:20:00-05:00 -0.476701
2012-03-11 05:20:00-04:00 0.224237
2012-03-12 05:20:00-04:00 0.131083
2012-03-13 05:20:00-04:00 1.906484
2012-03-14 05:20:00-04:00 -0.632415
Freq: D, dtype: float64
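The offset in the output above changes from -05:00 to -04:00 because US daylight saving time began on 2012-03-11. The following is a minimal sketch of that behavior; the variable names (utc_index, ny_index, offsets) are illustrative and not part of the original notebook.

```python
from datetime import timedelta

import pandas as pd

# Two UTC timestamps, one on each side of the US daylight saving change
# that took effect on 2012-03-11 at 02:00 New York time.
utc_index = pd.date_range("2012-03-10 09:20", periods=2, freq="D", tz="UTC")
ny_index = utc_index.tz_convert("America/New_York")

# utcoffset() reports how far each local timestamp is from UTC.
offsets = [stamp.utcoffset() for stamp in ny_index]

print(ny_index)
print(offsets)
```

tz_convert only changes how each instant is displayed, so ny_index and utc_index still compare equal element by element.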
# tz_localize and tz_convert are also instance methods of DatetimeIndex:
ts.index.tz_localize('Asia/Shanghai')
DatetimeIndex(['2012-03-09 09:20:00+08:00', '2012-03-10 09:20:00+08:00',
'2012-03-11 09:20:00+08:00', '2012-03-12 09:20:00+08:00',
'2012-03-13 09:20:00+08:00', '2012-03-14 09:20:00+08:00'],
dtype='datetime64[ns, Asia/Shanghai]', freq='D')
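The two methods are not interchangeable: tz_localize expects naive data and tz_convert expects data that already carries a time zone. The sketch below, which uses illustrative variable names, shows what happens when each is applied to the wrong kind of index.

```python
import pandas as pd

naive = pd.date_range("2012-03-09 09:20", periods=3, freq="D")

# tz_localize keeps the wall-clock time and attaches a zone to it.
aware = naive.tz_localize("Asia/Shanghai")

# tz_convert keeps the instant and changes the wall-clock time.
converted = aware.tz_convert("UTC")

errors = []
try:
    naive.tz_convert("UTC")      # naive data has no zone to convert from
except TypeError:
    errors.append("tz_convert on naive")
try:
    aware.tz_localize("UTC")     # aware data already has a zone
except TypeError:
    errors.append("tz_localize on aware")

# tz_localize(None) drops the zone and returns the local wall-clock times.
back_to_naive = aware.tz_localize(None)

print(converted)
print(errors)
```

Both misuses raise TypeError, which makes it hard to silently reinterpret timestamps in the wrong zone.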
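Operations Between Different Time Zones
- When two time series with different time zones are combined, the result is in UTC
- Timestamps are stored internally in UTC, so no conversion is needed before combining them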
rng = pd.date_range('3/7/2012 9:30', periods=10, freq='B')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
2012-03-07 09:30:00 -1.109190
2012-03-08 09:30:00 -0.207785
2012-03-09 09:30:00 0.624029
2012-03-12 09:30:00 0.433870
2012-03-13 09:30:00 -0.485877
2012-03-14 09:30:00 -0.936569
2012-03-15 09:30:00 -0.866577
2012-03-16 09:30:00 1.819173
2012-03-19 09:30:00 0.204572
2012-03-20 09:30:00 -0.565101
Freq: B, dtype: float64
ts1 = ts[:7].tz_localize('Europe/London')
ts2 = ts1[2:].tz_convert('Europe/Moscow')
result = ts1 + ts2
result
2012-03-07 09:30:00+00:00 NaN
2012-03-08 09:30:00+00:00 NaN
2012-03-09 09:30:00+00:00 1.248058
2012-03-12 09:30:00+00:00 0.867740
2012-03-13 09:30:00+00:00 -0.971755
2012-03-14 09:30:00+00:00 -1.873137
2012-03-15 09:30:00+00:00 -1.733153
Freq: B, dtype: float64
result.index
DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',
'2012-03-09 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
'2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
'2012-03-15 09:30:00+00:00'],
dtype='datetime64[ns, UTC]', freq='B')
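Because the random values above make the alignment hard to follow, here is a small sketch of the same operation with fixed values. The series names and numbers are made up for illustration.

```python
import pandas as pd

london = pd.date_range("2012-03-07 09:30", periods=3, freq="D", tz="Europe/London")
ts1 = pd.Series([1.0, 2.0, 3.0], index=london)

# Drop the first point, then view the same instants in Moscow time.
ts2 = ts1.iloc[1:].tz_convert("Europe/Moscow")

# Alignment is done on the underlying instants, not on the displayed
# wall-clock times, and the combined index is expressed in UTC.
result = ts1 + ts2

print(result)
print(result.index.tz)
```

The first entry is NaN because ts2 has no value for that instant; the other two are the sums of matching points.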
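Periods and Period Arithmetic
- A period represents a span of time, such as a day, a month, a quarter, or a year
- The Period class represents this data type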
p = pd.Period(2016, freq="A-DEC")
p
Period('2016', 'A-DEC')
p + 3
Period('2019', 'A-DEC')
p - 2
Period('2014', 'A-DEC')
# If two Periods have the same frequency, their difference is the number of units between them
pd.Period("2020", freq="A-DEC") - p
<4 * YearEnds: month=12>
# Create a regular range of periods
rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
rng
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')
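The same arithmetic works at other frequencies. The sketch below uses monthly periods, which is a choice made here for illustration, and also shows that subtracting periods with different frequencies is rejected.

```python
import pandas as pd

start = pd.Period("2016-01", freq="M")

# Adding an integer shifts the period by that many months.
later = start + 3

# Subtracting two monthly periods gives an offset; .n is the month count.
gap = pd.Period("2016-05", freq="M") - start

# Periods with different frequencies cannot be subtracted.
try:
    pd.Period("2016-05-01", freq="D") - start
    mismatch_raises = False
except ValueError:
    mismatch_raises = True

# period_range also accepts a start period and a count.
months = pd.period_range(start, periods=3, freq="M")

print(later, gap.n, mismatch_raises)
print(months)
```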
Period and PeriodIndex Objects
- Use asfreq to convert a period to another frequency
- Create the period with pd.Period() first
p = pd.Period("2018", freq="A-DEC")
p
Period('2018', 'A-DEC')
p.asfreq("M", how="start")
Period('2018-01', 'M')
p.asfreq("M", how="end")
Period('2018-12', 'M')
p = pd.Period('2007', freq='A-JUN')
p
Period('2007', 'A-JUN')
p.asfreq('M', 'start')
Period('2006-07', 'M')
p.asfreq('M', 'end')
Period('2007-06', 'M')
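The examples above call asfreq on a single Period. A PeriodIndex has the same method, as the following sketch suggests; the monthly range and variable names are assumptions made for illustration.

```python
import pandas as pd

months = pd.period_range("2000-01", "2000-03", freq="M")

# Going to a higher frequency, `how` picks the first or last sub-period.
first_days = months.asfreq("D", how="start")
last_days = months.asfreq("D", how="end")

# Going to a lower frequency, each day maps to the month that contains it.
back = last_days.asfreq("M")

print(first_days)
print(last_days)
print(back)
```

Note that how="end" picks 2000-02-29 for February, since 2000 is a leap year.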
Converting Timestamps to Periods
- to_period: converts a Series or DataFrame indexed by timestamps to a period index
- to_timestamp: converts a period index back to timestamps
rng = pd.date_range("2018-01-01", periods=3)
rng
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'], dtype='datetime64[ns]', freq='D')
ts = pd.Series(np.random.randn(3), index=rng)
ts
2018-01-01 -0.146067
2018-01-02 0.815443
2018-01-03 0.416382
Freq: D, dtype: float64
pts = ts.to_period()
pts
2018-01-01 -0.146067
2018-01-02 0.815443
2018-01-03 0.416382
Freq: D, dtype: float64
rng = pd.date_range('1/29/2000', periods=6, freq='D')
ts2 = pd.Series(np.random.randn(6), index=rng)
ts2
2000-01-29 0.267670
2000-01-30 0.844309
2000-01-31 -2.875965
2000-02-01 0.005687
2000-02-02 -0.450650
2000-02-03 0.650101
Freq: D, dtype: float64
ts2.to_period("M")
2000-01 0.267670
2000-01 0.844309
2000-01 -2.875965
2000-02 0.005687
2000-02 -0.450650
2000-02 0.650101
Freq: M, dtype: float64
pts = ts2.to_period()
pts
2000-01-29 0.267670
2000-01-30 0.844309
2000-01-31 -2.875965
2000-02-01 0.005687
2000-02-02 -0.450650
2000-02-03 0.650101
Freq: D, dtype: float64
pts.to_timestamp(how="end")
2000-01-29 23:59:59.999999999 0.267670
2000-01-30 23:59:59.999999999 0.844309
2000-01-31 23:59:59.999999999 -2.875965
2000-02-01 23:59:59.999999999 0.005687
2000-02-02 23:59:59.999999999 -0.450650
2000-02-03 23:59:59.999999999 0.650101
Freq: D, dtype: float64
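As the to_period("M") output above shows, several timestamps can fall into the same period, so the resulting index may contain duplicates. One possible way to handle this, sketched here with made-up values, is to aggregate by period and then convert back to timestamps.

```python
import pandas as pd

rng = pd.date_range("2000-01-29", periods=6, freq="D")
ts = pd.Series(range(6), index=rng, dtype="float64")

# Three days land in 2000-01 and three in 2000-02.
monthly = ts.to_period("M")
has_duplicates = not monthly.index.is_unique

# Sum the values that share a period.
totals = monthly.groupby(level=0).sum()

# to_timestamp defaults to how="start": the first instant of each period.
back = totals.to_timestamp()

print(totals)
print(back)
```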
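Resampling and Frequency Conversion
Resampling: the process of converting a time series from one frequency to another.
Downsampling: aggregating higher-frequency data to a lower frequency.
Upsampling: converting lower-frequency data to a higher frequency.
Not every resampling falls into one of these two categories: converting W-WED to W-FRI is neither.
pandas does this with the resample method.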
# Starting from the given date, create 100 timestamps at daily (D) frequency
rng = pd.date_range('2000-01-01', periods=100, freq='D')
# len(rng) is 100, so randn draws 100 samples from the standard normal distribution
# rng is used as the index
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.head(6)
2000-01-01 2.511154
2000-01-02 -1.533321
2000-01-03 -1.945515
2000-01-04 -0.235927
2000-01-05 2.488850
2000-01-06 -0.176643
Freq: D, dtype: float64
ts.resample('M').mean()
2000-01-31 0.003100
2000-02-29 -0.302630
2000-03-31 -0.205999
2000-04-30 0.618526
Freq: M, dtype: float64
ts.resample("M", kind="period").mean()
2000-01 0.003100
2000-02 -0.302630
2000-03 -0.205999
2000-04 0.618526
Freq: M, dtype: float64
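The monthly examples above are downsampling. The sketch below shows both directions with small fixed values. It uses the "5min", "12h" and "D" aliases rather than "M", partly because pandas 2.2 and later rename the month-end alias to "ME"; the data and names are illustrative only.

```python
import pandas as pd

# Downsampling: twelve one-minute points summed into 5-minute buckets.
minutes = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(range(12), index=minutes)
five_min = ts.resample("5min").sum()

# Upsampling: two daily points expanded to a 12-hour grid.
# There is nothing to aggregate, so ffill() carries the last value forward.
days = pd.date_range("2000-01-01", periods=2, freq="D")
daily = pd.Series([1.0, 2.0], index=days)
half_daily = daily.resample("12h").ffill()

print(five_min)
print(half_daily)
```

Downsampling needs an aggregation such as sum or mean, while upsampling needs a fill rule; replacing ffill() with asfreq() would leave the new slots as NaN.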
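OHLC Resampling
A common way to aggregate time series in finance is to compute four values for each bucket:
- open: the first value
- close: the last value
- high: the maximum value
- low: the minimum value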
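The four values listed above can be computed in one call with the ohlc method of a resampler. This is a minimal sketch with made-up one-minute prices grouped into 5-minute bars.

```python
import pandas as pd

minutes = pd.date_range("2000-01-01 09:30", periods=10, freq="min")
prices = pd.Series([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], index=minutes, dtype="float64")

# One row per 5-minute bucket, with columns open, high, low, close.
bars = prices.resample("5min").ohlc()

print(bars)
```

The result is a DataFrame, so individual columns such as bars["close"] can be used as ordinary time series.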