"Python for Data Analysis", Chapter 11: Time Series (2)


Author: 皮皮大 | Published 2019-08-09 11:11

Time Zone Handling

The third-party library used here is pytz.

  • Get a time zone object: pytz.timezone
  • Localize naive timestamps: tz_localize
  • Convert between time zones: tz_convert
  • tz_localize and tz_convert are also instance methods of DatetimeIndex
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# Import the package and look at the last five common time zones
import pytz
pytz.common_timezones[-5:]
['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']
tz = pytz.timezone("America/New_York")
tz
<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>
# Time series in pandas are time-zone naive by default
rng = pd.date_range("3/9/2012 9:20", periods=6, freq="D")
rng
DatetimeIndex(['2012-03-09 09:20:00', '2012-03-10 09:20:00',
               '2012-03-11 09:20:00', '2012-03-12 09:20:00',
               '2012-03-13 09:20:00', '2012-03-14 09:20:00'],
              dtype='datetime64[ns]', freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
2012-03-09 09:20:00    1.108616
2012-03-10 09:20:00   -0.476701
2012-03-11 09:20:00    0.224237
2012-03-12 09:20:00    0.131083
2012-03-13 09:20:00    1.906484
2012-03-14 09:20:00   -0.632415
Freq: D, dtype: float64
# Generate a date range with a time zone set
pd.date_range('3/9/2012 9:30', periods=10, freq='D', tz='UTC')
DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00', '2012-03-16 09:30:00+00:00',
               '2012-03-17 09:30:00+00:00', '2012-03-18 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')
ts_utc = ts.tz_localize("UTC")
ts_utc
2012-03-09 09:20:00+00:00    1.108616
2012-03-10 09:20:00+00:00   -0.476701
2012-03-11 09:20:00+00:00    0.224237
2012-03-12 09:20:00+00:00    0.131083
2012-03-13 09:20:00+00:00    1.906484
2012-03-14 09:20:00+00:00   -0.632415
Freq: D, dtype: float64
ts_utc.index
DatetimeIndex(['2012-03-09 09:20:00+00:00', '2012-03-10 09:20:00+00:00',
               '2012-03-11 09:20:00+00:00', '2012-03-12 09:20:00+00:00',
               '2012-03-13 09:20:00+00:00', '2012-03-14 09:20:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')
# Convert a localized series to another time zone with tz_convert
ts_utc.tz_convert("America/New_York")
2012-03-09 04:20:00-05:00    1.108616
2012-03-10 04:20:00-05:00   -0.476701
2012-03-11 05:20:00-04:00    0.224237
2012-03-12 05:20:00-04:00    0.131083
2012-03-13 05:20:00-04:00    1.906484
2012-03-14 05:20:00-04:00   -0.632415
Freq: D, dtype: float64
# tz_localize and tz_convert are also instance methods of DatetimeIndex:
ts.index.tz_localize('Asia/Shanghai')
DatetimeIndex(['2012-03-09 09:20:00+08:00', '2012-03-10 09:20:00+08:00',
               '2012-03-11 09:20:00+08:00', '2012-03-12 09:20:00+08:00',
               '2012-03-13 09:20:00+08:00', '2012-03-14 09:20:00+08:00'],
              dtype='datetime64[ns, Asia/Shanghai]', freq='D')
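The same two methods also exist on individual Timestamp objects. A minimal sketch (my own example, not from the original article):

```python
import pandas as pd

# Localize a naive Timestamp to UTC, then convert it to New York time.
# On 2012-03-12, US daylight saving time is in effect (UTC-4).
stamp = pd.Timestamp("2012-03-12 04:00")
stamp_utc = stamp.tz_localize("UTC")
stamp_ny = stamp_utc.tz_convert("America/New_York")
print(stamp_utc)  # 2012-03-12 04:00:00+00:00
print(stamp_ny)   # 2012-03-12 00:00:00-04:00
```

Note that conversion changes only the display: the two timestamps compare equal because both represent the same instant in UTC.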

Operations Between Different Time Zones

  • Combining time series with different time zones yields a result in UTC
  • Timestamps are stored internally in UTC
rng = pd.date_range('3/7/2012 9:30', periods=10, freq='B')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
2012-03-07 09:30:00   -1.109190
2012-03-08 09:30:00   -0.207785
2012-03-09 09:30:00    0.624029
2012-03-12 09:30:00    0.433870
2012-03-13 09:30:00   -0.485877
2012-03-14 09:30:00   -0.936569
2012-03-15 09:30:00   -0.866577
2012-03-16 09:30:00    1.819173
2012-03-19 09:30:00    0.204572
2012-03-20 09:30:00   -0.565101
Freq: B, dtype: float64
ts1 = ts[:7].tz_localize('Europe/London')
ts2 = ts1[2:].tz_convert('Europe/Moscow')
result = ts1 + ts2
result
2012-03-07 09:30:00+00:00         NaN
2012-03-08 09:30:00+00:00         NaN
2012-03-09 09:30:00+00:00    1.248058
2012-03-12 09:30:00+00:00    0.867740
2012-03-13 09:30:00+00:00   -0.971755
2012-03-14 09:30:00+00:00   -1.873137
2012-03-15 09:30:00+00:00   -1.733153
Freq: B, dtype: float64
result.index
DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',
               '2012-03-09 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='B')

Periods and Period Arithmetic

  • A period represents a span of time, such as a year, month, or day
  • The Period class represents this data type
p = pd.Period(2016, freq="A-DEC")
p
Period('2016', 'A-DEC')
p + 3
Period('2019', 'A-DEC')
p - 2
Period('2014', 'A-DEC')
# The difference of two Periods with the same frequency is the number of frequency units between them
pd.Period("2020", freq="A-DEC") - p 
<4 * YearEnds: month=12>
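Period arithmetic works the same way at other frequencies. A small sketch with monthly periods (my own example):

```python
import pandas as pd

# Shifting a monthly Period moves it by whole months
p = pd.Period("2016-07", freq="M")
print(p + 2)   # 2016-09
print(p - 12)  # 2015-07
```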
# Create a regular range of periods
rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
rng
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')
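A PeriodIndex can serve as a Series index just like a DatetimeIndex, with label-based selection by period string. A brief sketch (my own example):

```python
import numpy as np
import pandas as pd

# Attach data to a regular range of monthly periods
rng = pd.period_range("2000-01-01", "2000-06-30", freq="M")
ts = pd.Series(np.arange(6.0), index=rng)
print(ts["2000-03"])  # 2.0, selected by period label
```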

Period and PeriodIndex Objects

  • Convert a Period to another frequency with asfreq
  • Create the Period objects with pd.Period()
p = pd.Period("2018", freq="A-DEC")
p
Period('2018', 'A-DEC')
p.asfreq("M", how="start")
Period('2018-01', 'M')
p.asfreq("M", how="end")
Period('2018-12', 'M')
p = pd.Period('2007', freq='A-JUN')
p
Period('2007', 'A-JUN')
p.asfreq('M', 'start')
Period('2006-07', 'M')
p.asfreq('M', 'end')
Period('2007-06', 'M')
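Fiscal quarters follow the same rules. With a Q-JAN frequency the fiscal year ends in January, so 2012Q4 runs from November 2011 through January 2012; a sketch:

```python
import pandas as pd

# Q-JAN: fiscal year ends in January, so 2012Q4 is Nov 2011 - Jan 2012
p = pd.Period("2012Q4", freq="Q-JAN")
print(p.asfreq("D", "start"))  # 2011-11-01
print(p.asfreq("D", "end"))    # 2012-01-31
```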

Converting Timestamps to Periods

  • to_period: converts a Series or DataFrame with a timestamp index to one with a period index
  • to_timestamp: converts back to timestamps
rng = pd.date_range("2018-01-01", periods=3)
rng
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'], dtype='datetime64[ns]', freq='D')
ts = pd.Series(np.random.randn(3), index=rng)
ts
2018-01-01   -0.146067
2018-01-02    0.815443
2018-01-03    0.416382
Freq: D, dtype: float64
pts = ts.to_period()
pts
2018-01-01   -0.146067
2018-01-02    0.815443
2018-01-03    0.416382
Freq: D, dtype: float64
rng = pd.date_range('1/29/2000', periods=6, freq='D')
ts2 = pd.Series(np.random.randn(6), index=rng)
ts2
2000-01-29    0.267670
2000-01-30    0.844309
2000-01-31   -2.875965
2000-02-01    0.005687
2000-02-02   -0.450650
2000-02-03    0.650101
Freq: D, dtype: float64
ts2.to_period("M")
2000-01    0.267670
2000-01    0.844309
2000-01   -2.875965
2000-02    0.005687
2000-02   -0.450650
2000-02    0.650101
Freq: M, dtype: float64
pts = ts2.to_period()
pts
2000-01-29    0.267670
2000-01-30    0.844309
2000-01-31   -2.875965
2000-02-01    0.005687
2000-02-02   -0.450650
2000-02-03    0.650101
Freq: D, dtype: float64
pts.to_timestamp(how="end")
2000-01-29 23:59:59.999999999    0.267670
2000-01-30 23:59:59.999999999    0.844309
2000-01-31 23:59:59.999999999   -2.875965
2000-02-01 23:59:59.999999999    0.005687
2000-02-02 23:59:59.999999999   -0.450650
2000-02-03 23:59:59.999999999    0.650101
Freq: D, dtype: float64
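Converting back with how="start" instead snaps each period to its first instant. A quick round-trip sketch (my own example):

```python
import numpy as np
import pandas as pd

# Timestamps -> monthly periods -> back to the first day of each month
rng = pd.date_range("2000-01-29", periods=6, freq="D")
ts = pd.Series(np.arange(6.0), index=rng)
monthly = ts.to_period("M")
back = monthly.to_timestamp(how="start")
print(back.index[0])   # 2000-01-01 00:00:00
print(back.index[-1])  # 2000-02-01 00:00:00
```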

Resampling and Frequency Conversion

Resampling: the process of converting a time series from one frequency to another.

Downsampling: aggregating higher-frequency data to a lower frequency.

Upsampling: converting from a lower frequency to a higher one.

The division is not absolute: converting W-WED (weekly, anchored on Wednesday) to W-FRI is neither downsampling nor upsampling.

The conversions are performed with the resample method.

# Starting from 2000-01-01, generate 100 timestamps at daily (D) frequency
rng = pd.date_range('2000-01-01', periods=100, freq='D')
# len(rng) is 100, so this draws 100 standard-normal random values
# and uses rng as the index
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.head(6)
2000-01-01    2.511154
2000-01-02   -1.533321
2000-01-03   -1.945515
2000-01-04   -0.235927
2000-01-05    2.488850
2000-01-06   -0.176643
Freq: D, dtype: float64
ts.resample('M').mean()
2000-01-31    0.003100
2000-02-29   -0.302630
2000-03-31   -0.205999
2000-04-30    0.618526
Freq: M, dtype: float64
ts.resample("M", kind="period").mean()
2000-01    0.003100
2000-02   -0.302630
2000-03   -0.205999
2000-04    0.618526
Freq: M, dtype: float64
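Upsampling introduces gaps that need a fill rule such as ffill. A minimal sketch going from weekly to daily frequency (my own example):

```python
import pandas as pd

# Two weekly observations upsampled to daily; ffill() propagates
# each value forward until the next observation
rng = pd.date_range("2000-01-02", periods=2, freq="W-SUN")
ts = pd.Series([1.0, 2.0], index=rng)
daily = ts.resample("D").ffill()
print(len(daily))      # 8 (2000-01-02 through 2000-01-09)
print(daily.iloc[-1])  # 2.0
```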

OHLC Resampling

A time-series aggregation commonly used in finance: compute four values for each bin (bar):

  • open, the first value in the bin
  • close, the last value
  • high, the maximum
  • low, the minimum
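pandas computes all four in one pass via the ohlc() aggregation on a resampler. A small sketch (my own example):

```python
import numpy as np
import pandas as pd

# Minute-level data binned into 5-minute OHLC bars
rng = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12.0), index=rng)
bars = ts.resample("5min").ohlc()
print(bars.columns.tolist())  # ['open', 'high', 'low', 'close']
print(bars.iloc[0].tolist())  # [0.0, 4.0, 0.0, 4.0]
```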

Source: https://www.haomeiwen.com/subject/jjpcjctx.html