"Python for Data Analysis", Chapter 11: Time Series (2)


Author: 皮皮大 | Published 2019-08-09 11:11

Time Zone Handling

The third-party library used here is pytz.

  • Get a time zone object: pytz.timezone
  • Localize naive timestamps: tz_localize
  • Convert between time zones: tz_convert
  • tz_localize and tz_convert are also instance methods of DatetimeIndex
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# Import the package and look at the last five common time zones
import pytz
pytz.common_timezones[-5:]
['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']
tz = pytz.timezone("America/New_York")
tz
<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>
# Time series in pandas are time-zone naive by default
rng = pd.date_range("3/9/2012 9:20", periods=6, freq="D")
rng
DatetimeIndex(['2012-03-09 09:20:00', '2012-03-10 09:20:00',
               '2012-03-11 09:20:00', '2012-03-12 09:20:00',
               '2012-03-13 09:20:00', '2012-03-14 09:20:00'],
              dtype='datetime64[ns]', freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
2012-03-09 09:20:00    1.108616
2012-03-10 09:20:00   -0.476701
2012-03-11 09:20:00    0.224237
2012-03-12 09:20:00    0.131083
2012-03-13 09:20:00    1.906484
2012-03-14 09:20:00   -0.632415
Freq: D, dtype: float64
# Generate a date range with a time zone set
pd.date_range('3/9/2012 9:30', periods=10, freq='D', tz='UTC')
DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00', '2012-03-16 09:30:00+00:00',
               '2012-03-17 09:30:00+00:00', '2012-03-18 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')
ts_utc = ts.tz_localize("UTC")
ts_utc
2012-03-09 09:20:00+00:00    1.108616
2012-03-10 09:20:00+00:00   -0.476701
2012-03-11 09:20:00+00:00    0.224237
2012-03-12 09:20:00+00:00    0.131083
2012-03-13 09:20:00+00:00    1.906484
2012-03-14 09:20:00+00:00   -0.632415
Freq: D, dtype: float64
ts_utc.index
DatetimeIndex(['2012-03-09 09:20:00+00:00', '2012-03-10 09:20:00+00:00',
               '2012-03-11 09:20:00+00:00', '2012-03-12 09:20:00+00:00',
               '2012-03-13 09:20:00+00:00', '2012-03-14 09:20:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')
# Convert a localized series to another time zone with tz_convert
ts_utc.tz_convert("America/New_York")
2012-03-09 04:20:00-05:00    1.108616
2012-03-10 04:20:00-05:00   -0.476701
2012-03-11 05:20:00-04:00    0.224237
2012-03-12 05:20:00-04:00    0.131083
2012-03-13 05:20:00-04:00    1.906484
2012-03-14 05:20:00-04:00   -0.632415
Freq: D, dtype: float64
# tz_localize and tz_convert are also instance methods of DatetimeIndex:
ts.index.tz_localize('Asia/Shanghai')
DatetimeIndex(['2012-03-09 09:20:00+08:00', '2012-03-10 09:20:00+08:00',
               '2012-03-11 09:20:00+08:00', '2012-03-12 09:20:00+08:00',
               '2012-03-13 09:20:00+08:00', '2012-03-14 09:20:00+08:00'],
              dtype='datetime64[ns, Asia/Shanghai]', freq='D')
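The same two methods also exist on individual Timestamp objects. A minimal sketch (my own example, not from the original article):

```python
import pandas as pd

# Localize a naive Timestamp to UTC, then convert it to New York time.
# On 2012-03-12, US daylight saving time is in effect (UTC-4).
stamp = pd.Timestamp("2012-03-12 04:00")
stamp_utc = stamp.tz_localize("UTC")
stamp_ny = stamp_utc.tz_convert("America/New_York")
print(stamp_utc)  # 2012-03-12 04:00:00+00:00
print(stamp_ny)   # 2012-03-12 00:00:00-04:00
```

Note that conversion changes only the display: the two timestamps compare equal because both represent the same instant in UTC.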

Operations Between Different Time Zones

  • Combining time series with different time zones yields a result in UTC
  • Timestamps are stored internally in UTC
rng = pd.date_range('3/7/2012 9:30', periods=10, freq='B')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
2012-03-07 09:30:00   -1.109190
2012-03-08 09:30:00   -0.207785
2012-03-09 09:30:00    0.624029
2012-03-12 09:30:00    0.433870
2012-03-13 09:30:00   -0.485877
2012-03-14 09:30:00   -0.936569
2012-03-15 09:30:00   -0.866577
2012-03-16 09:30:00    1.819173
2012-03-19 09:30:00    0.204572
2012-03-20 09:30:00   -0.565101
Freq: B, dtype: float64
ts1 = ts[:7].tz_localize('Europe/London')
ts2 = ts1[2:].tz_convert('Europe/Moscow')
result = ts1 + ts2
result
2012-03-07 09:30:00+00:00         NaN
2012-03-08 09:30:00+00:00         NaN
2012-03-09 09:30:00+00:00    1.248058
2012-03-12 09:30:00+00:00    0.867740
2012-03-13 09:30:00+00:00   -0.971755
2012-03-14 09:30:00+00:00   -1.873137
2012-03-15 09:30:00+00:00   -1.733153
Freq: B, dtype: float64
result.index
DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',
               '2012-03-09 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='B')

Periods and Period Arithmetic

  • A period represents a span of time, such as a year, month, or day
  • The Period class represents this data type
p = pd.Period(2016, freq="A-DEC")
p
Period('2016', 'A-DEC')
p + 3
Period('2019', 'A-DEC')
p - 2
Period('2014', 'A-DEC')
# The difference of two Periods with the same frequency is the number of frequency units between them
pd.Period("2020", freq="A-DEC") - p 
<4 * YearEnds: month=12>
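Period arithmetic works the same way at other frequencies. A small sketch with monthly periods (my own example):

```python
import pandas as pd

# Shifting a monthly Period moves it by whole months
p = pd.Period("2016-07", freq="M")
print(p + 2)   # 2016-09
print(p - 12)  # 2015-07
```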
# Create a regular range of periods
rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
rng
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')
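A PeriodIndex can serve as a Series index just like a DatetimeIndex, with label-based selection by period string. A brief sketch (my own example):

```python
import numpy as np
import pandas as pd

# Attach data to a regular range of monthly periods
rng = pd.period_range("2000-01-01", "2000-06-30", freq="M")
ts = pd.Series(np.arange(6.0), index=rng)
print(ts["2000-03"])  # 2.0, selected by period label
```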

Period and PeriodIndex Objects

  • Convert a Period to another frequency with asfreq
  • Create the Period objects with pd.Period()
p = pd.Period("2018", freq="A-DEC")
p
Period('2018', 'A-DEC')
p.asfreq("M", how="start")
Period('2018-01', 'M')
p.asfreq("M", how="end")
Period('2018-12', 'M')
p = pd.Period('2007', freq='A-JUN')
p
Period('2007', 'A-JUN')
p.asfreq('M', 'start')
Period('2006-07', 'M')
p.asfreq('M', 'end')
Period('2007-06', 'M')
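Fiscal quarters follow the same rules. With a Q-JAN frequency the fiscal year ends in January, so 2012Q4 runs from November 2011 through January 2012; a sketch:

```python
import pandas as pd

# Q-JAN: fiscal year ends in January, so 2012Q4 is Nov 2011 - Jan 2012
p = pd.Period("2012Q4", freq="Q-JAN")
print(p.asfreq("D", "start"))  # 2011-11-01
print(p.asfreq("D", "end"))    # 2012-01-31
```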

Converting Timestamps to Periods

  • to_period: converts a Series or DataFrame with a timestamp index to one with a period index
  • to_timestamp: converts back to timestamps
rng = pd.date_range("2018-01-01", periods=3)
rng
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'], dtype='datetime64[ns]', freq='D')
ts = pd.Series(np.random.randn(3), index=rng)
ts
2018-01-01   -0.146067
2018-01-02    0.815443
2018-01-03    0.416382
Freq: D, dtype: float64
pts = ts.to_period()
pts
2018-01-01   -0.146067
2018-01-02    0.815443
2018-01-03    0.416382
Freq: D, dtype: float64
rng = pd.date_range('1/29/2000', periods=6, freq='D')
ts2 = pd.Series(np.random.randn(6), index=rng)
ts2
2000-01-29    0.267670
2000-01-30    0.844309
2000-01-31   -2.875965
2000-02-01    0.005687
2000-02-02   -0.450650
2000-02-03    0.650101
Freq: D, dtype: float64
ts2.to_period("M")
2000-01    0.267670
2000-01    0.844309
2000-01   -2.875965
2000-02    0.005687
2000-02   -0.450650
2000-02    0.650101
Freq: M, dtype: float64
pts = ts2.to_period()
pts
2000-01-29    0.267670
2000-01-30    0.844309
2000-01-31   -2.875965
2000-02-01    0.005687
2000-02-02   -0.450650
2000-02-03    0.650101
Freq: D, dtype: float64
pts.to_timestamp(how="end")
2000-01-29 23:59:59.999999999    0.267670
2000-01-30 23:59:59.999999999    0.844309
2000-01-31 23:59:59.999999999   -2.875965
2000-02-01 23:59:59.999999999    0.005687
2000-02-02 23:59:59.999999999   -0.450650
2000-02-03 23:59:59.999999999    0.650101
Freq: D, dtype: float64
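Converting back with how="start" instead snaps each period to its first instant. A quick round-trip sketch (my own example):

```python
import numpy as np
import pandas as pd

# Timestamps -> monthly periods -> back to the first day of each month
rng = pd.date_range("2000-01-29", periods=6, freq="D")
ts = pd.Series(np.arange(6.0), index=rng)
monthly = ts.to_period("M")
back = monthly.to_timestamp(how="start")
print(back.index[0])   # 2000-01-01 00:00:00
print(back.index[-1])  # 2000-02-01 00:00:00
```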

Resampling and Frequency Conversion

Resampling: the process of converting a time series from one frequency to another.

Downsampling: aggregating higher-frequency data to a lower frequency.

Upsampling: converting from a lower frequency to a higher one.

The division is not absolute: converting W-WED (weekly, anchored on Wednesday) to W-FRI is neither downsampling nor upsampling.

The conversions are performed with the resample method.

# Starting from 2000-01-01, generate 100 timestamps at daily (D) frequency
rng = pd.date_range('2000-01-01', periods=100, freq='D')
# len(rng) is 100, so this draws 100 standard-normal random values
# and uses rng as the index
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.head(6)
2000-01-01    2.511154
2000-01-02   -1.533321
2000-01-03   -1.945515
2000-01-04   -0.235927
2000-01-05    2.488850
2000-01-06   -0.176643
Freq: D, dtype: float64
ts.resample('M').mean()
2000-01-31    0.003100
2000-02-29   -0.302630
2000-03-31   -0.205999
2000-04-30    0.618526
Freq: M, dtype: float64
ts.resample("M", kind="period").mean()
2000-01    0.003100
2000-02   -0.302630
2000-03   -0.205999
2000-04    0.618526
Freq: M, dtype: float64
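Upsampling introduces gaps that need a fill rule such as ffill. A minimal sketch going from weekly to daily frequency (my own example):

```python
import pandas as pd

# Two weekly observations upsampled to daily; ffill() propagates
# each value forward until the next observation
rng = pd.date_range("2000-01-02", periods=2, freq="W-SUN")
ts = pd.Series([1.0, 2.0], index=rng)
daily = ts.resample("D").ffill()
print(len(daily))      # 8 (2000-01-02 through 2000-01-09)
print(daily.iloc[-1])  # 2.0
```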

OHLC Resampling

A time-series aggregation commonly used in finance: compute four values for each bin (bar):

  • open, the first value in the bin
  • close, the last value
  • high, the maximum
  • low, the minimum
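pandas computes all four in one pass via the ohlc() aggregation on a resampler. A small sketch (my own example):

```python
import numpy as np
import pandas as pd

# Minute-level data binned into 5-minute OHLC bars
rng = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12.0), index=rng)
bars = ts.resample("5min").ohlc()
print(bars.columns.tolist())  # ['open', 'high', 'low', 'close']
print(bars.iloc[0].tolist())  # [0.0, 4.0, 0.0, 4.0]
```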

Source: https://www.haomeiwen.com/subject/jjpcjctx.html