Formatting Dates in Pandas

Author: Gavin_c980 | Published: 2019-03-01 15:14

Use map to transform each element

In [1]: import pandas as pd
   ...: import datetime
   ...: from operator import methodcaller

In [2]: pd.options.display.max_rows = 10

In [3]: s = pd.Series(pd.date_range(pd.Timestamp('now'), periods=5))

In [4]: s
Out[4]: 
0   2019-03-01 14:44:40.030313
1   2019-03-02 14:44:40.030313
2   2019-03-03 14:44:40.030313
3   2019-03-04 14:44:40.030313
4   2019-03-05 14:44:40.030313
dtype: datetime64[ns]

In [5]: s.map(lambda x: x.strftime('%d-%m-%Y'))
Out[5]: 
0    01-03-2019
1    02-03-2019
2    03-03-2019
3    04-03-2019
4    05-03-2019
dtype: object

In [6]: s.map(methodcaller('strftime', '%d-%m-%Y'))
Out[6]: 
0    01-03-2019
1    02-03-2019
2    03-03-2019
3    04-03-2019
4    05-03-2019
dtype: object
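
The same formatting can also be written without an element-wise map. The following is a minimal sketch, not part of the original session: it assumes a fixed start date in place of Timestamp('now') so that the result is reproducible, and it uses the Series .dt.strftime accessor method.

```python
import pandas as pd

# Fixed start date (an assumption, replacing Timestamp('now')) for reproducibility.
s = pd.Series(pd.date_range("2019-03-01 14:44:40", periods=5))

# Element-wise formatting with map, as in the session above.
mapped = s.map(lambda x: x.strftime("%d-%m-%Y"))

# The same format string applied through the .dt accessor of the Series.
formatted = s.dt.strftime("%d-%m-%Y")

print(formatted.tolist())
```

Both variants produce strings, so the result no longer has a datetime dtype.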

Calling the date method on each Timestamp element of the Series yields raw datetime.date objects.

In [7]: s.map(methodcaller('date'))
Out[7]: 
0    2019-03-01
1    2019-03-02
2    2019-03-03
3    2019-03-04
4    2019-03-05
dtype: object

In [8]: s.map(methodcaller('date')).values
Out[8]: 
array([datetime.date(2019, 3, 1), datetime.date(2019, 3, 2),
       datetime.date(2019, 3, 3), datetime.date(2019, 3, 4),
       datetime.date(2019, 3, 5)], dtype=object)

An equivalent approach is to pass the unbound Timestamp.date method.

In [9]: s.map(pd.Timestamp.date)
Out[9]: 
0    2019-03-01
1    2019-03-02
2    2019-03-03
3    2019-03-04
4    2019-03-05
dtype: object
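
Passing the unbound method works because map supplies each element as the method's first argument. A small sketch with an illustrative timestamp (the value is made up for the example):

```python
import datetime
import pandas as pd

ts = pd.Timestamp("2019-03-01 14:44:40")  # illustrative value, not from the session

# Bound call: the instance is implicit.
d_bound = ts.date()

# Unbound call: the instance is passed explicitly, which is what map does per element.
d_unbound = pd.Timestamp.date(ts)

print(d_bound, d_unbound)
```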

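The Timestamp.date method is efficient and readable. Timestamp is available at the top level of the pandas namespace, as pandas.Timestamp.
The date attribute of DatetimeIndex does something similar: it returns a NumPy array with dtype=object.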

In [10]: idx = pd.DatetimeIndex(s)

In [11]: idx
Out[11]: 
DatetimeIndex(['2019-03-01 14:44:40.030313', '2019-03-02 14:44:40.030313',
               '2019-03-03 14:44:40.030313', '2019-03-04 14:44:40.030313',
               '2019-03-05 14:44:40.030313'],
              dtype='datetime64[ns]', freq=None)

In [12]: idx.date
Out[12]: 
array([datetime.date(2019, 3, 1), datetime.date(2019, 3, 2),
       datetime.date(2019, 3, 3), datetime.date(2019, 3, 4),
       datetime.date(2019, 3, 5)], dtype=object)
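
For a Series, the .dt accessor exposes the same attribute without building a DatetimeIndex first. A short sketch, again assuming a fixed start date rather than the session's Timestamp('now'):

```python
import datetime
import pandas as pd

s = pd.Series(pd.date_range("2019-03-01 14:44:40", periods=5))

# NumPy object array of datetime.date, via DatetimeIndex.
dates_from_index = pd.DatetimeIndex(s).date

# Series of datetime.date objects, via the .dt accessor.
dates_from_series = s.dt.date

print(dates_from_series.tolist() == list(dates_from_index))
```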

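For a large datetime64[ns] Series, the timings below show the three approaches running at essentially the same speed: Timestamp.date, operator.methodcaller, and the lambda differ by less than the run-to-run variation.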

In [13]: f1 = methodcaller('date')
    ...: f2 = lambda x: x.date()
    ...: f3 = pd.Timestamp.date
    ...: s2 = pd.Series(pd.date_range('20010101', periods=1000000, freq='min'))
    ...: s2
Out[13]: 
0        2001-01-01 00:00:00
1        2001-01-01 00:01:00
2        2001-01-01 00:02:00
3        2001-01-01 00:03:00
4        2001-01-01 00:04:00
                 ...
999995   2002-11-26 10:35:00
999996   2002-11-26 10:36:00
999997   2002-11-26 10:37:00
999998   2002-11-26 10:38:00
999999   2002-11-26 10:39:00
Length: 1000000, dtype: datetime64[ns]

In [14]: timeit s2.map(f1)
2.97 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [15]: timeit s2.map(f2)
2.9 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [16]: timeit s2.map(f3)
2.98 s ± 177 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
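
Outside IPython, the comparison can be repeated with the standard timeit module. This is a rough sketch rather than a rigorous benchmark: the row count is reduced to 10,000 (an assumption made so the script finishes quickly), and absolute numbers will vary by machine and pandas version.

```python
import timeit
from operator import methodcaller

import pandas as pd

f1 = methodcaller("date")
f2 = lambda x: x.date()
f3 = pd.Timestamp.date

# Smaller than the session's 1,000,000 rows so the script runs quickly.
s2 = pd.Series(pd.date_range("20010101", periods=10_000, freq="min"))

timings = {}
for name, func in [("methodcaller", f1), ("lambda", f2), ("Timestamp.date", f3)]:
    # Best of three single runs of s2.map(func).
    timings[name] = min(timeit.repeat(lambda: s2.map(func), number=1, repeat=3))
    print(f"{name}: {timings[name]:.4f} s")
```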

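One of pandas' goals is to provide a layer of operations on top of NumPy so that you do not have to deal with the low-level details of ndarray. Getting raw datetime.date objects is of limited use, because there is no corresponding NumPy dtype that pandas supports. At the time of writing, pandas supported only the datetime64[ns] dtype, which has nanosecond resolution; pandas 2.0 and later also support second, millisecond, and microsecond resolutions.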
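
When only the calendar day matters, one way to stay within a datetime64 dtype is to reset the time of day instead of converting to datetime.date. A hedged sketch using Series.dt.normalize, which the original session does not use; the start date is again an assumed fixed value:

```python
import pandas as pd

s = pd.Series(pd.date_range("2019-03-01 14:44:40", periods=5))

# Set every timestamp to midnight; the result keeps a datetime64 dtype.
midnight = s.dt.normalize()

# Converting to datetime.date falls back to a generic object dtype.
as_dates = s.dt.date

print(midnight.dtype)
print(as_dates.dtype)
```

Because the normalized Series is still datetime-typed, the .dt accessor and other datetime operations remain available on it.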

Article link: https://www.haomeiwen.com/subject/hoyduqtx.html