Commonly Used Data Analysis Code Snippets

Author: Leesper | Published: 2018-09-18 12:02

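    This article will be updated from time to time with techniques that come up often in data analysis work. They are provided as code snippets and mainly target common Python data analysis libraries such as numpy, pandas, and matplotlib.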

    1. Convert date/time fields in the raw data to the pandas.Timestamp type

    import pandas as pd
    def convert_to_datetime(s):
      # Parse strings such as '04/18/12 13:05:09' in one vectorized call
      return pd.to_datetime(s, format='%m/%d/%y %H:%M:%S')
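
    As a usage sketch, the helper can be applied to a column of a DataFrame. The frame and its column name trans_time are made up for illustration, and the version shown here passes the format string straight to pd.to_datetime rather than calling strptime per element.

```python
import pandas as pd

def convert_to_datetime(s):
    # Vectorized parse; the format matches strings like '09/18/18 12:02:00'
    return pd.to_datetime(s, format='%m/%d/%y %H:%M:%S')

# Hypothetical raw data: the column name 'trans_time' is an assumption
raw_data = pd.DataFrame(
    {'trans_time': ['09/18/18 12:02:00', '01/02/17 23:59:59']}
)
raw_data['trans_time'] = convert_to_datetime(raw_data['trans_time'])

# Once converted, the .dt accessor exposes the date components
print(raw_data['trans_time'].dt.year.tolist())
print(raw_data['trans_time'].dt.hour.tolist())
```

    A string that does not match the format raises a ValueError by default. Passing errors='coerce' to pd.to_datetime turns such values into NaT instead, which may be preferable for messy input.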
    

    Common date format specifiers:

    • %Y Four-digit year
    • %y Two-digit year
    • %m Two-digit month [01, 12]
    • %d Two-digit day [01, 31]
    • %H Hour (24-hour clock) [00, 23]
    • %I Hour (12-hour clock) [01, 12]
    • %M Two-digit minute [00, 59]
    • %S Second [00, 61] (seconds 60, 61 account for leap seconds)
    • %w Weekday as integer [0 (Sunday), 6]
    • %U Week number of the year [00, 53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are “week 0”
    • %W Week number of the year [00, 53]; Monday is considered the first day of the week, and days before the first Monday of the year are “week 0”
    • %z UTC time zone offset as +HHMM or -HHMM; empty if time zone naive
    • %F Shortcut for %Y-%m-%d (e.g., 2012-04-18)
    • %D Shortcut for %m/%d/%y (e.g., 04/18/12)
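
    A few of the specifiers above can be tried out with the standard library's datetime module. This is a small sketch with a sample timestamp chosen for illustration; it spells out the long forms that %F and %D abbreviate rather than using the shortcuts themselves.

```python
from datetime import datetime

# Parse a sample string with two-digit year, 24-hour clock
stamp = datetime.strptime('04/18/12 13:05:09', '%m/%d/%y %H:%M:%S')

iso_date = stamp.strftime('%Y-%m-%d')   # the layout %F stands for: '2012-04-18'
us_date = stamp.strftime('%m/%d/%y')    # the layout %D stands for: '04/18/12'
hour_12 = stamp.strftime('%I')          # 13:00 on a 12-hour clock: '01'
weekday = stamp.strftime('%w')          # Wednesday, counting from Sunday = 0: '3'
week_sun = stamp.strftime('%U')         # week number, weeks starting on Sunday
week_mon = stamp.strftime('%W')         # week number, weeks starting on Monday

print(iso_date, us_date, hour_12, weekday, week_sun, week_mon)
```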

    2. Convert categorical data to the category dtype

    raw_data['card_type'] = raw_data['card_type'].astype('category')
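
    To see what the conversion does, the sketch below builds a tiny frame and inspects the result. The sample values 'debit' and 'credit' are assumptions made for illustration; only the card_type column name comes from the snippet above.

```python
import pandas as pd

# Hypothetical sample values for the card_type column
raw_data = pd.DataFrame(
    {'card_type': ['debit', 'credit', 'debit', 'debit', 'credit']}
)
raw_data['card_type'] = raw_data['card_type'].astype('category')

# Each distinct label is stored once; rows hold small integer codes
categories = raw_data['card_type'].cat.categories.tolist()
codes = raw_data['card_type'].cat.codes.tolist()

print(categories)  # ['credit', 'debit']
print(codes)       # [1, 0, 1, 1, 0]
```

    On columns with many rows and few distinct values, this representation usually takes less memory than plain strings and can speed up operations such as groupby.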
    
