Python Web Scraping: Handling Sina Weibo Time Formats

Author: format_b1d8 | Source: published 2021-02-25 09:56, read 0 times

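Sina Weibo shows post times in many different formats, which makes them tedious to handle. I wrote this demo so I can look it up again later.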

    from datetime import datetime
    from datetime import timedelta
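    # Sample formats seen on Weibo. Each assignment overwrites the previous one,
    # so only the last line takes effect; reorder them to test another format.
    publish_time = '02月19日 20:19'
    publish_time = '2020年12月18日 13:36'
    publish_time = '今天06:50'
    publish_time = '10分钟前'
    publish_time = '20秒前'
    publish_time = '2021-02-25  08:32 转赞人数超过200:00'
    publish_time = '今天 08:32 转赞人数超过200'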
    # Strip a trailing note such as '转赞人数超过200' (reposts and likes exceed 200).
    if '人数' in publish_time:
        # split() with no argument also collapses repeated spaces
        publish_time = ' '.join(publish_time.split()[:-1])
    if "刚刚" in publish_time:  # "just now"
        publish_time = datetime.now().strftime('%Y-%m-%d %H:%M')
    elif "分钟" in publish_time:  # "N minutes ago"
        minute = publish_time[:publish_time.find("分钟")]
        minute = timedelta(minutes=int(minute))
        publish_time = (datetime.now() - minute).strftime("%Y-%m-%d %H:%M")
    elif "今天" in publish_time:  # "today HH:MM"
        today = datetime.now().strftime("%Y-%m-%d")
        # strip() avoids a double space when the input is '今天 08:32'
        time_part = publish_time.replace('今天', '').strip()
        publish_time = today + " " + time_part
    elif '年' in publish_time:  # full date: 'YYYY年MM月DD日 HH:MM'
        publish_time = publish_time.replace('年', '-').replace('月', '-').replace('日', '')
    elif "月" in publish_time:  # no year given: assume the current year
        year = datetime.now().strftime("%Y")
        publish_time = year + "-" + publish_time.replace('月', '-').replace('日', '')
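    elif "秒" in publish_time:  # "N seconds ago"
        seconds = timedelta(seconds=int(publish_time[:publish_time.find("秒")]))
        publish_time = (datetime.now() - seconds).strftime('%Y-%m-%d %H:%M')
    elif '-' not in publish_time:  # unrecognized format: fall back to the current time
        publish_time = datetime.now().strftime('%Y-%m-%d %H:%M')
    # Anything already in 'YYYY-MM-DD HH:MM' form is left unchanged.
    publish_time = publish_time + ":00"  # pad seconds to get 'YYYY-MM-DD HH:MM:SS'
    print("Weibo publish time: " + publish_time)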
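
For use inside a crawler, the same branching can be wrapped in a function so each scraped string is converted in one call. The version below is only a sketch under a few assumptions that are not part of the original demo: the name `normalize_weibo_time`, the optional `now` parameter (added so results are reproducible), and returning `None` for strings it does not recognize are all hypothetical choices. It also keeps the seconds for relative times instead of truncating to the minute.

```python
import re
from datetime import datetime, timedelta

OUTPUT_FORMAT = "%Y-%m-%d %H:%M:%S"


def normalize_weibo_time(raw, now=None):
    """Return `raw` as 'YYYY-MM-DD HH:MM:SS', or None if the format is unknown.

    Hypothetical helper: the name, the `now` parameter and the None
    fallback are assumptions, not part of the original script.
    """
    now = now or datetime.now()
    text = raw.strip()

    # Drop a trailing note such as '转赞人数超过200'.
    if "人数" in text:
        text = " ".join(text.split()[:-1])

    if "刚刚" in text:  # "just now"
        return now.strftime(OUTPUT_FORMAT)

    # Relative times: 'N秒前' (seconds ago) and 'N分钟前' (minutes ago).
    match = re.match(r"(\d+)\s*(秒|分钟)前", text)
    if match:
        amount = int(match.group(1))
        if match.group(2) == "秒":
            delta = timedelta(seconds=amount)
        else:
            delta = timedelta(minutes=amount)
        return (now - delta).strftime(OUTPUT_FORMAT)

    # Absolute times: rewrite each variant into 'YYYY-MM-DD HH:MM'.
    if "今天" in text:  # "today HH:MM"
        text = now.strftime("%Y-%m-%d") + " " + text.replace("今天", "").strip()
    elif "年" in text:  # full date with year
        text = text.replace("年", "-").replace("月", "-").replace("日", "")
    elif "月" in text:  # no year given: assume the current year
        text = f"{now.year}-" + text.replace("月", "-").replace("日", "")

    # Validate the result; collapse repeated spaces before parsing.
    try:
        parsed = datetime.strptime(" ".join(text.split()), "%Y-%m-%d %H:%M")
    except ValueError:
        return None
    return parsed.strftime(OUTPUT_FORMAT)
```

Passing a fixed `now`, for example `normalize_weibo_time('今天06:50', datetime(2021, 2, 25, 9, 56, 30))`, makes the relative and year-less formats deterministic, which is convenient when checking the function against saved pages.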
    

Article link: https://www.haomeiwen.com/subject/txzjfltx.html