Scraping Baidu Tieba comments, comment authors, and comment times, and saving them to a CSV file

Author: 蜗牛仔 | Source: published 2016-11-16 14:29 | Read 125 times

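    import csv
    import re

    # Read the saved Tieba thread page; source.html must be in the working directory.
    with open('source.html', 'r', encoding='UTF-8') as f:
        source = f.read()

    resultList = []
    # Each post sits between its container class string and the tail props block.
    all_data = re.findall(
        r'l_post l_post_bright j_l_post clearfix  "(.*?)p_props_tail props_appraise_wrap',
        source, re.S)
    for item in all_data:
        result = {}
        # [0] raises IndexError if a post lacks the field.
        result['name'] = re.findall(r'username="(.*?)" class', item, re.S)[0]
        # Only timestamps that start with 2016 are matched.
        result['time'] = re.findall(r'class="tail-info">(2016.*?)</span', item, re.S)[0]
        # Captures text up to the first tag inside the post body.
        result['content'] = re.findall(r'j_d_post_content ">(.*?)<', item, re.S)[0]
        resultList.append(result)

    # newline='' stops the csv module from emitting blank rows on Windows.
    with open('baidutieba.csv', 'w', encoding='UTF-8', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['name', 'content', 'time'])
        writer.writeheader()
        writer.writerows(resultList)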
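
The script above indexes every `re.findall` result with `[0]`, so a single post that lacks a username or timestamp stops the whole run with an `IndexError`. One possible workaround is sketched below. The helper names `first_match` and `parse_posts` and the `SAMPLE` markup are hypothetical, made up for illustration; only the three regular expressions come from the script. The sample is a heavily simplified stand-in for a real Tieba page, and the CSV is written to an in-memory buffer so the sketch runs without any files.

```python
import csv
import io
import re

# Hypothetical, simplified markup standing in for a saved Tieba page.
# The second post deliberately has no username and no timestamp.
SAMPLE = (
    '<div class="l_post l_post_bright j_l_post clearfix  " data-field="x">'
    '<a username="alice" class="p_author_name">alice</a>'
    '<div class="d_post_content j_d_post_content ">hello<br>world</div>'
    '<span class="tail-info">2016-11-16 14:29</span>'
    '<div class="p_props_tail props_appraise_wrap"></div></div>'
    '<div class="l_post l_post_bright j_l_post clearfix  " data-field="y">'
    '<div class="d_post_content j_d_post_content ">no author here</div>'
    '<div class="p_props_tail props_appraise_wrap"></div></div>'
)


def first_match(pattern, text):
    """Return the first captured group, or '' when the pattern is absent."""
    match = re.search(pattern, text, re.S)
    return match.group(1).strip() if match else ''


def parse_posts(source):
    """Split the page into post blocks and extract one dict per post."""
    blocks = re.findall(
        r'l_post l_post_bright j_l_post clearfix  "(.*?)p_props_tail props_appraise_wrap',
        source, re.S)
    return [
        {
            'name': first_match(r'username="(.*?)" class', block),
            'time': first_match(r'class="tail-info">(2016.*?)</span', block),
            'content': first_match(r'j_d_post_content ">(.*?)<', block),
        }
        for block in blocks
    ]


rows = parse_posts(SAMPLE)

# Write to an in-memory buffer instead of baidutieba.csv for the demo.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=['name', 'content', 'time'])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Posts with a missing field end up with an empty cell instead of aborting the export. The content pattern still stops at the first tag, so for the first sample post only `hello` is captured and the text after `<br>` is dropped, just as in the original script.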
    

Article link: https://www.haomeiwen.com/subject/qsrrpttx.html