2019-06-23: Scraping comment data for Cells at Work! (《工作细胞》) and printing it (print version)

Author: heiqimingren | Source: published 2019-06-23 10:46



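'''
1. It worked. The scraped comments and the other fields can all be printed with print(), which is very satisfying!
2. The next step is data analysis, or saving the data.
3. One full run took 418 seconds, which is quite fast!
'''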


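# Example of a follow-up page request; the cursor value is taken from the last comment of the previous page:
# https://bangumi.bilibili.com/review/web_api/short/list?media_id=102392&folded=0&page_size=20&sort=0&cursor=77584296013002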


import json
import pprint  # only used by the commented-out debug prints
import time

import requests


url2 = 'https://bangumi.bilibili.com/review/web_api/short/list?media_id=102392&folded=0&page_size=20&sort=0'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
start = time.time()
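# Send a GET request for the first page of short reviews
response_comment = requests.get(url2, headers=headers, timeout=10)
response_comment.raise_for_status()  # fail early on an HTTP error instead of on bad JSON
json_comment = response_comment.text  # the response body is a str at this point
json_comment = json.loads(json_comment)  # convert the JSON string into a dict

total = json_comment['result']
# pprint.pprint(total)
lists = total['list']  # a list of dicts, [{}, {}, {}], one per comment on this page
total2 = total['total']  # the total number of comments (19222 when this was written)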
# pprint.pprint(lists)

j = 0
while j < int(total2):
    if not lists:  # stop on an empty page instead of crashing on lists[-1] below
        break
    for item in lists:
        username = item['author']['uname']  # the commenter's username
        content = item['content']  # the comment text
        timeStamp = item['mtime']  # a Unix timestamp
        timeArray = time.localtime(timeStamp)
        otherStyleTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)  # human-readable time
        likes = item['likes']  # number of likes
        score = item['user_rating']['score']  # the user's rating
        print(j, username, content, otherStyleTime, likes, score)
        j = j + 1

    # The cursor of the last comment on this page is where the next page starts
    comment_api = url2 + '&cursor=' + str(lists[-1]['cursor'])
    response_comment = requests.get(comment_api, headers=headers, timeout=10)
    response_comment.raise_for_status()
    json_comment = json.loads(response_comment.text)  # parse the JSON string into a dict
    total = json_comment['result']
    lists = total['list']  # the next page: a list of comment dicts

end = time.time()
print("Elapsed time: %f s" % (end - start))
# pprint.pprint(comment_api)
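The loop above pages through the API by cursor: each request returns one page of comments, and the cursor field of that page's last comment is appended to url2 to request the next page. The sketch below packages that pattern as a reusable generator. It assumes the response shape this 2019 script relies on (result.list, result.total, and a cursor on each comment) and does not claim the endpoint still behaves that way; the names iter_comments, format_comment and fetch_page are hypothetical. Passing the fetch function in keeps the paging logic testable without network access, and the generator also stops on an empty page or after max_pages, so it cannot spin forever.

```python
import time
from typing import Callable, Iterator


def iter_comments(fetch_page: Callable[[str], dict], base_url: str,
                  max_pages: int = 10_000) -> Iterator[dict]:
    """Yield comment dicts page by page, following the cursor of each page's last item.

    Hypothetical helper: fetch_page(url) must return the decoded JSON dict,
    assumed to look like {'result': {'total': N, 'list': [ {...}, ... ]}}.
    """
    url = base_url
    seen = 0
    for _ in range(max_pages):  # hard cap so a misbehaving API cannot loop forever
        result = fetch_page(url)['result']
        items = result['list']
        if not items:  # an empty page means there is nothing left to read
            return
        for item in items:
            yield item
            seen += 1
            if seen >= int(result['total']):  # reached the advertised total
                return
        # The last comment's cursor tells the API where the next page starts
        url = base_url + '&cursor=' + str(items[-1]['cursor'])


def format_comment(item: dict) -> str:
    """Render one comment as 'username content YYYY-mm-dd HH:MM:SS' in local time."""
    when = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(item['mtime']))
    return f"{item['author']['uname']} {item['content']} {when}"
```

To run it against the real endpoint, fetch_page could be lambda u: requests.get(u, headers=headers, timeout=10).json(), with url2 as base_url, and each yielded item printed with format_comment.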

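The notes at the top of the script name saving the data as the next step. A minimal sketch of that step, assuming the comment fields the script's loop extracts (author.uname, content, mtime, likes, user_rating.score), writes them to a CSV file with Python's standard csv module. The names comment_to_row, save_comments, FIELDS and the default filename comments.csv are hypothetical, and treating a missing user_rating as an empty cell is an assumption, not something the original script handles.

```python
import csv
import time

# Column order for the output file; mirrors the fields the script extracts.
FIELDS = ['username', 'content', 'time', 'likes', 'score']


def comment_to_row(item: dict) -> dict:
    """Flatten one comment dict from the API into a CSV row."""
    return {
        'username': item['author']['uname'],
        'content': item['content'],
        'time': time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(item['mtime'])),
        'likes': item['likes'],
        # Assumption: a comment may lack a rating, so fall back to an empty cell
        'score': item.get('user_rating', {}).get('score', ''),
    }


def save_comments(items, path='comments.csv') -> int:
    """Write comment dicts to a UTF-8 CSV file and return the number of rows written."""
    count = 0
    # utf-8-sig adds a BOM so Excel on Windows displays the Chinese text correctly
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for item in items:
            writer.writerow(comment_to_row(item))
            count += 1
    return count
```

Collecting the comment dicts into a list inside the loop and passing that list to save_comments once at the end keeps file I/O out of the request loop.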