Python Crawler Examples (Part 3): Scraping Weibo Comments


Author: fly蜘蛛侠 | Published 2020-02-15 19:14

    1. Environment

    • PyCharm, Python 3.6 or later (the code uses f-strings), requests, BeautifulSoup4, Chrome

    2. Code

    import requests
    from bs4 import BeautifulSoup
    from urllib import parse
    
    RQS_ID = ''  # *** manually copy the id from the first comment request
    ROOT_COMMENT_MAX_ID = ''
    ROOT_COMMENT_MAX_ID_TYPE = ''
    
    
    def get_con_page(nbr):
        global RQS_ID, ROOT_COMMENT_MAX_ID, ROOT_COMMENT_MAX_ID_TYPE
        headers = {
            "Cookie": ""  # *** manually copy your cookie here
        }
        if nbr == 1:
            res = requests.get(
                f'https://weibo.com/aj/v6/comment/big?ajwvr=6&id={RQS_ID}&from=singleWeiBo',
                headers=headers
            )
        else:
            res = requests.get(
                f'https://weibo.com/aj/v6/comment/big'
                f'?ajwvr=6&id={RQS_ID}'
                f'&root_comment_max_id={ROOT_COMMENT_MAX_ID}'
                f'&root_comment_max_id_type={ROOT_COMMENT_MAX_ID_TYPE}'
                f'&root_comment_ext_param='
                f'&page={nbr}&filter=hot'
                f'&filter_tips_before=0&from=singleWeiBo',
                headers=headers
            )
    
        # The endpoint returns JSON; 'data.html' holds the rendered comment HTML
        html = res.json()['data']['html']
        soup = BeautifulSoup(html, 'html.parser')
        m_con_list = soup.find_all('div', attrs={'node-type': 'replywrap'})
        for m_con in m_con_list:
            con_text = m_con.find('div', class_='WB_text').text.strip()
            print(con_text)
    
        # The pagination state for the next request lives in an 'action-data'
        # attribute, either on the loading placeholder or on the "more" link
        action_data = soup.find('div', attrs={'node-type': 'comment_loading'})
        if action_data is not None:
            action_data = action_data['action-data']
        else:
            action_data = soup.find('a', attrs={'action-type': 'click_more_comment'})['action-data']
    
        print(action_data)
        # action-data is a query string; parse it and update the globals
        parse_qs = parse.parse_qs(action_data)
        RQS_ID = parse_qs['id'][0]
        ROOT_COMMENT_MAX_ID = parse_qs['root_comment_max_id'][0]
        ROOT_COMMENT_MAX_ID_TYPE = parse_qs['root_comment_max_id_type'][0]
    
    
    for _nbr in range(1, 150):
        print('Page %d' % _nbr)
        get_con_page(_nbr)
        print('-' * 120)
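The extraction step above selects each comment wrapper by its `node-type` attribute and then pulls the text node out of it. A minimal sketch against a hand-made HTML fragment shaped like Weibo's comment markup (the fragment and user names are invented, not real Weibo output):

```python
from bs4 import BeautifulSoup

# A hand-made fragment mimicking the structure the crawler parses
html = '''
<div node-type="replywrap">
  <div class="WB_text"> user A: first comment </div>
</div>
<div node-type="replywrap">
  <div class="WB_text"> user B: second comment </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
texts = [d.find('div', class_='WB_text').text.strip()
         for d in soup.find_all('div', attrs={'node-type': 'replywrap'})]
print(texts)  # ['user A: first comment', 'user B: second comment']
```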
    
    

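The pagination state the crawler carries between pages is just a URL query string stored in `action-data`; `urllib.parse.parse_qs` splits it into a dict mapping each key to a list of values. A small sketch with a made-up value (the ids below are invented, only the shape matches what the page returns):

```python
from urllib import parse

# A made-up action-data string in the same shape the page returns
action_data = 'id=4471979672553427&root_comment_max_id=987654321&root_comment_max_id_type=0'

fields = parse.parse_qs(action_data)
print(fields['id'][0])                        # '4471979672553427'
print(fields['root_comment_max_id'][0])       # '987654321'
print(fields['root_comment_max_id_type'][0])  # '0'
```

Note that `parse_qs` always returns lists, which is why the crawler indexes each field with `[0]`.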
    Output:


    Everyone is welcome to join the QQ group to discuss crawler techniques: Python crawler technology exchange group (494976303)



        Title: Python Crawler Examples (Part 3): Scraping Weibo Comments

        Link: https://www.haomeiwen.com/subject/mamyfhtx.html