Bilibili Danmaku Scraper

Author: 苍简 | Published 2019-01-27 17:09 | Read 6 times

    API endpoints:
    http://comment.bilibili.com/72036817.xml
    https://api.bilibili.com/x/v1/dm/list.so?oid=9931722
    The number in each URL is the video's cid (its danmaku pool ID), not the av number.

    Note that these endpoints do not return every danmaku for a video, only about a thousand.
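
Both endpoints differ only in where the numeric ID goes, so the URLs can be built from one value. This is a minimal sketch; `danmaku_urls` and its `danmaku_id` parameter are hypothetical names introduced here for illustration, and the URL patterns are simply the two listed above.

```python
def danmaku_urls(danmaku_id: int) -> dict[str, str]:
    """Build both endpoint URLs for one numeric ID (hypothetical helper)."""
    return {
        # Plain XML file endpoint.
        "xml": f"http://comment.bilibili.com/{danmaku_id}.xml",
        # API endpoint that takes the same kind of ID as the `oid` query parameter.
        "list": f"https://api.bilibili.com/x/v1/dm/list.so?oid={danmaku_id}",
    }


urls = danmaku_urls(72036817)
print(urls["xml"])
print(urls["list"])
```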

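    from bs4 import BeautifulSoup
    import pandas as pd
    import requests

    # The number in the URL selects which danmaku pool to download.
    url = 'http://comment.bilibili.com/72036817.xml'
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on an HTTP error

    # The endpoint returns UTF-8 XML; each danmaku is the text of a <d> element.
    xml_text = response.content.decode('utf-8')
    soup = BeautifulSoup(xml_text, 'lxml')  # requires the lxml package
    results = soup.find_all('d')

    comments = [comment.text for comment in results]

    # Write one danmaku per row, without the DataFrame index column.
    df = pd.DataFrame({'comments': comments})
    df.to_csv('bilibili.csv', index=False, encoding='utf-8')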
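
The parsing step can be tried offline, without requesting anything from the endpoint. The following is a minimal sketch, assuming the response contains one `<d>` element per danmaku, which is what the script above relies on. The sample XML and the `extract_danmaku` function are made up for illustration, and the sketch swaps BeautifulSoup for the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape the script expects: one <d> element per danmaku.
SAMPLE_XML = """<i>
  <d>first comment</d>
  <d>second comment</d>
</i>"""


def extract_danmaku(xml_text: str) -> list[str]:
    """Return the text of every <d> element in the document, in order."""
    root = ET.fromstring(xml_text)
    # An empty <d/> has text None, so fall back to an empty string.
    return [d.text or "" for d in root.iter("d")]


for line in extract_danmaku(SAMPLE_XML):
    print(line)
```

Keeping the parsing in its own function makes it possible to check the extraction logic against a saved response before running it on live data.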
    
