Bilibili Danmaku Crawler

Author: 苍简 | Published: 2019-01-27 17:09

API endpoints:
http://comment.bilibili.com/72036817.xml
https://api.bilibili.com/x/v1/dm/list.so?oid=9931722
The number in each URL is the video's cid (the ID of its danmaku pool), not the av number.
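
As a small sketch of how the two endpoints above relate, the helper below builds both URLs from one numeric ID. The function name `danmaku_urls` and the parameter name `numeric_id` are illustrative and not part of any Bilibili library; the URL patterns are copied from the two examples above.

```python
def danmaku_urls(numeric_id):
    """Build both danmaku endpoint URLs for one numeric ID (hypothetical helper)."""
    numeric_id = int(numeric_id)  # reject anything that is not a plain number
    return {
        # XML endpoint, same pattern as the first URL above
        'xml': 'http://comment.bilibili.com/{}.xml'.format(numeric_id),
        # list.so endpoint, same pattern as the second URL above
        'list': 'https://api.bilibili.com/x/v1/dm/list.so?oid={}'.format(numeric_id),
    }


urls = danmaku_urls(72036817)
print(urls['xml'])
print(urls['list'])
```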

However, these endpoints do not return every danmaku for the video, only about a thousand entries.

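import pandas as pd
import requests
from bs4 import BeautifulSoup

# Download the danmaku XML for one video.
url = 'http://comment.bilibili.com/72036817.xml'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
html_data = response.content.decode('utf-8')

# Each danmaku is stored in a <d> element; its text is the comment itself.
soup = BeautifulSoup(html_data, 'lxml')
results = soup.find_all('d')

comments = [comment.text for comment in results]
comments_dict = {'comments': comments}

# Save the comments as a single-column CSV file.
df = pd.DataFrame(comments_dict)
df.to_csv('bilibili.csv', encoding='utf-8')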
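
The script above keeps only the comment text. Each `<d>` element also carries a `p` attribute, and the sketch below shows one way to decode it. This rests on an assumption that is not stated in the post: that `p` is a comma-separated list whose first four fields are the playback time in seconds, the display mode, the font size, and the color as a decimal RGB integer. `SAMPLE_XML` is made-up data in that assumed shape, and `parse_danmaku` is a hypothetical helper name. The standard-library parser is used here instead of BeautifulSoup so the example runs without a network connection or extra packages.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape the XML endpoint is assumed to return.
SAMPLE_XML = """<i>
  <chatid>72036817</chatid>
  <d p="12.5,1,25,16777215,1548580000,0,abcd1234,1000001">first comment</d>
  <d p="98.04,5,25,16711680,1548581111,0,ef567890,1000002">second comment</d>
</i>
"""


def parse_danmaku(xml_text):
    """Return one dict per <d> element with the assumed `p` fields decoded."""
    root = ET.fromstring(xml_text)
    rows = []
    for d in root.iter('d'):
        fields = d.get('p', '').split(',')
        if len(fields) < 4:
            continue  # skip entries that do not match the assumed layout
        rows.append({
            'time_s': float(fields[0]),                  # position in the video
            'mode': int(fields[1]),                      # display mode code
            'font_size': int(fields[2]),
            'color': '#{:06x}'.format(int(fields[3])),   # decimal RGB to hex
            'text': d.text or '',
        })
    return rows


for row in parse_danmaku(SAMPLE_XML):
    print(row)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(...)` to get one column per field instead of a single `comments` column.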
