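Target site: https://www.acfun.cn/
The goal is to crawl every user. Since the volume is large, I use the Scrapy framework.
The API is simple: every user has a numeric UID, so looping over the IDs is enough.
The profile endpoint was found by capturing the traffic of the mobile app.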
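import random
from scrapy import Request

# user_agent_list is assumed to be a list of User-Agent strings defined elsewhere in the project
def start_requests(self):
    # User IDs are sequential, so request the profile of each ID in turn
    for x in range(1, 100000):
        # Pick a random User-Agent for every request
        ua = random.choice(user_agent_list)
        self.headers = {
            'User-Agent': ua,
            'deviceType': 0
        }
        url = 'https://apipc.app.acfun.cn/v2/user/content/profile?app_version=5.10.2&market=appstore&origin=ios&resolution=750x1334&sys_name=ios&sys_version=12.0&userId=%s' % x
        yield Request(url, headers=self.headers, callback=self.parse)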
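The long URL above is easier to read and change if the query string is built from a dict. Here is a minimal sketch of that idea. The helper names build_profile_url and profile_urls are my own and not part of the original spider; the endpoint and parameters are copied from the URL above.

```python
from urllib.parse import urlencode

# Endpoint and query parameters copied from the URL used in start_requests.
PROFILE_ENDPOINT = 'https://apipc.app.acfun.cn/v2/user/content/profile'
BASE_PARAMS = {
    'app_version': '5.10.2',
    'market': 'appstore',
    'origin': 'ios',
    'resolution': '750x1334',
    'sys_name': 'ios',
    'sys_version': '12.0',
}


def build_profile_url(user_id):
    """Return the profile URL for one user ID (hypothetical helper)."""
    # userId is appended last, so the parameter order matches the original URL.
    params = dict(BASE_PARAMS, userId=user_id)
    return PROFILE_ENDPOINT + '?' + urlencode(params)


def profile_urls(start, stop):
    """Lazily yield profile URLs for user IDs in [start, stop)."""
    for user_id in range(start, stop):
        yield build_profile_url(user_id)
```

Inside start_requests, the loop could then iterate over profile_urls(1, 100000) and yield one Request per URL.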
Extract the fields to save:
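import json

def parse(self, response):
    try:
        # The response body is JSON; the profile fields live under 'vdata'
        result = json.loads(response.text)
        userid = result['vdata']['userId']
        userName = result['vdata']['username']
        fenceNum = result['vdata']['followed']
        bananaGold = result['vdata']['bananaGold']
        userImg = result['vdata']['userImg']
        print('Scraping record %s: %s,%s,%s,%s' % (userid, userid, userName, fenceNum, bananaGold))
    except (ValueError, KeyError, TypeError):
        # Skip responses that are not valid JSON or lack the expected fields
        pass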
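Because the IDs are enumerated blindly, some responses will presumably lack the 'vdata' block or not be valid JSON at all. Below is a small sketch of a standalone extraction function that tolerates both cases. The name extract_profile and the FIELDS tuple are my own; the field names come from the parse method above, and the exact shape of an error response is an assumption.

```python
import json

# Field names taken from the parse method above.
FIELDS = ('userId', 'username', 'followed', 'bananaGold', 'userImg')


def extract_profile(text):
    """Return a dict of the wanted fields, or None if the payload is unusable."""
    try:
        payload = json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    # Assumption: a usable response is a JSON object with a 'vdata' object.
    vdata = payload.get('vdata') if isinstance(payload, dict) else None
    if not isinstance(vdata, dict):
        return None
    # Missing individual fields come back as None instead of raising KeyError.
    return {name: vdata.get(name) for name in FIELDS}
```

Keeping the extraction in a plain function like this makes it easy to check against a saved response without running the whole spider.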
Set the log level in the settings file (settings.py):
LOG_LEVEL = 'ERROR'
A sample run is shown in the screenshot (image.png).
Full code: https://github.com/Liangjianghao/everyDay_spider.git (ac_up)