Reposted from: https://zhuanlan.zhihu.com/p/354032061
If this infringes any rights, please contact me and I will remove it immediately.
Challenge URL:
http://match.yuanrenxue.com/match/12
Contents:
1. Environment
2. Analyzing the request
3. Implementing the scraper
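1. Environment
Python 3.7, pyexecjs, requests (the script in section 3 only ends up importing base64 and requests)
2. Analyzing the request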
![](https://img.haomeiwen.com/i6591571/c5aa630b44f3901b.png)
![](https://img.haomeiwen.com/i6591571/c5ed83dcd4a17bd9.png)
Follow the request's call stack straight into the JS, as shown in Figure 2-3:
![](https://img.haomeiwen.com/i6591571/6c745dc3992dfd97.png)
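The assignment of m is visible as soon as you land in the code:

```javascript
var list = {
    "page": window.page,
    "m": btoa('yuanrenxue' + window.page)
};
```

btoa is JavaScript's built-in Base64 encoding function.
That is the whole analysis: m is just the Base64 encoding of the string 'yuanrenxue' followed by the page number.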
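To sanity-check that reading before writing the scraper, here is a minimal Python sketch of the same computation. The helper name `make_m` is hypothetical (it does not appear in the site's JS), and the sketch assumes the page number is an integer that gets stringified the same way the JS concatenation does.

```python
import base64


def make_m(page):
    """Mimic btoa('yuanrenxue' + window.page) for an integer page number.

    make_m is an illustrative name, not something defined by the site.
    """
    raw = 'yuanrenxue' + str(page)
    # btoa operates on Latin-1 characters; the input here is plain ASCII,
    # so encoding as ASCII yields the same bytes.
    return base64.b64encode(raw.encode('ascii')).decode('ascii')


if __name__ == '__main__':
    for page in range(1, 4):
        print(page, make_m(page))
```

Comparing these values with the m parameter the browser sends in the network panel is a quick way to confirm that nothing else is mixed into the string.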
3. Implementing the scraper
```python
import base64

import requests

headers = {
    'Host': 'match.yuanrenxue.com',
    # Referer set to this challenge's page (the original snippet had /match/6)
    'Referer': 'http://match.yuanrenxue.com/match/12',
    'User-Agent': 'yuanrenxue.project',
}


def encode_base64(content):
    """Python equivalent of JavaScript's btoa() for ASCII input."""
    return base64.b64encode(content.encode('utf-8')).decode('utf-8')


def main():
    sum_list = []
    url = 'http://match.yuanrenxue.com/api/match/12'
    for page in range(1, 6):  # pages 1 through 5
        data = {
            'page': page,
            'm': encode_base64('yuanrenxue' + str(page)),
        }
        value = requests.get(url=url, headers=headers, params=data).json()
        print(value['data'])
        for i in value['data']:
            sum_list.append(i['value'])
    # Total of every value collected across the five pages
    print(sum(sum_list))


if __name__ == '__main__':
    main()
```
Output:
![](https://img.haomeiwen.com/i6591571/356dcb8dad65a772.png)
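If you want to check the request-building and summing logic without hitting the site, the two steps can be split into pure functions. The sketch below is only an illustration: `build_url` and `total_values` are hypothetical names, the sample payloads are made-up numbers that merely follow the `{'data': [{'value': ...}]}` shape used above, and the query string is what the standard library's `urlencode` produces, which should be close to what requests generates from `params`.

```python
import base64
from urllib.parse import urlencode

API_URL = 'http://match.yuanrenxue.com/api/match/12'


def build_url(page):
    """Return the full GET URL for one page, including the m parameter."""
    m = base64.b64encode(('yuanrenxue' + str(page)).encode('ascii')).decode('ascii')
    # urlencode percent-encodes the trailing '=' padding as %3D
    return API_URL + '?' + urlencode({'page': page, 'm': m})


def total_values(payloads):
    """Sum every 'value' field across a list of decoded JSON responses."""
    return sum(item['value'] for payload in payloads for item in payload['data'])


if __name__ == '__main__':
    print(build_url(1))
    # Made-up sample data, not real responses from the challenge
    sample = [{'data': [{'value': 1}, {'value': 2}]}, {'data': [{'value': 3}]}]
    print(total_values(sample))
```

Keeping the network call separate from these helpers makes it easier to see whether a wrong total comes from the m parameter or from the aggregation.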