I. Preparation before scraping
1. Install the required packages with pip
pip install requests
pip install BeautifulSoup4
2. Chrome users: open the built-in developer tools
Inspect -> Network -> Doc
3. Editor
pip install jupyter
# To start editing, run: jupyter notebook
II. Fetch the page content with requests.get:
import requests
res = requests.get("http://finance.ifeng.com/")  # request the page URL
res.encoding = "utf-8"  # decode as UTF-8 so Chinese text displays correctly
print(res.text)
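A slightly more defensive version of the request above can be sketched as follows. The helper name `fetch_html` and the 10-second timeout are illustrative assumptions, not part of the original steps. `raise_for_status()` makes an HTTP error surface as an exception instead of printing an error page.

```python
import requests


def fetch_html(url, encoding="utf-8", timeout=10):
    """Download a page and return its HTML as text.

    fetch_html is a hypothetical helper; the timeout value is an assumption.
    """
    res = requests.get(url, timeout=timeout)  # give up instead of waiting forever
    res.raise_for_status()                    # raise on 4xx/5xx responses
    res.encoding = encoding                   # decode the body as UTF-8 by default
    return res.text
```

Calling `fetch_html("http://finance.ifeng.com/")` should return the same text that `res.text` holds above, provided the site is reachable.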
III. Parse the page content with BeautifulSoup ('html.parser' is the parser)
1. Example: extracting content from a page:
![](https://img.haomeiwen.com/i6201367/adf3c0f0c04e9f00.png)
![](https://img.haomeiwen.com/i6201367/b7777f01cb5b6712.png)
![](https://img.haomeiwen.com/i6201367/84e663c02be69292.png)
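Since the code above exists only as screenshots, here is a minimal text sketch of the same idea. The `html_sample` markup is made up for illustration and stands in for a downloaded page such as `res.text`; it is not necessarily what the screenshots show.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for a downloaded page (res.text).
html_sample = """
<html>
  <body>
    <h1 id="title">Hello World</h1>
    <a href="#" class="link">This is link1</a>
    <a href="#link2" class="link">This is link2</a>
  </body>
</html>
"""

# 'html.parser' is the parser bundled with Python's standard library.
soup = BeautifulSoup(html_sample, "html.parser")

# soup.text drops the tags and keeps only the text nodes.
print(soup.text)
```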
2. Get elements by a specific CSS attribute:
![](https://img.haomeiwen.com/i6201367/656bdf40fdf65e64.png)
![](https://img.haomeiwen.com/i6201367/9ab8108b26a76cad.png)
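Selecting by CSS attribute can be sketched with `select()`, which accepts CSS selectors: `#` matches an id and `.` matches a class. The sample markup and the names `title` and `link` are assumptions made for this example.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration.
html_sample = """
<h1 id="title">Hello World</h1>
<a href="#" class="link">This is link1</a>
<a href="#link2" class="link">This is link2</a>
"""
soup = BeautifulSoup(html_sample, "html.parser")

# '#' selects by id; select() returns a list-like result of matching tags.
title = soup.select("#title")
print(title[0].text)

# '.' selects by class.
links = soup.select(".link")
for link in links:
    print(link.text)
```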
3. Get the links from all a tags on the page
![](https://img.haomeiwen.com/i6201367/e9b255d640493941.png)
Example:
![](https://img.haomeiwen.com/i6201367/102f1245512453df.png)
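As a text sketch of collecting links: each tag behaves like a dictionary of its attributes, so `a["href"]` reads the link target. The sample markup is made up; passing `href=True` to `find_all` is one way to skip a tags that carry no href, which would otherwise raise a `KeyError`.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the third <a> deliberately has no href.
html_sample = """
<a href="http://finance.ifeng.com/" class="link">Finance</a>
<a href="#link2" class="link">This is link2</a>
<a name="anchor-without-href">No link here</a>
"""
soup = BeautifulSoup(html_sample, "html.parser")

# href=True keeps only the <a> tags that have an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]
print(hrefs)
```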
4. Locate the element to scrape:
![](https://img.haomeiwen.com/i6201367/b12d09883a43cd67.png)
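Once the developer tools show where an element sits in the page, that position can be written as a CSS selector. The sketch below uses hypothetical markup and class names (`nav`, `news-list`); on a real page, read them from the Elements panel.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; real tag and class names come from the developer tools.
html_sample = """
<div class="nav"><a href="/home">Home</a></div>
<div class="news-list">
  <h2><a href="/a/1">First headline</a></h2>
  <h2><a href="/a/2">Second headline</a></h2>
</div>
"""
soup = BeautifulSoup(html_sample, "html.parser")

# A descendant selector narrows the match to links inside the news list,
# leaving out the navigation link.
headlines = soup.select("div.news-list h2 a")
for a in headlines:
    print(a.text, a["href"])
```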
IV. Get content by HTML tag
![](https://img.haomeiwen.com/i6201367/e875baf9b7e7cffb.png)
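Selecting by tag name can be sketched the same way: passing a bare tag name to `select()` returns every tag of that type. The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration.
html_sample = """
<h1 id="title">Hello World</h1>
<a href="#" class="link">This is link1</a>
<a href="#link2" class="link">This is link2</a>
"""
soup = BeautifulSoup(html_sample, "html.parser")

header = soup.select("h1")  # every <h1> tag
print(header[0].text)

alinks = soup.select("a")   # every <a> tag
for alink in alinks:
    print(alink.text)
```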