I. Preparation before crawling
1. Install the required packages with pip
pip install requests
pip install BeautifulSoup4
2. Chrome users: open the built-in Developer Tools
Inspect -> Network -> Doc
3. Editor
pip install jupyter
# To start editing, run: jupyter notebook
II. Fetch the page content with requests.get:
import requests
res = requests.get("http://finance.ifeng.com/")  # request the page at this URL
res.encoding = "utf-8"  # decode the response as UTF-8 so Chinese text displays correctly
print(res.text)
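The call above works for a quick test, but it waits indefinitely on a slow server and prints whatever comes back, even an error page. A minimal sketch of a slightly more defensive version follows; the function name `fetch_html` and the default `timeout` and `encoding` values are assumptions made for this illustration, not part of the original walkthrough.

```python
import requests


def fetch_html(url, timeout=10, encoding="utf-8"):
    """Download a page and return its HTML as text.

    `fetch_html`, `timeout`, and `encoding` are illustrative names and
    defaults chosen for this sketch.
    """
    res = requests.get(url, timeout=timeout)  # give up instead of hanging forever
    res.raise_for_status()                    # raise requests.HTTPError on 4xx/5xx
    res.encoding = encoding                   # decode the body with this charset
    return res.text
```

It could then be called as `html = fetch_html("http://finance.ifeng.com/")`, with the result passed on to the parsing steps.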
III. Parse the page content with BeautifulSoup ('html.parser' is the parser)
1. Example: extracting content from the page:
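As a rough illustration of this step, the sketch below parses a small HTML string and prints only its text. The `html_sample` markup is a made-up stand-in for a downloaded page, so that the example runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, made up for this illustration.
html_sample = """
<html>
  <body>
    <h1 id="title">Hello World</h1>
    <a href="#" class="link">This is link1</a>
    <a href="#link2" class="link">This is link2</a>
  </body>
</html>
"""

# 'html.parser' is the parser that ships with Python's standard library.
soup = BeautifulSoup(html_sample, "html.parser")

# get_text() drops the tags and keeps the text nodes; strip=True trims
# whitespace around each piece and the separator joins the pieces.
page_text = soup.get_text(separator=" ", strip=True)
print(page_text)
```

For a real page, `res.text` from the requests step would take the place of `html_sample`.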
2. Select elements with a specific CSS attribute:
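One way to do this, sketched here on the assumption that the attributes of interest are `id` and `class`, is BeautifulSoup's `select` method, which accepts CSS selectors. The `html_sample` string is hypothetical markup used only for the demonstration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, made up for this illustration.
html_sample = """
<html>
  <body>
    <h1 id="title">Hello World</h1>
    <a href="#" class="link">This is link1</a>
    <a href="#link2" class="link">This is link2</a>
  </body>
</html>
"""

soup = BeautifulSoup(html_sample, "html.parser")

# "#title" matches elements whose id is "title".
title_elems = soup.select("#title")
print(title_elems[0].text)

# ".link" matches elements whose class list contains "link".
link_texts = [a.text for a in soup.select(".link")]
print(link_texts)
```

`select` always returns a list, even for a single match, which is why the first result is indexed with `[0]`.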
3. Get the links inside all a tags on the page
Example:
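A minimal sketch of collecting every link target, using made-up markup in `html_sample`: each tag returned by `select` exposes its attributes like a dictionary, so the `href` value can be read by key.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, made up for this illustration.
html_sample = """
<html>
  <body>
    <h1 id="title">Hello World</h1>
    <a href="#" class="link">This is link1</a>
    <a href="#link2" class="link">This is link2</a>
  </body>
</html>
"""

soup = BeautifulSoup(html_sample, "html.parser")

# "a[href]" matches only <a> tags that actually carry an href attribute,
# so the dictionary-style lookup below cannot raise KeyError.
hrefs = [a["href"] for a in soup.select("a[href]")]
for href in hrefs:
    print(href)
```

On real pages some a tags have no `href`, which is why this sketch uses the `a[href]` selector; `a.get("href")` would be an alternative that returns `None` for those tags.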
4. Determine where the target element is located:
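In practice this usually means inspecting the element in Chrome's Developer Tools, noting which tags and classes enclose it, and turning that path into a selector. The sketch below assumes a hypothetical page layout (the `news-list` class and the URLs are invented for illustration and are not taken from the real site).

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for what Developer Tools might show.
html_sample = """
<div id="nav"><a href="/home">Home</a></div>
<div class="news-list">
  <h2><a href="/a/1.shtml">Headline one</a></h2>
  <h2><a href="/a/2.shtml">Headline two</a></h2>
</div>
"""

soup = BeautifulSoup(html_sample, "html.parser")

# Descendant selector: only <a> tags inside an <h2> inside .news-list,
# so the navigation link at the top of the page is skipped.
headlines = [(a.text, a["href"]) for a in soup.select(".news-list h2 a")]
for text, href in headlines:
    print(text, href)
```

Narrowing the selector to the enclosing container is what keeps unrelated links, such as navigation menus, out of the results.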
IV. Get content by HTML tag
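As a sketch of looking elements up by tag name, the example below uses `find` for the first match and `find_all` for every match. The `html_sample` string is again hypothetical markup made up for the demonstration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, made up for this illustration.
html_sample = """
<html>
  <body>
    <h1 id="title">Hello World</h1>
    <a href="#" class="link">This is link1</a>
    <a href="#link2" class="link">This is link2</a>
  </body>
</html>
"""

soup = BeautifulSoup(html_sample, "html.parser")

# find() returns the first matching tag, or None when nothing matches.
header = soup.find("h1")
print(header.text)

# find_all() returns a list of every matching tag.
links = soup.find_all("a")
for link in links:
    print(link.text)
```

`soup.select("h1")` and `soup.select("a")` would give the same elements here; which method to use is largely a matter of preference.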