Crawler 2: BeautifulSoup

Author: 若晴y | Published: 2021-04-26 18:18


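In Level 0, the requests library took care of step 0 of crawling: fetching the data. The HTML knowledge from Level 1 is essential background for any crawler, because it helps us parse and extract data.
From here on, parsing and extraction are handled by BeautifulSoup, a flexible and convenient web page parsing library.
The goal of this level is to learn how to use BeautifulSoup to parse a web page and extract data from it.
What does "parsing data" mean?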


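When we browse the web, the browser translates the HTML source returned by the server into a form we can read, and only then can we interact with the page.
A crawler likewise needs a tool that understands HTML before it can extract the data we want.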


That is what parsing data means.
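To make the idea concrete, here is a minimal sketch of the parsing step on its own. The HTML string is a made-up stand-in for a server response (it is not taken from the course page), so no network access is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a server response (an assumption, not from the article).
html = "<html><body><h1>Book list</h1><p class='info'>Three books</p></body></html>"

# Parse the raw string into a tree of tag objects using Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

print(type(html).__name__)   # the raw response is just a str
print(type(soup).__name__)   # after parsing it is a BeautifulSoup object
print(soup.h1.text)          # tags can now be reached by name
```

Before parsing, the page is only a long string. After parsing, it is a structure whose tags and attributes can be looked up directly.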


"Extracting data" means picking out the data we need from everything else on the page.
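As a rough sketch of extraction, the example below uses BeautifulSoup's find() and find_all() on a small invented snippet. The markup, class names, and links are assumptions chosen for illustration and only loosely resemble a page of book entries.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption for illustration only).
html = """
<div class="books"><h2>Science fiction</h2><a class="title" href="/book/1">Book A</a></div>
<div class="books"><h2>History</h2><a class="title" href="/book/2">Book B</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find(class_="books")           # only the first matching tag
all_items = soup.find_all(class_="books")   # every matching tag, as a list-like ResultSet

# Pull the text and the href attribute out of each entry.
titles = [item.find(class_="title").text for item in all_items]
links = [item.find(class_="title")["href"] for item in all_items]

print(first.h2.text)
print(titles)
print(links)
```

find() suits cases where one element is expected, while find_all() returns every match so we can loop over them.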
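One more reminder from your teacher: parsing and extracting data is both a key point and a difficult point in crawling. This level covers two steps, so it carries more information than the previous two levels. Be prepared for that and expect to put in more effort.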
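import requests
from bs4 import BeautifulSoup

# Fetch the practice page and stop early on an HTTP error.
res = requests.get('https://localprod.pandateacher.com/python-manuscript/crawler-html/spider-men5.0.html')
res.raise_for_status()
html = res.text

# Parse the HTML string with Python's built-in parser.
soup = BeautifulSoup(html, 'html.parser')

# Each book entry sits in an element with class "books".
items = soup.find_all(class_='books')
for item in items:
    kind = item.find('h2')              # category heading
    title = item.find(class_='title')   # title element; its href holds the link
    brief = item.find(class_='info')    # short description
    print(kind.text, '\n', title.text, '\n', title['href'], '\n', brief.text)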
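One thing to watch for in a loop like the one above: find() returns None when nothing matches, so reading .text on the result would raise an AttributeError. The sketch below shows one possible defensive variant. The markup and the helper text_or_default are hypothetical and are not part of the original lesson.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second entry has no element with class "info".
html = """
<div class="books"><h2>Type A</h2><a class="title" href="/a">Book A</a><p class="info">About A</p></div>
<div class="books"><h2>Type B</h2><a class="title" href="/b">Book B</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

def text_or_default(tag, default=""):
    """Return the tag's stripped text, or a default when find() returned None."""
    return tag.get_text(strip=True) if tag is not None else default

rows = []
for item in soup.find_all(class_="books"):
    title = item.find(class_="title")
    rows.append({
        "kind": text_or_default(item.find("h2")),
        "title": text_or_default(title),
        # Tag.get() returns None instead of raising when the attribute is absent.
        "href": title.get("href") if title is not None else None,
        "brief": text_or_default(item.find(class_="info")),
    })

print(rows)
```

Collecting the results into dictionaries instead of printing them right away also makes it easier to save or filter them later.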


Article link: https://www.haomeiwen.com/subject/aehlrltx.html
