BeautifulSoup

Author: 小T数据站 | Published 2019-01-10 10:58

Tag selectors / node selectors

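from bs4 import BeautifulSoup
# html holds the page source as a string
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())  # pretty-print the tree; the parser already completed any unclosed tags while parsing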

# Get a tag and its content (returns the first matching tag)
print(soup.title)
print(soup.head)
print(soup.p)

# Get attributes
print(soup.p.attrs['name'])
print(soup.p['name'])

# Get the text content
print(soup.title.string)

# Nested selection
print(soup.head.title.string)  # text of the <title> tag inside <head>

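# Get child nodes (direct children only)
# Method 1: .contents returns a list
print(soup.p.contents)
# Method 2: .children returns an iterator over the same children
print(soup.p.children)
for i, child in enumerate(soup.p.children):
    print(i, child)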

# Get descendant nodes (children, grandchildren, and so on; returns a generator)
print(soup.p.descendants)
for i,child in enumerate(soup.p.descendants):
    print(i,child)

# Get the parent node
print(soup.p.parent)

# Get ancestor nodes
print(list(enumerate(soup.a.parents)))

# Get sibling nodes
print(list(enumerate(soup.a.next_siblings)))
print(list(enumerate(soup.a.previous_siblings)))
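Here is a minimal, self-contained sketch that shows what the node selectors above return. The HTML snippet and its attribute values are invented for illustration. It uses the built-in `html.parser` instead of `lxml` only so that it runs without installing an extra parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, invented for this illustration
html = """
<html><head><title>Demo page</title></head>
<body>
<p class="intro" name="first">Links:
<a href="https://example.com/a" id="link1">A</a> and
<a href="https://example.com/b" id="link2">B</a>.</p>
</body></html>
"""

# html.parser ships with Python; 'lxml' is used the same way if installed
soup = BeautifulSoup(html, "html.parser")

# Tag access returns the FIRST matching tag
title_text = soup.title.string           # 'Demo page'
nested_text = soup.head.title.string     # same tag, reached through <head>

# Attributes: tag['name'] is shorthand for tag.attrs['name']
p_name = soup.p["name"]                  # 'first'
p_class = soup.p["class"]                # ['intro']: class is multi-valued, so it is a list

# .contents is a list; .children iterates over the same direct children
children = list(soup.p.children)
same_as_contents = children == soup.p.contents

# .descendants also walks into nested tags (the text inside each <a>)
descendants = list(soup.p.descendants)

# Upward and sideways navigation from the first <a>
first_a = soup.a
parent_name = first_a.parent.name
ancestor_names = [tag.name for tag in first_a.parents]
# Text siblings (NavigableString) have name None, so keep only tag siblings
next_tag_names = [s.name for s in first_a.next_siblings if s.name]

print(len(children), len(descendants))   # 5 7
print(ancestor_names)                    # ['p', 'body', 'html', '[document]']
```

The `<p>` tag has five direct children: three text nodes and two `<a>` tags. `.descendants` adds the two strings nested inside the links, which gives seven nodes.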

Standard selectors / method selectors
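find_all(name, attrs, recursive, text, **kwargs): returns a list of all matching tags
find(name, attrs, recursive, text, **kwargs): returns the first matching tag, or None if nothing matches
(Since Beautiful Soup 4.4.0 the text argument is named string; text is still accepted as the older name.)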
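The sketch below shows each find_all()/find() parameter in use on invented sample markup. It passes `string=` (the newer name for `text`) and uses `class_=` for the keyword-argument form, because `class` is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration
html = """
<div class="panel">
  <ul class="list" id="list-1">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
  </ul>
  <ul class="list list-small" id="list-2">
    <li class="element">Jay</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# name: match by tag name, returns every <ul>
all_ul = soup.find_all("ul")

# attrs: match with an attribute dictionary
by_id = soup.find_all(attrs={"id": "list-1"})

# **kwargs: attribute filters as keyword arguments (class_ because class is reserved)
small_lists = soup.find_all("ul", class_="list-small")

# recursive=False: only search direct children of <div>, where there is no <li>
direct_li = soup.div.find_all("li", recursive=False)

# string (formerly text): match on text content, returns the matching strings
jay = soup.find_all(string="Jay")

# find(): first match only, or None when nothing matches
first_li = soup.find("li")
missing = soup.find("table")
```

Note that `class_="list-small"` matches the second `<ul>` even though its class attribute holds two values. Beautiful Soup checks each class in the list separately.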
CSS selectors

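To use CSS selectors, call the select() method and pass it a CSS selector string. It returns a list of all matching tags:
print(soup.select('#id_value .class_value tag_name'))  # <tag_name> elements inside .class_value inside #id_value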
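Here is a concrete version of the descendant selector above. The `panel`, `list`, and `element` names are placeholders chosen for this sketch. It also uses `select_one()`, which returns only the first match instead of a list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one list inside #panel, one outside it
html = """
<div id="panel">
  <ul class="list">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
  </ul>
</div>
<ul class="list"><li class="element">Outside</li></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: <li> inside .list inside #panel
items = soup.select("#panel .list li")
texts = [li.get_text() for li in items]     # ['Foo', 'Bar']

# Without the #panel part, the <li> outside the div matches too
all_items = soup.select(".list li")

# select_one() returns the first match or None
first = soup.select_one("#panel li")
```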

# Nested selection: call select() again on each result
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for ul in soup.select('ul'):
    print(ul.select('li'))

# Get attributes
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for ul in soup.select('ul'):
    print(ul['id'])
    print(ul.attrs['id'])
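The loop above assumes every `<ul>` has an `id`. Both `ul['id']` and `ul.attrs['id']` raise `KeyError` when the attribute is missing. A small sketch with invented markup shows the tolerant alternative, `Tag.get()`, which returns `None` (or a default you pass) instead.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <ul> has no id attribute
html = '<ul id="list-1"><li>a</li></ul><ul><li>b</li></ul>'
soup = BeautifulSoup(html, "html.parser")

ids = []
for ul in soup.select("ul"):
    # ul["id"] would raise KeyError on the second <ul>
    ids.append(ul.get("id"))             # None when the attribute is missing

labels = [ul.get("id", "no-id") for ul in soup.select("ul")]
```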

# Get text
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for li in soup.select('li'):
    print('Get Text:', li.get_text())
    print('String:', li.string)
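The two lines above print the same thing only when the tag holds a single piece of text. Here is a short sketch with invented markup that shows where they differ. `.string` returns `None` once a tag has more than one child, while `get_text()` joins all the nested text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <li> mixes text and a nested tag
html = "<ul><li>Plain</li><li>Mixed <b>bold</b> text</li></ul>"
soup = BeautifulSoup(html, "html.parser")
plain, mixed = soup.select("li")

plain_string = plain.string        # 'Plain'
plain_text = plain.get_text()      # 'Plain'
mixed_string = mixed.string        # None: three children, so .string is ambiguous
mixed_text = mixed.get_text()      # 'Mixed bold text'
```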

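Summary
- Prefer the lxml parser; fall back to html.parser when necessary.
- Node selection (soup.title, soup.p, ...) is fast, but it only returns the first matching tag and has weak filtering.
- Use find() to get a single match and find_all() to get all matches.
- If you are familiar with CSS selectors, you can use select() instead.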
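One way to follow the first point is to try `lxml` and fall back to `html.parser` when lxml is not installed. This is a minimal sketch that reads "when necessary" as "when lxml is unavailable". `make_soup` is a hypothetical helper name. `FeatureNotFound` is the exception Beautiful Soup raises when the requested parser cannot be found.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup: str) -> BeautifulSoup:
    """Parse with lxml if it is installed, otherwise with html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # html.parser ships with Python, so it is always available
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p>hi</p>")
```

The two parsers can repair badly broken HTML in different ways, so results on malformed pages may differ depending on which branch runs.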

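These are personal notes based on Cui Qingcai's (崔庆才) web-crawling video course. For a detailed treatment of BeautifulSoup, see his personal blog: [Python3网络爬虫开发实战] 4.2-使用Beautiful Soup (Python 3 Web Crawler Development in Practice, 4.2 Using Beautiful Soup).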

Article link: https://www.haomeiwen.com/subject/zseorqtx.html