The BeautifulSoup Library: Search Methods

Author: 小橙子_43db | Published: 2019-12-24 20:28


Main content: the commonly used BeautifulSoup search methods and their parameters.

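<>.find_all(name, attrs, recursive, string, **kwargs): returns a list holding the search results. It can be called on a BeautifulSoup object or on a tag.

name: a filter on the tag name.

attrs: a filter on tag attributes; a specific attribute can also be matched by keyword, for example id='link1'.

recursive: whether to search all descendants; defaults to True. With False, only direct children are searched.

string: a filter on the string content of tags.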

Test string:

# Search methods
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Search by tag name (the name parameter)
soup.find_all('a')

Output, a list of all <a> tags:

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

# Get the hyperlink of every <a> tag
for i in soup.find_all('a'):
    print(i.attrs.get('href'))

Output:

http://example.com/elsie

http://example.com/lacie

http://example.com/tillie

# Search by attribute
import re

print('----', soup.find_all('p', 'title'))  # <p> tags whose class is "title"
print('----', soup.find_all(id='link1'))  # tags whose id is exactly "link1"
print('----', soup.find_all(id=re.compile('link')))  # regex match: tags whose id contains "link"

Output:

---- [<p class="title"><b>The Dormouse's story</b></p>]

---- [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

---- [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
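The same attribute filters can be written in more than one way. The following is a minimal sketch, assuming a shortened copy of the sample document; the names small_doc, small_soup, by_position, by_class, by_dict and by_regex are illustrative and do not come from the original article.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the sample document, used here only for illustration.
small_doc = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
"""
small_soup = BeautifulSoup(small_doc, 'html.parser')

# A string passed as the second positional argument is matched against the class attribute.
by_position = small_soup.find_all('p', 'title')
# The same filter written explicitly; class_ is used because class is a Python keyword.
by_class = small_soup.find_all('p', class_='title')
# attrs also accepts a dictionary mapping attribute names to the values to match.
by_dict = small_soup.find_all('a', attrs={'id': 'link2'})
# A compiled regular expression is matched against the attribute value.
by_regex = small_soup.find_all(href=re.compile('elsie'))

print(by_position == by_class)
print([a['id'] for a in by_dict])
print([a.get_text() for a in by_regex])
```

The dictionary form is handy for attributes whose names are not valid Python keyword arguments, which is why it is shown next to the keyword form here.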

# recursive: search all descendants (the default) or only direct children
print('----', soup.find_all('a', 'sister'))
print('----', soup.find_all('a', 'sister', recursive=False))

Output:

---- [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

---- []
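The second call returns an empty list because no <a> tag is a direct child of the soup object itself. The effect of recursive is easier to see on a small two-level fragment. This is a sketch only; the fragment and the names div, all_links and direct_links are hypothetical and not part of the original article.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one <a> directly under <div>, one nested inside <p>.
fragment = '<div><a id="top">top</a><p><a id="nested">nested</a></p></div>'
div = BeautifulSoup(fragment, 'html.parser').div

# Default (recursive=True): every descendant of <div> is examined.
all_links = div.find_all('a')
# recursive=False: only the direct children of <div> are examined.
direct_links = div.find_all('a', recursive=False)

print([a['id'] for a in all_links])
print([a['id'] for a in direct_links])
```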

# Search the strings inside tags
print('----', soup.find_all(string='''The Dormouse's story'''))
print('----', soup.find_all(string=re.compile(r'The\w*')))  # raw string avoids an invalid-escape warning

Output:

---- ["The Dormouse's story", "The Dormouse's story"]

---- ["The Dormouse's story", "The Dormouse's story"]
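Both searches return text nodes rather than tags, and each finds the string twice: once in <title> and once in <b>. The sketch below contrasts string used alone with string combined with a tag name, in which case the matching tags are returned instead. The fragment and the names text_soup, texts and tags are illustrative assumptions.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment with two links separated by a space.
text_soup = BeautifulSoup(
    '<p><a id="link1">Elsie</a> <a id="link2">Lacie</a></p>', 'html.parser'
)

# string on its own returns the matching text nodes (NavigableString objects).
texts = text_soup.find_all(string=re.compile(r'ie$'))
# Combined with a tag name, it returns the tags whose .string matches.
tags = text_soup.find_all('a', string='Elsie')

print([str(t) for t in texts])
print([t['id'] for t in tags])
```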

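Additional methods:

<>.find(): searches and returns only the first result found; same parameters as find_all()

<>.find_parent(): searches the ancestor nodes and returns one result; same parameters as find_all()

<>.find_parents(): searches the ancestor nodes and returns a list; same parameters as find_all()

<>.find_next_sibling(): searches the following sibling nodes and returns one result; same parameters as find_all()

<>.find_next_siblings(): searches the following sibling nodes and returns a list; same parameters as find_all()

<>.find_previous_sibling(): searches the preceding sibling nodes and returns one result; same parameters as find_all()

<>.find_previous_siblings(): searches the preceding sibling nodes and returns a list; same parameters as find_all()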
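The navigation methods listed above can be tried on a small fragment. This is a minimal sketch under assumed input; nav_html and the variable names are hypothetical, chosen only to show which node each method returns.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: three sibling links inside one paragraph.
nav_html = (
    '<p class="story">'
    '<a id="link1">Elsie</a>'
    '<a id="link2">Lacie</a>'
    '<a id="link3">Tillie</a>'
    '</p>'
)
nav_soup = BeautifulSoup(nav_html, 'html.parser')

first = nav_soup.find('a')            # first match only
missing = nav_soup.find('table')      # None when nothing matches
middle = nav_soup.find(id='link2')    # starting point for the calls below

parent = middle.find_parent('p')                 # nearest matching ancestor
parents = middle.find_parents('p')               # all matching ancestors, as a list
next_one = middle.find_next_sibling('a')         # first following sibling
later = middle.find_next_siblings('a')           # all following siblings
prev_one = middle.find_previous_sibling('a')     # first preceding sibling
earlier = middle.find_previous_siblings('a')     # all preceding siblings

print(first['id'], parent.name, next_one['id'], prev_one['id'])
```

Because find() returns None when there is no match, it is worth checking the result before indexing into it.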

Article link: https://www.haomeiwen.com/subject/tirfoctx.html