Web Scraping Basics Series: BeautifulSoup CSS Selectors (4)

Author: 猛犸象和剑齿虎 | Published 2019-05-21 06:05


Introduction to CSS selectors

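CSS selector types: tag selectors, class selectors, and id selectors. They can also be combined with each other and with attribute selectors.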

Method: select(), which takes a CSS selector string and returns every matching tag, in document order.

from bs4 import BeautifulSoup

# Sample document (the "three sisters" page) used by all the examples below
html = """
<html><head><title>The Dormouse's story</title><title>The Dormouse's story2</title></head>
<body>
<p class="title" name="dromouse"><b class="sister">The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

①

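# Find by tag name
soup = BeautifulSoup(html, 'lxml')  # needs the lxml package; 'html.parser' also works here
data1 = soup.select('a')
print(type(data1))

Result: note that it is a plain list, not a bs4 ResultSet like the one find_all() returns. Newer versions of bs4 wrap select() results in a ResultSet too, which prints <class 'bs4.element.ResultSet'>; since ResultSet is a subclass of list, the result can be used exactly like a list either way.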

<class 'list'>
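Because the result behaves like a list, it can be indexed and measured directly; and when only the first match matters, bs4 also offers select_one(). A minimal sketch, assuming a trimmed copy of the sample HTML and the built-in 'html.parser' instead of 'lxml' (so no extra package is needed):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

links = soup.select("a")            # every <a> tag, in document order
print(isinstance(links, list))      # a list (or a list subclass in newer bs4)
print(len(links))                   # number of matches
print(links[1]["id"])               # index like a list, then read an attribute

first = soup.select_one("a")        # first match only (a Tag), or None
missing = soup.select_one("table")  # there is no <table>, so this is None
print(first["id"], missing)
```

select_one() returning None instead of raising is why it is worth checking the result before reading attributes from it.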

②

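# Find by class name
data2 = soup.select('.sister')
print(data2)

Result: every tag with class="sister" is pulled out and collected in a list. Note that this includes the <b> tag in the first paragraph, not only the <a> tags, because a class selector on its own does not restrict the tag name.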

[<b class="sister">The Dormouse's story</b>, <a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
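To keep only the links, the tag name and the class can be written together as one compound selector. A minimal sketch, assuming a trimmed copy of the sample HTML and 'html.parser' in place of 'lxml':

```python
from bs4 import BeautifulSoup

html = """
<p class="title"><b class="sister">The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

all_sisters = soup.select(".sister")    # any tag with class "sister", including <b>
link_sisters = soup.select("a.sister")  # only <a> tags that have class "sister"

print([tag.name for tag in all_sisters])  # tag names of every match
print([a["href"] for a in link_sisters])  # href of each matching link
```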

③

# Find by id
data3 = soup.select('#link2')
print(data3)

Result:

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
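An id is meant to be unique within a page, so select_one() is often the more convenient call here; the text and attributes can then be read straight from the returned tag. A minimal sketch, assuming a trimmed copy of the sample HTML and 'html.parser':

```python
from bs4 import BeautifulSoup

html = """
<p class="story">
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

lacie = soup.select_one("#link2")  # single Tag rather than a list
print(lacie.get_text())            # text inside the tag
print(lacie["href"])               # value of the href attribute

# select() still returns a list, even when there is exactly one match
print(len(soup.select("#link2")))
```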

④

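# Combined search: an id selector inside a tag selector
data4 = soup.select('p #link1')
print(data4)

Result: 'p #link1' is a descendant selector; it matches the element with id="link1" that sits anywhere inside a <p> tag.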

[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
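Besides the space (descendant), selectors can be combined with > (direct child only) and with a comma (match either selector). A minimal sketch, assuming a reasonably recent bs4 (4.7 or later, where select() is backed by the soupsieve package), a trimmed copy of the sample HTML, and 'html.parser':

```python
from bs4 import BeautifulSoup

html = """
<body>
<p class="title"><b class="sister">The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

descendants = soup.select("body a")     # <a> anywhere inside <body>
children = soup.select("body > a")      # <a> directly under <body>: none here
in_story = soup.select("p.story > a")   # <a> directly under <p class="story">
either = soup.select("#link1, #link3")  # comma groups selectors; document order

print(len(descendants), len(children), len(in_story))
print([a["id"] for a in either])
```

The difference between the first two lines is the whole point of >: the links are inside <body>, but their parent is <p>, not <body>.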

⑤

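# Find by another attribute
data5 = soup.select('a[href="http://example.com/tillie"]')
print(data5)

Result: [attr="value"] matches tags whose attribute value is exactly equal to the given string.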

[<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
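Exact matching is not the only option: CSS also has attribute selectors for "present", "starts with", "ends with" and "contains". A minimal sketch, assuming a recent bs4 (4.7+, soupsieve-backed), a trimmed copy of the sample HTML, and 'html.parser':

```python
from bs4 import BeautifulSoup

html = """
<p class="title" name="dromouse"><b class="sister">The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

has_name = soup.select("p[name]")                       # attribute present, any value
starts = soup.select('a[href^="http://example.com/"]')  # value starts with the prefix
ends = soup.select('a[href$="tillie"]')                 # value ends with the suffix
contains = soup.select('a[href*="lac"]')                # value contains the substring

print([p["name"] for p in has_name])
print(len(starts), ends[0]["id"], contains[0]["id"])
```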

Summary of the BeautifulSoup module:

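Compared with XPath, BeautifulSoup's methods are numerous and fiddly: beyond the basics, there is searching the document tree with find_all() and its filters, plus the CSS selector method select(), so the whole thing is fairly hard to memorize.
In practice there is no need to memorize it deliberately (there is no exam, after all); a rough impression is enough, as long as you remember where to look things up when you need them.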
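Since find_all() and select() overlap heavily, it can help to see the same searches written both ways. A minimal sketch, assuming a trimmed copy of the sample HTML and 'html.parser'; class_ (with the trailing underscore) is the find_all() keyword for class, because class is reserved in Python:

```python
from bs4 import BeautifulSoup

html = """
<p class="title"><b class="sister">The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html, "html.parser")

# The same flat search written with select() and with find_all()
by_css = soup.select(".sister")
by_find = soup.find_all(class_="sister")
print(by_css == by_find)

# A nested search: CSS puts it in one string, find()/find_all() chain the steps
css_nested = soup.select("p.story a#link2")
find_nested = soup.find("p", class_="story").find_all("a", id="link2")
print(css_nested == find_nested)
```

A rule of thumb: the more levels of nesting a search has, the shorter the select() version tends to be.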


Article link: https://www.haomeiwen.com/subject/wtisaqtx.html
