BeautifulSoup: Scraping Web Data (3) - Working with Child Nodes (ch

Author: 查德笔记 | Published 2018-02-20 16:13

3.1 Children and descendants

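soup.body.h1  # selects the first h1 under the body tag

Attribute-style access works like find(): it returns the first matching tag at any depth, so this h1 is a direct child of body only when no other tag sits in between.

Similarly, soup.div.find_all('img') finds every img tag inside the first div of the document.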
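
As a minimal sketch of the two lookups above, the snippet below runs them against a small inline document. The markup, the ids and the image names are invented for illustration and are not taken from the page scraped in this article; 'html.parser' is used only so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this sketch.
SAMPLE = """
<html><body>
<div id="first"><h1>Title</h1><img src="a.jpg"><p><img src="b.jpg"></p></div>
<div id="second"><img src="c.jpg"></div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE, 'html.parser')

# Attribute access returns the first matching tag at any depth, like find().
h1 = soup.body.h1
h1_is_direct_child = h1.parent is soup.body  # the h1 sits inside a div here

# soup.div is only the first div, so find_all() searches inside that div alone.
first_div_imgs = [img['src'] for img in soup.div.find_all('img')]

# Starting from soup itself searches the whole document.
all_imgs = [img['src'] for img in soup.find_all('img')]

print(h1.get_text(), h1_is_direct_child)
print(first_div_imgs)
print(all_imgs)
```

In this sample the image in the second div is missed by soup.div.find_all('img'), which is why soup.find_all('img') is the safer call when every image on the page is wanted.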
.children and .descendants
The following code compares the two:

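from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.pythonscraping.com/pages/page3.html')
soup = BeautifulSoup(html, 'lxml')
table = soup.find('table', {'id': 'giftList'})

# .children yields only the direct children of the table
child_count = 0
for child in table.children:
    print(child)
    child_count += 1
print(child_count)

# .descendants yields every nested node, tags and text alike
descendant_count = 0
for descendant in table.descendants:
    print(descendant)
    descendant_count += 1
print(descendant_count)

Running it gives child_count = 13 and descendant_count = 86.
Comparing this with the first part of the .descendants output shows: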

<tr><th>#===== this <tr> is a child of soup.find('table',{'id':'giftList'}) =====
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>#===== this <tr> is a child of soup.find('table',{'id':'giftList'}) =====
<th>        #===== <th> is a child of <tr> and a descendant of ('table',{'id':'giftList'}) =====
Item Title
</th>       #===== <th> is a child of <tr> and a descendant of ('table',{'id':'giftList'}) =====

Item Title#===== the text is the content of <th> and is also a descendant =====

<th>#===== same as above =====
Description
</th>

Description

<th>
Cost
</th>

Cost
....

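The comparison shows that .children lists only the table's direct children, that is, the <tr> tags, each printed together with everything it contains. .descendants lists every nested tag node and text node, so all tags and text at any depth beneath those <tr> children are returned as well.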
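
The same contrast can be reproduced offline. This is a minimal sketch on a hypothetical two-row table (the id "t" and the cell contents are invented), written without whitespace between tags so that only tags and cell text are counted.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical table, made up for this sketch.
SAMPLE = ('<table id="t">'
          '<tr><th>Item</th><th>Cost</th></tr>'
          '<tr><td>Vase</td><td>$15.00</td></tr>'
          '</table>')

soup = BeautifulSoup(SAMPLE, 'html.parser')
table = soup.find('table', {'id': 't'})

children = list(table.children)        # direct children only
descendants = list(table.descendants)  # every nested tag and text node

# Label each descendant: the tag name for tags, the quoted text for strings.
labels = [node.name if isinstance(node, Tag) else repr(str(node))
          for node in descendants]

print(len(children))
print(len(descendants))
print(labels)
```

On a real page, line breaks between tags are parsed as text nodes too and show up in both iterators, so the counts are usually higher than the number of tags alone.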

3.2 Parent nodes

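Most of the time we look up child nodes, but in some situations we need to go the other way and look up parents, using .parent (the direct parent) and .parents (every ancestor).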
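
A minimal sketch of the difference between the two attributes, using hypothetical inline markup (the file name x.jpg is invented): .parent returns a single tag, while .parents is an iterator that walks upward to the top of the document.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this sketch.
SAMPLE = ('<html><body><table><tr><td><img src="x.jpg"></td></tr>'
          '</table></body></html>')

soup = BeautifulSoup(SAMPLE, 'html.parser')
img = soup.find('img')

direct_parent = img.parent.name              # the enclosing tag only
ancestors = [p.name for p in img.parents]    # every ancestor, nearest first

print(direct_parent)
print(ancestors)
```

The last entry yielded by .parents is the BeautifulSoup object itself, not an HTML tag.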

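from urllib.request import urlopen
from bs4 import BeautifulSoup

# The gift images are on page3.html, the same page used in section 3.1
html = urlopen('http://www.pythonscraping.com/pages/page3.html')
soup = BeautifulSoup(html, 'lxml')
# From the image, step up to its cell, then across to the previous cell, and read its text
print(soup.find('img', {'src':'../img/gifts/img1.jpg'}).parent.previous_sibling.get_text())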

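Let's look at how this code works.

<tr>
--<td>
--<td>(3)
    --"$15.00"(4)
--<td>(2)
    --<img src="../img/gifts/img1.jpg">(1)

1. First, locate the img tag with src="../img/gifts/img1.jpg".
2. Select the img tag's parent, the <td> marked (2).
3. Select the previous sibling of that <td>, the <td> marked (3).
4. Take the text inside that <td>, "$15.00".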
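
The four steps can be written out one per line. This is a minimal sketch on a hypothetical row that imitates the structure above; the item name and description text are invented, and only the image path and the price come from the article.

```python
from bs4 import BeautifulSoup

# Hypothetical row, written without whitespace between the cells.
SAMPLE = ('<table><tr><td>Sample item</td><td>Sample description</td>'
          '<td>$15.00</td><td><img src="../img/gifts/img1.jpg"></td></tr></table>')

soup = BeautifulSoup(SAMPLE, 'html.parser')

img = soup.find('img', {'src': '../img/gifts/img1.jpg'})  # step 1: locate the image
image_cell = img.parent                                    # step 2: its parent <td>
price_cell = image_cell.previous_sibling                   # step 3: the <td> before it
price = price_cell.get_text()                              # step 4: the text in that cell

# Variation (not from the article): ask for the previous <td> explicitly.
robust_price = image_cell.find_previous_sibling('td').get_text()

print(price)
print(robust_price)
```

If the markup had a line break between the two cells, .previous_sibling would return that whitespace string instead of a tag, whereas find_previous_sibling('td') skips over it to the previous <td>.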

Article link: https://www.haomeiwen.com/subject/jysgtftx.html
