Scraping Web Data with BeautifulSoup (4) - Handling Sibling Nodes (s

Author: 查德笔记 | Published: 2018-02-20 16:40

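BeautifulSoup's next_siblings attribute (a generator, not a function call) is well suited to pulling rows out of tables, especially tables that have a header row.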


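from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, 'lxml')

# Start at the first <tr> (the header row) and walk the siblings after it.
siblings = soup.find("table", {'id': 'giftList'}).tr.next_siblings
count = 0  # named count rather than sum to avoid shadowing the built-in sum()
for sibling in siblings:
    print(sibling)
    count += 1
print(count)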

Output:



<tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>


<tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>

...

11
0
[Finished in 2.2s]

The code prints every product row in the table except the header row, for two reasons:

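  1. An object is never its own sibling, so the sibling attributes always skip the object they are called on.
  2. The code uses next_siblings, so it returns only the sibling node(s) that come after that object.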
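
The two points above can be checked offline with a minimal sketch. The inline table below is a hypothetical, shortened stand-in for the real page, and it uses the built-in "html.parser" instead of lxml so that nothing beyond bs4 is needed. It also shows something the output above hints at: the whitespace between rows is returned as text-node siblings, which is probably why the printed count (11) is larger than the number of product rows. Filtering on Tag keeps only the rows.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical inline table standing in for the real page (assumption).
html = """<table id="giftList">
<tr><th>Item</th><th>Price</th></tr>
<tr id="gift1"><td>Vegetable Basket</td><td>$15.00</td></tr>
<tr id="gift2"><td>Russian Nesting Dolls</td><td>$10,000.52</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
header = soup.find("table", {"id": "giftList"}).tr  # first <tr>: the header row

# Every node after the header at the same level, including whitespace text nodes.
all_siblings = list(header.next_siblings)

# Keep only real tags, dropping the newline text nodes between rows.
row_siblings = [s for s in all_siblings if isinstance(s, Tag)]
row_ids = [row["id"] for row in row_siblings]

print(len(all_siblings), len(row_siblings), row_ids)
```

The header row never appears in either list, and only the rows after it are returned.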

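Note: besides next_siblings, keep previous_siblings in mind. It is handy when the last row is easy to locate and is not itself something you want to scrape, since it returns the siblings that come before it. The singular forms, next_sibling and previous_sibling, return a single sibling node.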
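
As a rough illustration of the note above, the sketch below starts from a last row and walks backwards. The table, the "total" row, and the row ids are hypothetical and not taken from the real page. Note that previous_sibling (singular) can return a whitespace text node rather than a tag; find_previous_sibling("tr") is one way to get the nearest preceding row instead.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical table whose last row is easy to locate but not wanted (assumption).
html = """<table id="giftList">
<tr id="header"><th>Item</th></tr>
<tr id="gift1"><td>Vegetable Basket</td></tr>
<tr id="gift2"><td>Russian Nesting Dolls</td></tr>
<tr id="total"><td>Total</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
last_row = soup.find("tr", {"id": "total"})

# previous_siblings walks backwards from the starting node, nearest first.
before_ids = [s["id"] for s in last_row.previous_siblings if isinstance(s, Tag)]

# previous_sibling returns exactly one node; here it is the newline text node.
nearest = last_row.previous_sibling

# find_previous_sibling skips text nodes and returns the nearest matching tag.
nearest_tag = last_row.find_previous_sibling("tr")

print(before_ids, repr(nearest), nearest_tag["id"])
```

Because the walk runs backwards, the rows come out in reverse document order; reverse the list if the original order matters.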
