Scraping Web Data with BeautifulSoup (4): Handling Sibling Nodes (siblings)

Author: 查德笔记 | Published 2018-02-20 16:40 | Read 14 times


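BeautifulSoup's .next_siblings attribute (a generator, not a function) is very useful for extracting data from tables, especially tables that have a header row.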
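To try the idea offline, here is a minimal sketch that runs against a small hard-coded table instead of a live page. The HTML string is made up for illustration and only loosely mirrors the structure of a gift-list table; it uses Python's built-in "html.parser" instead of lxml so no extra parser is needed. Note that .next_siblings yields every following node, including the whitespace text nodes between rows, so the sketch keeps only Tag objects. find_next_siblings("tr") is shown as an equivalent shortcut.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample table, made up for illustration (not the real page markup)
html = """
<table id="giftList">
<tr><th>Item Title</th><th>Cost</th></tr>
<tr class="gift" id="gift1"><td>Vegetable Basket</td><td>$15.00</td></tr>
<tr class="gift" id="gift2"><td>Russian Nesting Dolls</td><td>$10,000.52</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
header = soup.find("table", {"id": "giftList"}).tr  # first <tr> is the header row

# .next_siblings yields every node after the header: <tr> tags AND "\n" text nodes
all_siblings = list(header.next_siblings)

# Keep only element nodes, dropping the whitespace strings between rows
rows = [s for s in all_siblings if isinstance(s, Tag)]

# Shortcut: find_next_siblings filters by tag name directly
rows_alt = header.find_next_siblings("tr")

for row in rows:
    # Join the text of each cell in the row with a separator
    print(row.get_text(" | ", strip=True))
```

Filtering with isinstance keeps the loop generic; find_next_siblings("tr") is shorter when you only care about one tag name.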


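from urllib.request import urlopen
from bs4 import BeautifulSoup


html = urlopen("http://www.pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, 'lxml')

# Start at the first <tr> (the header row) and walk every node that follows it
siblings = soup.find("table", {'id': 'giftList'}).tr.next_siblings
count = 0  # number of sibling nodes visited (both tags and text nodes)
for sibling in siblings:
    print(sibling)
    count += 1
print(count)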

Output:



<tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>


<tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>

...

11
0
[Finished in 2.2s]

The code prints every product row in the table except the header row. This is because: