title: BeautifulSoup Usage, Part 2
date: 2019-03-04 17:48:15
tags:
[TOC]
Navigating related nodes
Child nodes
Example code:
from bs4 import BeautifulSoup
data="""<div class="subnav">
<ul class="navbar">
<li>
<a data-act="subnav-click" data-val="{subnavClick:7}"
href="/board/7"
>热映口碑榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:6}"
href="/board/6"
>最受期待榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:1}"
data-state-val="{subnavId:1}"
class="active" href="javascript:void(0);"
>国内票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:2}"
href="/board/2"
>北美票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:4}"
href="/board/4"
>TOP100榜</a>
</li>
</ul>
</div>
"""
data=BeautifulSoup(data,'html.parser')
print(type(data.ul.contents))
print(data.ul.contents)
Output:
<class 'list'>
['\n', <li>
<a data-act="subnav-click" data-val="{subnavClick:7}" href="/board/7">热映口碑榜</a>
</li>, '\n', <li>
<a data-act="subnav-click" data-val="{subnavClick:6}" href="/board/6">最受期待榜</a>
</li>, '\n', <li>
<a class="active" data-act="subnav-click" data-state-val="{subnavId:1}" data-val="{subnavClick:1}" href="javascript:void(0);">国内票房榜</a>
</li>, '\n', <li>
<a data-act="subnav-click" data-val="{subnavClick:2}" href="/board/2">北美票房榜</a>
</li>, '\n', <li>
<a data-act="subnav-click" data-val="{subnavClick:4}" href="/board/4">TOP100榜</a>
</li>, '\n']
Explanation:
type(data.ul.contents)
<class 'list'>
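As shown, data.ul.contents returns a list of all direct children of the <ul> tag. This list includes the whitespace text nodes ('\n') between the <li> elements: those are bs4.element.NavigableString objects, while the <li> elements themselves are bs4.element.Tag objects. That is why the first <li> sits at index 1, not index 0: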
print(type(data.ul.contents[1]))
<class 'bs4.element.Tag'>
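Because .contents mixes text nodes with tags, it is often convenient to keep only the element children. The minimal sketch below uses a trimmed copy of the navbar markup (shortened as an assumption for brevity) and filters .children with isinstance against bs4.element.Tag; find_all('li', recursive=False) is shown as an alternative that only searches the direct children.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Trimmed version of the navbar markup used above (shortened for the example)
html = """<ul class="navbar">
<li><a href="/board/7">热映口碑榜</a></li>
<li><a href="/board/6">最受期待榜</a></li>
</ul>"""

soup = BeautifulSoup(html, 'html.parser')

# .contents is a list of all direct children, text nodes included
all_children = soup.ul.contents
print(len(all_children))  # 5: '\n', <li>, '\n', <li>, '\n'

# .children iterates over the same nodes; keep only the Tag objects
li_tags = [child for child in soup.ul.children if isinstance(child, Tag)]
print([li.a.get_text() for li in li_tags])  # ['热映口碑榜', '最受期待榜']

# Alternative: search only the direct children for <li> tags
direct_li = soup.ul.find_all('li', recursive=False)
print(len(direct_li))  # 2
```

The isinstance filter works for any tag type, while find_all with recursive=False is handy when you already know which tag name you want.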
Parent and ancestor nodes
Example:
from bs4 import BeautifulSoup
data="""<div class="subnav">
<ul class="navbar">
<li>
<a data-act="subnav-click" data-val="{subnavClick:7}"
href="/board/7"
>热映口碑榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:6}"
href="/board/6"
>最受期待榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:1}"
data-state-val="{subnavId:1}"
class="active" href="javascript:void(0);"
>国内票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:2}"
href="/board/2"
>北美票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:4}"
href="/board/4"
>TOP100榜</a>
</li>
</ul>
</div>
<div>hfdfd</div>
"""
data=BeautifulSoup(data,'html.parser')
print(type(data.ul.parent))
print(data.ul.parent)
Output:
<class 'bs4.element.Tag'>
<div class="subnav">
<ul class="navbar">
<li>
<a data-act="subnav-click" data-val="{subnavClick:7}" href="/board/7">热映口碑榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:6}" href="/board/6">最受期待榜</a>
</li>
<li>
<a class="active" data-act="subnav-click" data-state-val="{subnavId:1}" data-val="{subnavClick:1}" href="javascript:void(0);">国内票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:2}" href="/board/2">北美票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:4}" href="/board/4">TOP100榜</a>
</li>
</ul>
</div>
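Explanation:
data.ul.parent returns a single bs4.element.Tag object: the direct parent of the <ul>, which here is <div class="subnav">. Only that one element is returned, which is why the trailing <div>hfdfd</div> does not appear in the output.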
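.parent can be chained to walk upward one level at a time. In the minimal sketch below (the single-line markup is a simplified assumption), note that the parent of a top-level tag is the BeautifulSoup object itself, whose name is '[document]', and that the BeautifulSoup object, being the root, has no parent.

```python
from bs4 import BeautifulSoup

# Simplified single-line markup (an assumption for brevity)
html = '<div class="subnav"><ul class="navbar"><li><a href="/board/7">热映口碑榜</a></li></ul></div>'
soup = BeautifulSoup(html, 'html.parser')

a_tag = soup.a
print(a_tag.parent.name)         # li
print(a_tag.parent.parent.name)  # ul

# The top-level <div> has the BeautifulSoup object as its parent
print(soup.div.parent.name)      # [document]
print(soup.div.parent is soup)   # True

# The BeautifulSoup object is the root, so it has no parent
print(soup.parent)               # None
```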
Example:
from bs4 import BeautifulSoup
data="""<div class="subnav">
<ul class="navbar">
<li>
<a data-act="subnav-click" data-val="{subnavClick:7}"
href="/board/7"
>热映口碑榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:6}"
href="/board/6"
>最受期待榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:1}"
data-state-val="{subnavId:1}"
class="active" href="javascript:void(0);"
>国内票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:2}"
href="/board/2"
>北美票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:4}"
href="/board/4"
>TOP100榜</a>
</li>
</ul>
</div>
<div>hfdfd</div>
"""
data=BeautifulSoup(data,'html.parser')
print(type(data.ul.parents))
print(list(enumerate(data.ul.parents)))
Output:
<class 'generator'>
[(0, <div class="subnav">
<ul class="navbar">
<li>
<a data-act="subnav-click" data-val="{subnavClick:7}" href="/board/7">热映口碑榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:6}" href="/board/6">最受期待榜</a>
</li>
<li>
<a class="active" data-act="subnav-click" data-state-val="{subnavId:1}" data-val="{subnavClick:1}" href="javascript:void(0);">国内票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:2}" href="/board/2">北美票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:4}" href="/board/4">TOP100榜</a>
</li>
</ul>
</div>), (1, <div class="subnav">
<ul class="navbar">
<li>
<a data-act="subnav-click" data-val="{subnavClick:7}" href="/board/7">热映口碑榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:6}" href="/board/6">最受期待榜</a>
</li>
<li>
<a class="active" data-act="subnav-click" data-state-val="{subnavId:1}" data-val="{subnavClick:1}" href="javascript:void(0);">国内票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:2}" href="/board/2">北美票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:4}" href="/board/4">TOP100榜</a>
</li>
</ul>
</div>
<div>hfdfd</div>
)]
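Explanation:
data.ul.parents is a generator (hence <class 'generator'>) that yields not only the direct parent but every ancestor in turn, up to the root of the document. In the output, item 0 is <div class="subnav">, and item 1 is the BeautifulSoup object itself, which prints as the whole document, including the sibling <div>hfdfd</div>.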
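Printing whole ancestors is hard to read, so a common pattern is to collect just their names, or to stop at the first ancestor that matches a condition with find_parent. The sketch below reuses a shortened form of the markup above (an assumption for brevity) and starts from the active <a> link.

```python
from bs4 import BeautifulSoup

# Shortened form of the markup used above
html = """<div class="subnav">
<ul class="navbar">
<li><a class="active" href="javascript:void(0);">国内票房榜</a></li>
</ul>
</div>
<div>hfdfd</div>"""

soup = BeautifulSoup(html, 'html.parser')
active = soup.find('a', class_='active')

# .parents yields the direct parent first, then each further ancestor
ancestor_names = [parent.name for parent in active.parents]
print(ancestor_names)  # ['li', 'ul', 'div', '[document]']

# find_parent returns the nearest ancestor matching the filter
nav = active.find_parent('ul', class_='navbar')
print(nav['class'])  # ['navbar']
```

class is a multi-valued attribute in BeautifulSoup, which is why nav['class'] is a list rather than a string.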
Sibling nodes
Code:
from bs4 import BeautifulSoup
data="""<div class="subnav">
<ul class="navbar">
<li>
<a data-act="subnav-click" data-val="{subnavClick:7}"
href="/board/7"
>热映口碑榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:6}"
href="/board/6"
>最受期待榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:1}"
data-state-val="{subnavId:1}"
class="active" href="javascript:void(0);"
>国内票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:2}"
href="/board/2"
>北美票房榜</a>
</li>
<li>
<a data-act="subnav-click" data-val="{subnavClick:4}"
href="/board/4"
>TOP100榜</a>
</li>
</ul>
</div>
<div>hfdfd</div>
"""
data=BeautifulSoup(data,'html.parser')
print(type(data.li.next_sibling))
#print(data.li.previous_sibling)
print(data.li.next_sibling)
#print(list(enumerate(data.li.previous_siblings)))
print(list(enumerate(data.li.next_siblings)))
Output:
<class 'bs4.element.NavigableString'>
[(0, '\n'), (1, <li>
<a data-act="subnav-click" data-val="{subnavClick:6}" href="/board/6">最受期待榜</a>
</li>), (2, '\n'), (3, <li>
<a class="active" data-act="subnav-click" data-state-val="{subnavId:1}" data-val="{subnavClick:1}" href="javascript:void(0);">国内票房榜</a>
</li>), (4, '\n'), (5, <li>
<a data-act="subnav-click" data-val="{subnavClick:2}" href="/board/2">北美票房榜</a>
</li>), (6, '\n'), (7, <li>
<a data-act="subnav-click" data-val="{subnavClick:4}" href="/board/4">TOP100榜</a>
</li>), (8, '\n')]
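Explanation:
data.li selects the first <li>. Its next_sibling is not the second <li> but the '\n' text node between them, which is why its type is bs4.element.NavigableString and print(data.li.next_sibling) prints nothing but blank lines. data.li.next_siblings is a generator over all following siblings, so the list alternates between '\n' strings and <li> Tag objects. previous_sibling and previous_siblings (commented out in the code above) work the same way in the opposite direction.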
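Since next_sibling frequently lands on a whitespace text node, it is usually easier to ask for the next sibling with a given tag name. The sketch below uses find_next_sibling, find_next_siblings and find_previous_sibling on a simplified list (the plain-text <li> items are an assumption for brevity).

```python
from bs4 import BeautifulSoup

# Simplified list markup (an assumption for brevity)
html = """<ul class="navbar">
<li>热映口碑榜</li>
<li>最受期待榜</li>
<li>国内票房榜</li>
</ul>"""

soup = BeautifulSoup(html, 'html.parser')
first = soup.li

# The raw next_sibling is the newline text node, not the next <li>
print(first.next_sibling == '\n')  # True

# find_next_sibling('li') skips text nodes and returns the next <li>
second = first.find_next_sibling('li')
print(second.get_text())  # 最受期待榜

# find_next_siblings('li') returns every following <li>
rest = [li.get_text() for li in first.find_next_siblings('li')]
print(rest)  # ['最受期待榜', '国内票房榜']

# find_previous_sibling works the same way in the other direction
print(second.find_previous_sibling('li').get_text())  # 热映口碑榜
```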