from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>data</p>', 'html.parser')
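What BeautifulSoup does
Tag structure
[Figure: the BeautifulSoup class]
BeautifulSoup parsers
Basic elements of the BeautifulSoup class
- Tag: the basic unit of information; it opens with <> and closes with </>
- Name: the tag's name; <p>...</p> has the name p. Access: <tag>.name
- Attributes: the tag's attributes, as a dictionary. Access: <tag>.attrs
- NavigableString: the string between <> and </>. Access: <tag>.string
- Comment: the comment part of a string inside a tag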
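The five elements listed above can be inspected directly. The following is a minimal sketch; the HTML snippet and the variable names are made up for illustration and do not come from the demo page.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Hypothetical snippet, written only for this illustration.
html = '<p class="title" id="intro">Hello</p><b><!--a comment--></b>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.p                 # Tag: the first <p> element in the document
tag_name = tag.name          # Name: the string 'p'
tag_attrs = tag.attrs        # Attributes: a dict; class is multi-valued, so it is a list
tag_string = tag.string      # NavigableString: the text inside the tag
comment = soup.b.string      # Comment: the only child of <b> is a comment node

print(tag_name, tag_attrs, tag_string)
print(type(tag_string).__name__, type(comment).__name__)
```

Note that .string returns a Comment object with the <!-- --> markers stripped, so checking the type is the only reliable way to tell a comment from ordinary text.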
Understanding BeautifulSoup
https://python123.io/ws/demo.html
[Figure: basic HTML format]
[Figure: downward traversal of the tag tree]
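import requests
from bs4 import BeautifulSoup as BS
demo = requests.get('https://python123.io/ws/demo.html').text  # HTML source of the demo page above
soup = BS(demo, 'html.parser')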
>>> soup.head
<head><title>This is a python demo page</title></head>
>>> soup.head.contents
[<title>This is a python demo page</title>]
>>> soup.body.contents
['\n', <p class="title"><b>The demo python introduces several python courses.</b></p>, '\n', <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>, '\n']
>>> len(soup.body.contents)
5
>>> soup.body.contents[1]
<p class="title"><b>The demo python introduces several python courses.</b></p>
>>> soup.body.children
<list_iterator object at 0x00000000039D2748>
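The three downward-traversal attributes differ in how deep they go: .contents and .children cover direct children only, while .descendants walks every level. The sketch below uses a small made-up document without whitespace between tags, so no '\n' strings appear among the children as they do in the demo page.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical document, written only for this illustration.
html = "<body><p><b>bold</b></p><p>plain <a href='#'>link</a></p></body>"
soup = BeautifulSoup(html, "html.parser")

# .contents: a list of the direct children
direct = soup.body.contents

# .children: an iterator over the same direct children
child_names = [c.name for c in soup.body.children if isinstance(c, Tag)]

# .descendants: an iterator over all nodes below the tag, depth first
descendant_names = [d.name for d in soup.body.descendants if isinstance(d, Tag)]

print(len(direct), child_names, descendant_names)
```

The isinstance(c, Tag) filter is there because these iterators also yield NavigableString nodes, which have no tag name.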
[Figure: upward traversal attributes]
[Figure: upward traversal]
[Figure: sibling traversal]
[Figure: summary of BeautifulSoup traversal]
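Upward and sibling traversal can be sketched in the same way, using .parent and .parents to go up and .next_sibling, .previous_sibling and their plural iterator forms to move sideways. The document and the ids below are made up for illustration. In a real page, siblings are often NavigableString nodes such as '\n', so check the type before indexing into them.

```python
from bs4 import BeautifulSoup

# Hypothetical document, written only for this illustration.
html = ("<html><body><p id='a'>one</p><p id='b'>two</p>"
        "<p id='c'>three</p></body></html>")
soup = BeautifulSoup(html, "html.parser")
middle = soup.find(id="b")

# Upward: .parent is the enclosing tag, .parents iterates over all ancestors
parent_name = middle.parent.name
ancestor_names = [p.name for p in middle.parents]

# Sideways: siblings are nodes that share the same parent
prev_id = middle.previous_sibling["id"]
next_id = middle.next_sibling["id"]
later_ids = [s["id"] for s in middle.next_siblings]
earlier_ids = [s["id"] for s in middle.previous_siblings]

print(parent_name, ancestor_names, prev_id, next_id)
```

The last ancestor is the BeautifulSoup object itself, whose name is '[document]'.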
print(soup.prettify())  # prettify() returns the document as an indented string, one node per line
[Figure: getting started with bs]
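As a small sketch of what prettify() does, the made-up one-line snippet below is turned into an indented, multi-line string. The method also works on a single tag, not only on the whole soup.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line snippet, written only for this illustration.
soup = BeautifulSoup("<div><p>data</p></div>", "html.parser")

pretty = soup.prettify()        # whole document, each node on its own line
pretty_p = soup.p.prettify()    # the same method applied to a single tag

print(pretty)
print(pretty_p)
```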
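soup.find_all(name, attrs, recursive, string, **kwargs)  # returns a ResultSet, which is a list subclass
name: the tag name to search for, as a string
attrs: the attribute value to search for, as a string; a specific attribute can be named
recursive: whether to search all descendants (default True) or only direct children
string: a search string matched against the text between <> and </>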
import re
len(soup.find_all(src=re.compile('jpg'), class_=re.compile('image')))
for tag in soup.find_all(src=re.compile('jpg'), class_=re.compile('image')):
    print(tag.attrs['src'])
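The find_all parameters can be tried one at a time on a small document. This is a sketch only: the HTML, the example.com URLs and the file names are made up, since the demo page contains no images to match the src search above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical document, written only for this illustration.
html = """
<div>
  <a class="py1" href="http://example.com/basic" id="link1">Basic Python</a>
  <a class="py2" href="http://example.com/advanced" id="link2">Advanced Python</a>
  <img class="image-main" src="cover.jpg">
  <img class="icon" src="logo.png">
</div>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")                            # name: match by tag name
py1 = soup.find_all("a", attrs={"class": "py1"})      # attrs: match by attribute value
by_id = soup.find_all(id=re.compile("^link"))         # keyword argument with a regex
direct_links = soup.find_all("a", recursive=False)    # direct children of soup only
by_text = soup.find_all(string=re.compile("Advanced"))  # string: match text nodes
jpgs = soup.find_all(src=re.compile("jpg"), class_=re.compile("image"))

for tag in jpgs:
    print(tag.attrs["src"])
```

class_ carries a trailing underscore because class is a reserved word in Python. With recursive=False the search finds nothing here, because the <a> tags sit inside <div> rather than directly under the document root.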