```python
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>data</p>', 'html.parser')
```
data:image/s3,"s3://crabby-images/076c9/076c9342f5bb9c71440420d90fc2ebbc7480f9f9" alt=""
data:image/s3,"s3://crabby-images/39e50/39e50adfcb87ef6e476938493a2e5685082dafb2" alt=""
data:image/s3,"s3://crabby-images/b1262/b12624166002866082fdef58788463080c307b16" alt=""
data:image/s3,"s3://crabby-images/6a512/6a512810f6201403d6231ec0294bf8123d79ec24" alt=""
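Basic elements of the BeautifulSoup class
- Tag: a tag, whose start and end are marked by `<>` and `</>`
- Name: the tag's name (the name of `<p>...</p>` is `p`); accessed as `<tag>.name`
- Attributes: the tag's attributes, as a dict; accessed as `<tag>.attrs`
- NavigableString: the string inside `<>...</>`; accessed as `<tag>.string`
- Comment: an HTML comment (`<!-- ... -->`) inside a tag, a special kind of NavigableString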
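A minimal sketch of the five elements on a small hand-written HTML snippet (the snippet and variable names are illustrative, not from the demo page). Note that `class` is a multi-valued attribute, so its value comes back as a list, and that `.string` on a comment gives a `Comment` object without the `<!-- -->` delimiters.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Illustrative snippet: one <p> with a nested <b>, one <p> holding only a comment
html = '<p class="title"><b>Hello</b></p><p><!--a comment--></p>'
soup = BeautifulSoup(html, 'html.parser')

tag = soup.p                             # Tag: the first <p> element
name = tag.name                          # Name: 'p'
attrs = tag.attrs                        # Attributes: {'class': ['title']} (class is multi-valued)
text = tag.string                        # NavigableString: 'Hello' (.string descends through the single child <b>)
comment = soup.find_all('p')[1].string   # Comment: 'a comment', delimiters stripped

print(type(tag).__name__, name, attrs)
print(type(text).__name__, text)
print(type(comment).__name__, comment)
```

Because `Comment` is a subclass of `NavigableString`, code that prints `.string` cannot tell the two apart without an `isinstance(..., Comment)` check.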
data:image/s3,"s3://crabby-images/9ab70/9ab703f41eb7750831de04dabc0f4caaae2197a6" alt=""
Demo page used in the examples below: https://python123.io/ws/demo.html
data:image/s3,"s3://crabby-images/9d9fa/9d9fa9ab1083f4fbd02c5038b832cba9db145a7f" alt=""
data:image/s3,"s3://crabby-images/1b344/1b34491e40d1b27e36ace0cb1d337f693d65ffc5" alt=""
data:image/s3,"s3://crabby-images/609d4/609d48e62e3185e5068cb755679bdb187d5a500f" alt=""
```python
>>> import requests
>>> from bs4 import BeautifulSoup as BS
>>> demo = requests.get('https://python123.io/ws/demo.html').text  # HTML source of the demo page
>>> soup = BS(demo, 'html.parser')
>>> soup.head
<head><title>This is a python demo page</title></head>
>>> soup.head.contents
[<title>This is a python demo page</title>]
>>> soup.body.contents
['\n', <p class="title"><b>The demo python introduces several python courses.</b></p>, '\n', <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>, '\n']
>>> len(soup.body.contents)
5
>>> soup.body.contents[1]
<p class="title"><b>The demo python introduces several python courses.</b></p>
>>> soup.body.children
<list_iterator object at 0x00000000039D2748>
```
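The session above shows that `.contents` is a list while `.children` is an iterator over the same direct children. A minimal offline sketch, assuming a small hand-written page shaped like the demo (not fetched from the URL), contrasts them with `.descendants` and with upward and sideways moves via `.parent` and `.next_sibling`:

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the demo page body (illustrative only)
html = ('<body><p class="title"><b>Title</b></p>\n'
        '<p class="course">Text <a id="link1">A</a></p></body>')
soup = BeautifulSoup(html, 'html.parser')

contents = soup.body.contents                           # list of direct children, '\n' strings included
child_names = [c.name for c in soup.body.children]      # strings have .name == None
desc_count = len(list(soup.body.descendants))           # every nested tag and string, recursively
parent_name = soup.b.parent.name                        # upward: the <p> that encloses <b>
sibling = soup.p.next_sibling                           # sideways: the '\n' between the two <p> tags

print(len(contents), child_names, desc_count, parent_name, repr(sibling))
```

The `'\n'` text nodes explain why `len(soup.body.contents)` is 5 on the demo page even though the body holds only two `<p>` tags.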
data:image/s3,"s3://crabby-images/29030/290305633fa0116681ec18962ffb7e3cf3eeddd6" alt=""
data:image/s3,"s3://crabby-images/b2c80/b2c801ddfbc21b08fe434adcda68a50c48f91b29" alt=""
data:image/s3,"s3://crabby-images/f9e56/f9e56f3e71dc76f8472f20a3cb304e042eefc108" alt=""
data:image/s3,"s3://crabby-images/885a9/885a992ca7f9ed077327a36edb50dab2d5747e2c" alt=""
data:image/s3,"s3://crabby-images/1b002/1b002eed2da26d09254921c4f81323880ba55a3d" alt=""
```python
print(soup.prettify())  # prettify(): re-indents the HTML, one tag or string per line
```
data:image/s3,"s3://crabby-images/ef410/ef410a0b7c6a42c4f6318290eefa28fb5867500b" alt=""
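```python
soup.find_all(name, attrs, recursive, string, **kwargs)  # returns a ResultSet, a list subclass of matches
```
- name: string to match against tag names
- attrs: string to match against tag attribute values; the attribute to search can be named
- recursive: whether to search all descendants (default `True`) or only direct children
- string: string to match against the text content between `<>...</>`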
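A short sketch of each `find_all` parameter, assuming a small hand-written snippet modelled on the demo page (the HTML here is illustrative). Passing a plain string as the second positional argument matches it against the CSS class, and `string=` is the keyword used by current Beautiful Soup 4 releases.

```python
from bs4 import BeautifulSoup

# Illustrative snippet modelled on the demo page
html = ('<body><p class="title"><b>The demo</b></p>'
        '<p class="course">Courses: <a class="py1" id="link1">Basic Python</a> and '
        '<a class="py2" id="link2">Advanced Python</a>.</p></body>')
soup = BeautifulSoup(html, 'html.parser')

links = soup.find_all('a')                          # name: match by tag name
p_or_b = soup.find_all(['p', 'b'])                  # name: a list matches any of the names
course = soup.find_all('p', 'course')               # attrs: a plain string matches the class
by_id = soup.find_all(id='link1')                   # **kwargs: match an attribute by keyword
direct = soup.body.find_all('a', recursive=False)   # recursive=False: direct children only
texts = soup.find_all(string='Basic Python')        # string: match text nodes exactly

print(len(links), [t.name for t in p_or_b], course[0]['class'], by_id[0].string, direct, texts)
```

`direct` comes back empty because both `<a>` tags sit inside a `<p>`, not directly under `<body>`.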
```python
import re

# Count tags whose src contains 'jpg' and whose class contains 'image'
len(soup.find_all(src=re.compile('jpg'), class_=re.compile('image')))
for tag in soup.find_all(src=re.compile('jpg'), class_=re.compile('image')):
    print(tag.attrs['src'])
```
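Since the page behind the snippet above is not given, here is a self-contained sketch on three made-up `<img>` tags showing which ones the two regex filters keep. `class_` (with the trailing underscore) is used because `class` is a Python keyword, and a compiled pattern is applied with search semantics, so it matches anywhere in the value.

```python
import re
from bs4 import BeautifulSoup

# Made-up images: only the first has both a .jpg src and a class containing 'image'
html = ('<img src="a.jpg" class="image-main">'
        '<img src="b.png" class="image-thumb">'
        '<img src="c.jpg" class="icon">')
soup = BeautifulSoup(html, 'html.parser')

matches = soup.find_all(src=re.compile('jpg'), class_=re.compile('image'))
srcs = [tag.attrs['src'] for tag in matches]
print(len(matches), srcs)
```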