BeautifulSoup4


Author: livein80 | Published 2020-07-28 17:54

1. Introduction to bs4

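  • BeautifulSoup: a Python library for extracting data from HTML or XML documents
  • Installation (lxml is the parser used in the examples below; the library is published on PyPI as beautifulsoup4 and imported as bs4):
      pip install lxml
      pip install beautifulsoup4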
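Because lxml is a separate, optional parser, a script may want to fall back to Python's built-in html.parser when lxml is not installed. The following is a minimal sketch assuming that fallback is the behavior you want; the helper name make_soup is hypothetical, while FeatureNotFound is the exception bs4 raises when the requested parser is unavailable.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Hypothetical helper: parse with lxml if installed, else use html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # html.parser ships with the Python standard library
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p>hello</p>")
print(soup.p.string)  # hello
```

lxml is generally faster and more lenient with broken markup, which is why it is the usual first choice; html.parser just avoids the extra dependency.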
    

2. Using bs4

The examples in this section parse the following sample HTML string:

html_doc = """
<html><head><title>The Dormouse's story</title></head>
  <body>
      <p class="title"><b>The Dormouse's story</b></p>
      <p class="story">Once upon a time there were three little sisters; and their names were
      <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
      <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
      <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p>
      <p class="story">...</p>
  </body>
</html>
"""

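from bs4 import BeautifulSoup

# Create the BeautifulSoup object, parsing with lxml
bs = BeautifulSoup(html_doc, 'lxml')
# Print the document with consistent, indented formatting
print(bs.prettify())
print(bs.title)         # the <title> tag: <title>The Dormouse's story</title>
print(bs.title.name)    # the tag's name: title
print(bs.title.string)  # the text inside <title>: The Dormouse's story
print(bs.p)             # the first <p> tag: <p class="title"><b>The Dormouse's story</b></p>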
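Attribute access such as bs.p returns only the first matching tag. To pull out every link, its attributes, or the plain text, bs4 provides find(), find_all(), attribute lookup and get_text(). The sketch below reuses the sample document and, as an assumption for portability, swaps in the built-in html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Same sample document as above.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

# html.parser is part of the standard library (used here instead of lxml)
soup = BeautifulSoup(html_doc, "html.parser")

# find() returns the first match; class_ has a trailing underscore
# because "class" is a Python keyword
story = soup.find("p", class_="story")

# find_all() returns a list of every match
links = soup.find_all("a", class_="sister")

for a in links:
    # a["id"] raises KeyError if the attribute is missing; a.get() returns None
    print(a["id"], a.get("href"), a.get_text())
    # first line printed: link1 http://example.com/elsie Elsie

# All text in the document, with the tags stripped
text = soup.get_text()
```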
