BeautifulSoup4

Author: livein80 | Published: 2020-07-28 17:54

    1. Introduction to bs4

    • BeautifulSoup is a Python library for extracting data from HTML or XML files.
    • Installation:
        pip install lxml
        pip install bs4
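
A quick way to confirm that both installs succeeded is to ask Python whether the modules can be found. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of bs4 or lxml.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the two packages installed above are importable.
for name in ("bs4", "lxml"):
    status = "found" if is_installed(name) else "missing"
    print(name, status)
```

If either line reports "missing", rerunning the matching `pip install` command in the same Python environment is the usual fix.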
      

    2. Using bs4

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
      <body>
          <p class="title"><b>The Dormouse's story</b></p>
          <p class="story">Once upon a time there were three little sisters;
          and their names were
          <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
          <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>and
          <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;and they lived at the bottom of a well.</p>
          <p class="story">...</p>
      </body>
    </html>
    """
    

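    from bs4 import BeautifulSoup

    # Build a BeautifulSoup object from html_doc (defined above) with the lxml parser
    bs = BeautifulSoup(html_doc, 'lxml')
    # Print the document with normalized, indented markup
    print(bs.prettify())
    print(bs.title)         # the <title> tag: <title>The Dormouse's story</title>
    print(bs.title.name)    # the tag name: title
    print(bs.title.string)  # the text inside the tag: The Dormouse's story
    print(bs.p)             # the first <p> tag in the document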
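
Dotted access such as `bs.p` returns only the first matching tag, which is easy to miss. The sketch below contrasts it with `find_all` and shows how tag attributes are read. It makes two assumptions that differ from the article: it uses a shortened copy of the sample document, and it swaps in Python's built-in `"html.parser"` instead of `lxml` so that it runs without the extra dependency.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document (an assumption, trimmed for brevity).
sample = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
</body></html>
"""

# "html.parser" ships with Python; the article itself uses "lxml".
soup = BeautifulSoup(sample, "html.parser")

first_p = soup.p             # dotted access: only the first <p>
all_p = soup.find_all("p")   # find_all: every <p> in the document
first_link = soup.a          # the first <a> tag

print(first_p["class"])      # class is multi-valued, so this is a list: ['title']
print(len(all_p))            # 2
print(first_link["href"], first_link.get("id"))  # http://example.com/elsie link1
```

Indexing a tag like a dictionary raises `KeyError` when the attribute is absent, while `.get()` returns `None`, so `.get()` is the safer choice for optional attributes.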
