BeautifulSoup

Author: harukou_ou | Source: published 2018-02-06 21:47, read 15 times
    from bs4 import BeautifulSoup
    soup = BeautifulSoup('<p>data</p>', 'html.parser')
    
    What BeautifulSoup does
    Tag structure
    Figure: The BeautifulSoup class
    BeautifulSoup parsers

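    Basic elements of the BeautifulSoup class

    • Tag: a tag, opened with <> and closed with </>
    • Name: the tag's name; for <p>...</p> the name is p. Access: <tag>.name
    • Attributes: the tag's attributes, as a dictionary. Access: <tag>.attrs
    • NavigableString: the string between <> and </>. Access: <tag>.string
    • Comment: the comment portion of a string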
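
The five elements listed above can be inspected on a small document. This is a minimal sketch: the HTML snippet and the variable names are made up for illustration and do not come from the demo page.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical HTML snippet, invented for this illustration.
html = '<p class="title" id="intro">data</p><b><!--a comment--></b>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.p                     # Tag: the first <p> element
print(isinstance(tag, Tag))      # True
print(tag.name)                  # Name: p
print(tag.attrs)                 # Attributes: {'class': ['title'], 'id': 'intro'}
print(tag.string)                # NavigableString: data

# A comment is returned by .string too, so check its type to tell them apart.
comment = soup.b.string
print(isinstance(comment, Comment))   # True
print(comment)                        # a comment
```

Note that `class` comes back as a list because HTML allows several class names on one tag, and that `Comment` is a subclass of `NavigableString`, which is why a type check is needed to distinguish a comment from ordinary text.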

    Understanding bs
    https://python123.io/ws/demo.html
    Figure: Basic HTML structure
    Figure: Downward traversal of the tag tree
    >>> from bs4 import BeautifulSoup as BS
    >>> # demo is assumed to hold the HTML text of the demo page linked above
    >>> soup = BS(demo, 'html.parser')
    >>> soup.head
    <head><title>This is a python demo page</title></head>
    >>> soup.head.contents
    [<title>This is a python demo page</title>]
    >>> soup.body.contents
    ['\n', <p class="title"><b>The demo python introduces several python courses.</b></p>, '\n', <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
    
    <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>, '\n']
    >>> len(soup.body.contents)
    5
    >>> soup.body.contents[1]
    <p class="title"><b>The demo python introduces several python courses.</b></p>
    >>> soup.body.children
    <list_iterator object at 0x00000000039D2748>
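
Since `.children` returns an iterator rather than a list, it is normally consumed in a loop. The sketch below contrasts it with `.descendants`; the HTML is a short hypothetical stand-in for the demo page, not the page itself.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in for the demo page, kept short for illustration.
html = "<body>\n<p><b>bold</b></p>\n<p>plain</p>\n</body>"
soup = BeautifulSoup(html, "html.parser")

# .children yields direct children only; the '\n' strings are children too,
# so filter on Tag to keep just the elements.
child_tags = [c.name for c in soup.body.children if isinstance(c, Tag)]
print(child_tags)         # ['p', 'p']

# .descendants walks the whole subtree, depth first.
descendant_tags = [d.name for d in soup.body.descendants if isinstance(d, Tag)]
print(descendant_tags)    # ['p', 'b', 'p']
```

The whitespace strings between tags explain why `len(soup.body.contents)` is larger than the number of visible elements.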
    
    Figure: Upward traversal attributes; Figure: Upward traversal
    Figure: 1.png; Figure: Sibling traversal
    Figure: bs traversal
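
The upward and sibling traversals named in the figures can be sketched with the standard bs4 attributes `.parent`, `.parents`, `.next_sibling` and `.previous_sibling`. The HTML and variable names below are hypothetical, chosen only to keep the example small.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet, invented for this illustration.
html = "<html><body><p>one <a>first</a> and <a>second</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")
first_link = soup.a

# Upward: .parent is the enclosing tag, .parents walks up to the document root.
parent_name = first_link.parent.name
ancestor_names = [p.name for p in first_link.parents]
print(parent_name)       # p
print(ancestor_names)    # ['p', 'body', 'html', '[document]']

# Sideways: siblings share a parent and may be plain strings, not tags.
before = first_link.previous_sibling           # the text before the link
after = first_link.next_sibling                # the text after the link
next_link = first_link.find_next_sibling("a")  # skips the text, finds the next <a>
print(next_link.string)  # second
```

Because a sibling is often a text node, `find_next_sibling` with a tag name is the safer choice when the next element is what is wanted.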
    print(soup.prettify())  # prettify() returns the markup re-indented, one tag or string per line
    
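    Figure: Getting started with bs
    soup.find_all(name, attrs, recursive, string, **kwargs)  # returns a ResultSet, which is a list subclass


    name: a string to match against tag names
    attrs: a string to match against tag attribute values; a specific attribute can be named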
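
Each parameter of `find_all` can be tried on its own. This is a minimal sketch: the HTML loosely imitates the links on the demo page, but the `href` values and variable names are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on the demo page's links.
html = (
    '<p class="course">'
    '<a class="py1" href="/basic" id="link1">Basic Python</a>'
    '<a class="py2" href="/advanced" id="link2">Advanced Python</a>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find_all("a")                              # name: match tag names
by_attr = soup.find_all("a", attrs={"class": "py1"})      # attrs: match attribute values
by_kwarg = soup.find_all(id=re.compile("^link"))          # **kwargs: attribute as keyword
by_string = soup.find_all(string=re.compile("Advanced"))  # string: match text nodes
direct_only = soup.find_all("a", recursive=False)         # only soup's direct children

print(len(by_name), len(by_attr), len(by_kwarg))   # 2 1 2
print(by_string)                                   # ['Advanced Python']
print(direct_only)                                 # []
```

With `recursive=False` the search stops at the direct children of `soup`, and the only one here is `<p>`, so no `<a>` is found.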

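    import re  # needed for re.compile below

    # Count tags whose src contains 'jpg' and whose class contains 'image'
    len(soup.find_all(src=re.compile('jpg'), class_=re.compile('image')))

    # Print the src attribute of each matching tag
    for tag in soup.find_all(src=re.compile('jpg'), class_=re.compile('image')):
        print(tag.attrs['src'])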
    
