BeautifulSoup Usage (Part 2)


Author: suntwo | Published: 2019-05-04 16:38

    title: BeautifulSoup Usage (Part 2)
    date: 2019-03-04 17:48:15
    tags:


    [TOC]

    Relational selection

    Child nodes

    Example code:

    from bs4 import BeautifulSoup
    data="""<div class="subnav">
      <ul class="navbar">
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:7}"
              href="/board/7"
          >热映口碑榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:6}"
              href="/board/6"
          >最受期待榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:1}"
              data-state-val="{subnavId:1}"
              class="active" href="javascript:void(0);"
          >国内票房榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:2}"
              href="/board/2"
          >北美票房榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:4}"
              href="/board/4"
          >TOP100榜</a>
        </li>
      </ul>
    </div>
    """
    
    data=BeautifulSoup(data,'html.parser')
    print(type(data.ul.contents))
    print(data.ul.contents)
    
    

    Output:

    <class 'list'>
    ['\n', <li>
    <a data-act="subnav-click" data-val="{subnavClick:7}" href="/board/7">热映口碑榜</a>
    </li>, '\n', <li>
    <a data-act="subnav-click" data-val="{subnavClick:6}" href="/board/6">最受期待榜</a>
    </li>, '\n', <li>
    <a class="active" data-act="subnav-click" data-state-val="{subnavId:1}" data-val="{subnavClick:1}" href="javascript:void(0);">国内票房榜</a>
    </li>, '\n', <li>
    <a data-act="subnav-click" data-val="{subnavClick:2}" href="/board/2">北美票房榜</a>
    </li>, '\n', <li>
    <a data-act="subnav-click" data-val="{subnavClick:4}" href="/board/4">TOP100榜</a>
    </li>, '\n']
    

    Explanation:

    type(data.ul.contents)
    
    <class 'list'>
    

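    As shown, data.ul.contents returns a list of the ul tag's direct children. The <li> elements in that list are bs4.element.Tag objects, while the line breaks between them appear as '\n' strings, so index 1 is the first <li>: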

    print(type(data.ul.contents[1]))
    
    <class 'bs4.element.Tag'>
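
Because the whitespace strings are mixed in with the tags, it is often convenient to keep only the Tag children. The following is a minimal sketch of two ways to do that, assuming a shortened version of the markup above; the variable names and the link texts "A" and "B" are illustrative and not from the original example.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened, hypothetical version of the navbar markup used above
html = """<ul class="navbar">
  <li><a href="/board/7">A</a></li>
  <li><a href="/board/6">B</a></li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

# .contents is a list that mixes Tag objects with whitespace strings
all_children = soup.ul.contents

# .children iterates over the same nodes; keep only the real tags
tag_children = [child for child in soup.ul.children if isinstance(child, Tag)]

# Alternative: search only one level deep for <li> tags
direct_lis = soup.ul.find_all("li", recursive=False)

hrefs = [li.a["href"] for li in tag_children]
print(len(all_children), len(tag_children), len(direct_lis))
print(hrefs)
```

Filtering with isinstance keeps every direct child tag regardless of its name, whereas find_all("li", recursive=False) keeps only direct <li> children; which one fits depends on the page being parsed.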
    
    Parent and ancestor nodes

    Example:

    from bs4 import BeautifulSoup
    data="""<div class="subnav">
      <ul class="navbar">
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:7}"
              href="/board/7"
          >热映口碑榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:6}"
              href="/board/6"
          >最受期待榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:1}"
              data-state-val="{subnavId:1}"
              class="active" href="javascript:void(0);"
          >国内票房榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:2}"
              href="/board/2"
          >北美票房榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:4}"
              href="/board/4"
          >TOP100榜</a>
        </li>
        
      </ul>
      
    </div>
    <div>hfdfd</div>
    """
    
    data=BeautifulSoup(data,'html.parser')
    print(type(data.ul.parent))
    print(data.ul.parent)
    

    Output:

    <class 'bs4.element.Tag'>
    <div class="subnav">
    <ul class="navbar">
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:7}" href="/board/7">热映口碑榜</a>
    </li>
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:6}" href="/board/6">最受期待榜</a>
    </li>
    <li>
    <a class="active" data-act="subnav-click" data-state-val="{subnavId:1}" data-val="{subnavClick:1}" href="javascript:void(0);">国内票房榜</a>
    </li>
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:2}" href="/board/2">北美票房榜</a>
    </li>
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:4}" href="/board/4">TOP100榜</a>
    </li>
    </ul>
    </div>
    

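    Explanation:

    data.ul.parent returns a bs4.element.Tag object: the direct parent of the ul tag, which here is the <div class="subnav"> element. The second <div> is not included because it is not an ancestor of the ul.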
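
Since the parent is an ordinary Tag, its name and attributes can be read directly. A small sketch, again assuming a shortened version of the markup (the variable names and the link text "A" are illustrative):

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the markup used above
html = """<div class="subnav">
  <ul class="navbar">
    <li><a href="/board/7">A</a></li>
  </ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")

parent = soup.ul.parent
parent_name = parent.name          # tag name of the direct parent
parent_classes = parent["class"]   # class is multi-valued, so this is a list

# .parent works the same way from any tag, e.g. from the first <a>
a_parent_name = soup.a.parent.name

print(parent_name, parent_classes, a_parent_name)
```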

    Example:

    from bs4 import BeautifulSoup
    data="""<div class="subnav">
      <ul class="navbar">
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:7}"
              href="/board/7"
          >热映口碑榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:6}"
              href="/board/6"
          >最受期待榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:1}"
              data-state-val="{subnavId:1}"
              class="active" href="javascript:void(0);"
          >国内票房榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:2}"
              href="/board/2"
          >北美票房榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:4}"
              href="/board/4"
          >TOP100榜</a>
        </li>
        
      </ul>
      
    </div>
    <div>hfdfd</div>
    """
    
    data=BeautifulSoup(data,'html.parser')
    print(type(data.ul.parents))
    print(list(enumerate(data.ul.parents)))
    

    Output:

    <class 'generator'>
    [(0, <div class="subnav">
    <ul class="navbar">
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:7}" href="/board/7">热映口碑榜</a>
    </li>
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:6}" href="/board/6">最受期待榜</a>
    </li>
    <li>
    <a class="active" data-act="subnav-click" data-state-val="{subnavId:1}" data-val="{subnavClick:1}" href="javascript:void(0);">国内票房榜</a>
    </li>
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:2}" href="/board/2">北美票房榜</a>
    </li>
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:4}" href="/board/4">TOP100榜</a>
    </li>
    </ul>
    </div>), (1, <div class="subnav">
    <ul class="navbar">
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:7}" href="/board/7">热映口碑榜</a>
    </li>
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:6}" href="/board/6">最受期待榜</a>
    </li>
    <li>
    <a class="active" data-act="subnav-click" data-state-val="{subnavId:1}" data-val="{subnavClick:1}" href="javascript:void(0);">国内票房榜</a>
    </li>
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:2}" href="/board/2">北美票房榜</a>
    </li>
    <li>
    <a data-act="subnav-click" data-val="{subnavClick:4}" href="/board/4">TOP100榜</a>
    </li>
    </ul>
    </div>
    <div>hfdfd</div>
    )]
    

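    Explanation:

    data.ul.parents is a generator that yields not only the direct parent but every ancestor, from the nearest outward. Here item 0 is the <div class="subnav"> element and item 1 is the BeautifulSoup object for the whole document, which is why it also contains the second <div>.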
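
Printing every ancestor in full is hard to read, so listing only the ancestor names can make the chain clearer. A minimal sketch, assuming a shortened version of the markup (variable names and the link text "A" are illustrative):

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the markup used above
html = """<div class="subnav">
  <ul class="navbar">
    <li><a href="/board/7">A</a></li>
  </ul>
</div>
<div>hfdfd</div>"""

soup = BeautifulSoup(html, "html.parser")

# Walk upward from the <a> tag and collect each ancestor's name
ancestors = list(soup.a.parents)
ancestor_names = [node.name for node in ancestors]

# The last ancestor is the BeautifulSoup object itself
top_is_document = isinstance(ancestors[-1], BeautifulSoup)

# find_parent() returns the nearest ancestor matching a tag name
nearest_div = soup.a.find_parent("div")

print(ancestor_names)
print(top_is_document, nearest_div["class"])
```

The name '[document]' is how Beautiful Soup labels the top-level BeautifulSoup object; when only one specific ancestor is needed, find_parent() avoids walking the generator by hand.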

    Sibling nodes

    Code:

    from bs4 import BeautifulSoup
    data="""<div class="subnav">
      <ul class="navbar">
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:7}"
              href="/board/7"
          >热映口碑榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:6}"
              href="/board/6"
          >最受期待榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:1}"
              data-state-val="{subnavId:1}"
              class="active" href="javascript:void(0);"
          >国内票房榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:2}"
              href="/board/2"
          >北美票房榜</a>
        </li>
        <li>
          <a data-act="subnav-click" data-val="{subnavClick:4}"
              href="/board/4"
          >TOP100榜</a>
        </li>
        
      </ul>
      
    </div>
    <div>hfdfd</div>
    """
    
    data=BeautifulSoup(data,'html.parser')
    print(type(data.li.next_sibling))
    #print(data.li.previous_sibling)
    print(data.li.next_sibling)
    #print(list(enumerate(data.li.previous_siblings)))
    print(list(enumerate(data.li.next_siblings)))
    
    

    Output:

    <class 'bs4.element.NavigableString'>
    
    
    [(0, '\n'), (1, <li>
    <a data-act="subnav-click" data-val="{subnavClick:6}" href="/board/6">最受期待榜</a>
    </li>), (2, '\n'), (3, <li>
    <a class="active" data-act="subnav-click" data-state-val="{subnavId:1}" data-val="{subnavClick:1}" href="javascript:void(0);">国内票房榜</a>
    </li>), (4, '\n'), (5, <li>
    <a data-act="subnav-click" data-val="{subnavClick:2}" href="/board/2">北美票房榜</a>
    </li>), (6, '\n'), (7, <li>
    <a data-act="subnav-click" data-val="{subnavClick:4}" href="/board/4">TOP100榜</a>
    </li>), (8, '\n')]
    

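    Explanation:

    data.li.next_sibling is a bs4.element.NavigableString rather than a Tag: the node immediately after the first <li> is the line break that separates it from the second <li>, which is why printing it shows only blank lines. data.li.next_siblings is a generator that yields every later sibling in document order, here alternating '\n' strings with the remaining four <li> tags.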
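
When only sibling tags are of interest, the whitespace strings can be skipped with find_next_sibling() and find_next_siblings(). A minimal sketch, assuming a shortened version of the markup (variable names and the link texts "A", "B", "C" are illustrative):

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Shortened, hypothetical version of the navbar markup used above
html = """<ul class="navbar">
  <li><a href="/board/7">A</a></li>
  <li><a href="/board/6">B</a></li>
  <li><a href="/board/1">C</a></li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
first_li = soup.li

# The raw next sibling is the whitespace between the two <li> tags
raw_next = first_li.next_sibling
raw_is_string = isinstance(raw_next, NavigableString)

# find_next_sibling() skips strings and returns the next matching tag
next_li = first_li.find_next_sibling("li")

# find_next_siblings() returns all later matching tags as a list
later_hrefs = [li.a["href"] for li in first_li.find_next_siblings("li")]

print(raw_is_string, repr(raw_next.strip()))
print(next_li.a["href"], later_hrefs)
```

The same pattern applies in the other direction with find_previous_sibling() and find_previous_siblings().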


          Article link: https://www.haomeiwen.com/subject/pxxfoqtx.html