Traversing HTML to extract all XPath paths (XPath tree)

Author: dawsongzhao | Published: 2018-02-06 09:44 | Read 27 times

API reference documentation
lxml API

getchildren(self)
     
Returns all direct children. The elements are returned in document order.

Deprecated: Note that this method has been deprecated as of ElementTree 1.3 and lxml 2.0. New code should use list(element) or simply iterate over elements.
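
As a minimal sketch of the replacement the deprecation note recommends, the snippet below uses list(element) and direct iteration instead of getchildren(). It uses the standard-library xml.etree.ElementTree so that it runs without extra packages; the markup string is a made-up sample, and the same two idioms should apply to lxml elements as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from the article's HTML file.
root = ET.fromstring("<ul><li>first</li><li>second</li></ul>")

# list(element) returns the direct children in document order.
children = list(root)

# Iterating over the element visits the same direct children.
tags = [child.tag for child in root]

print(len(children), tags)
```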

HTML file

<html>
 <body>
  <center>
   <h1>
    Nokia to supply 5G equipment to NTT DOCOMO in support of launch of commercial 5G service
   </h1>
  </center>
  <p class="hugin">
   <b class="hugin">
    Press Release
   </b>
  </p>
  <ul class="hugin" type="disc">
   <li class="hugin">
    <div class="hugin">
     Further enhancement for NTT DOCOMO's existing base station baseband units and integration of Nokia's 3GPP-compliant 5G New Radio-based hardware into NTT DOCOMO's network  to enable smooth evolution for NTT DOCOMO from 4G/LTE to 5G
    </div>
   </li>
   <li class="hugin">
    <div class="hugin">
     Agreement builds on long-term collaboration between Nokia and NTT DOCOMO, in order for the operator to aim to have 5G mobile service operational by 2020
    </div>
   </li>
  </ul>
  <p class="hugin">
   19 January, 2018
  </p>
  <p class="hugin">
   <b class="hugin">
    Espoo, Finland/Tokyo, Japan - Nokia has signed an agreement with NTT DOCOMO, Japan's largest mobile operator, to supply 5G baseband products for aiming to deploy in a 5G mobile network planned to be in commercial service by 2020.
   </b>
  </p>
  <p class="hugin">
   Nokia will support NTT DOCOMO's commercial 5G operation in Japan by further enhancing existing baseband units and integrating its 5G New Radio (5G NR)-based* AirScale hardware in the network, which will provide NTT DOCOMO's mobile customers with a unique experience fueled by 5G's extreme high speed, superior capacity and ultra-low latency. With NTT DOCOMO looking to get its  5G commercial service by 2020, Nokia's solution will provide a natural evolution to existing 4G/LTE deployments and also successful integration  into the existing operational environment.
  </p>
  <p class="hugin">
   Nokia has enjoyed a long-term working relationship with Japan's largest operator that has produced supply agreements for 3G and 4G/LTE networking technology. The two companies have also worked closely together in trials of 5G technologies, and now agree on supply of Nokia's 5G BBUs to be able to do centralized management for 5G RRHs (remote radio heads) for aiming to deploy in 5G network. This is aligned with NTT DOCOMO's 5G direction, which is fully utilizing existing C-RAN architecture for 5G. Based on the agreement, Nokia  will support NTT DOCOMO in the evolution of its network from 4G/LTE to 5G, providing technology based on the new 3GPP-compliant 5G NR standard, the first stage of which was published shortly before the end of 2017.
  </p>
  <p class="hugin">
   <b class="hugin">
    Hiroshi Nakamura, Executive Vice President and Chief Technology Officer, NTT DOCOMO said
   </b>:"We have been collaborating with partners such as Nokia on various 5G technology and use case trials since 2014. With this agreement with Nokia, we are now proceeding to the next step to launch 5G mobile services by 2020, and accelerate co-creation of new services and businesses with vertical industry partners."
  </p>
  <p class="hugin">
   <b class="hugin">
    Marc Rouanne, president of Mobile Networks at Nokia, said
   </b>
   : "The agreement with NTT DOCOMO is a major milestone in bringing 5G to commercial reality, especially in a country with a long and proud history of technological achievements and early technology adoption. Together we have worked hard in recent months to commence preparations for NTT DOCOMO's eventual launch of its operational 5G service by 2020, which we have now set in motion by this very exciting announcement today."
  </p>
  <br class="hugin"/>
 </body>
</html>

Python source code

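# coding: utf-8
from lxml import etree


class GetAllXpath:
    """Print the tag path of every element in an HTML document."""

    def __init__(self, html_text):
        self.html_text = html_text

    def parse(self):
        # etree.HTML parses the text with lxml's HTML parser and returns the root element
        page_doc = etree.HTML(self.html_text)
        print(page_doc.tag)
        self.iter_elem(page_doc, "")

    def iter_elem(self, elem, full_path):
        # Append this element's tag to the path inherited from its parent
        full_path += "/%s" % elem.tag
        print(full_path)
        # Recurse into the direct children, in document order
        for e in elem:
            self.iter_elem(e, full_path)


if __name__ == "__main__":
    # Assumes the HTML file is saved as UTF-8
    with open("getallxpath.html", encoding="utf-8") as f:
        content = f.read()
    GetAllXpath(content).parse()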
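
The class above prints each path as it goes. If the paths are needed for further processing, one possible variant is a generator that yields them instead. The sketch below is not the author's code: iter_paths and the sample string are hypothetical names introduced for illustration, and the standard-library parser stands in for lxml so the example runs without extra packages. The function only relies on .tag and iteration, so it should also accept an lxml root element.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def iter_paths(elem, parent_path=""):
    """Yield the tag path of elem and of every descendant, in document order."""
    # In lxml, comments and processing instructions have a non-string tag; skip them.
    if not isinstance(elem.tag, str):
        return
    path = "%s/%s" % (parent_path, elem.tag)
    yield path
    for child in elem:
        yield from iter_paths(child, path)


# Hypothetical sample input, well-formed so the standard-library parser accepts it.
sample = "<html><body><p><b>x</b></p><p>y</p><ul><li/><li/></ul></body></html>"
paths = list(iter_paths(ET.fromstring(sample)))

# Count how often each path occurs; repeated sibling tags share one path.
path_counts = Counter(paths)
print(paths)
print(path_counts["/html/body/p"])
```

Collecting the paths in a list keeps document order, while the Counter gives the set of distinct paths together with how many elements share each one.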

Output

html
/html
/html/body
/html/body/center
/html/body/center/h1
/html/body/p
/html/body/p/b
/html/body/ul
/html/body/ul/li
/html/body/ul/li/div
/html/body/ul/li
/html/body/ul/li/div
/html/body/p
/html/body/p
/html/body/p/b
/html/body/p
/html/body/p
/html/body/p
/html/body/p/b
/html/body/p
/html/body/p/b
/html/body/br
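
In the output above, sibling elements with the same tag produce identical paths (for example, /html/body/p appears several times), so a path does not identify a single element. One way to make each path unique is to add XPath positional indexes such as p[2]. The following is a sketch under that assumption, not part of the original script: iter_indexed_paths and the sample markup are hypothetical, and the standard-library parser is used so the example runs on its own.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def iter_indexed_paths(elem, path=None):
    """Yield one path per element, indexing tags that repeat among siblings."""
    if path is None:
        path = "/" + elem.tag
    yield path
    # Keep element children only (in lxml, comments have a non-string tag).
    children = [c for c in elem if isinstance(c.tag, str)]
    totals = Counter(c.tag for c in children)
    seen = Counter()
    for child in children:
        seen[child.tag] += 1
        step = child.tag
        # XPath positions are 1-based and count only siblings with the same tag.
        if totals[child.tag] > 1:
            step += "[%d]" % seen[child.tag]
        yield from iter_indexed_paths(child, "%s/%s" % (path, step))


# Hypothetical sample input, well-formed so the standard-library parser accepts it.
sample = "<html><body><p><b>x</b></p><ul><li/><li/></ul><p>y</p></body></html>"
indexed_paths = list(iter_indexed_paths(ET.fromstring(sample)))
print("\n".join(indexed_paths))
```

lxml also offers getpath() on its ElementTree objects (reachable via element.getroottree()), which returns a comparable indexed path for a given element and may be the simpler choice when lxml is already in use.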



Article link: https://www.haomeiwen.com/subject/wtnwzxtx.html
