ElementTree vs. BeautifulSoup for Processing XM

Author: badxiao | Published: 2018-06-25 14:57 | Views: 11

File: icd10cm_tabular_2019.xml
File size: 8.7 MB

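Test 1

Load the entire XML file into memory, then build the document tree 10 times each with ElementTree.fromstring and with BeautifulSoup (passing the in-memory content). The code is as follows: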

from bs4 import BeautifulSoup
from xml.etree import ElementTree
import time
from tqdm import trange

xml_path = "./icd10cm_tabular_2019.xml"
xml_content = None
with open(xml_path, "rb") as xml_file:
    xml_content = xml_file.read()

start_time = time.time()
for _ in trange(10):
    ElementTree.fromstring(xml_content)
print("ElementTree", round(time.time() - start_time, 2))

start_time = time.time()
for _ in trange(10):
    BeautifulSoup(xml_content, "lxml-xml")
print("BeautifulSoup lxml-xml", round(time.time() - start_time, 2))

Test results:

Parser	Runs	Total time (s)	Average time (s)
ElementTree	10	8.53	0.85
BeautifulSoup lxml-xml	10	149.3	14.93
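
The timing loop above can be factored into a small reusable helper. The following is only a sketch under stated assumptions: `make_sample_xml` and `bench` are hypothetical names that do not appear in the original test, the synthetic document stands in for icd10cm_tabular_2019.xml, and `time.perf_counter` is swapped in for `time.time` because it is better suited to measuring short intervals. Only the standard library is used, so BeautifulSoup is not exercised here.

```python
import time
from xml.etree import ElementTree


def make_sample_xml(n_items: int) -> bytes:
    """Build a small synthetic XML document (hypothetical stand-in for the real file)."""
    items = "".join(
        f"<item id='{i}'><name>n{i}</name></item>" for i in range(n_items)
    )
    return f"<root>{items}</root>".encode("utf-8")


def bench(parse_fn, payload, runs: int = 10):
    """Call parse_fn(payload) `runs` times; return (total_seconds, mean_seconds)."""
    start = time.perf_counter()
    for _ in range(runs):
        parse_fn(payload)
    total = time.perf_counter() - start
    return total, total / runs


xml_bytes = make_sample_xml(1000)
total, mean = bench(ElementTree.fromstring, xml_bytes, runs=10)
print(f"ElementTree: total={total:.4f}s mean={mean:.4f}s")
```

Any other callable that accepts the same payload, for example `lambda data: BeautifulSoup(data, "lxml-xml")`, could be passed to `bench` in the same way, provided bs4 and lxml are installed.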

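Test 2

Without loading the XML into memory first, read directly from the file and build the document tree 10 times each with ElementTree.parse and with BeautifulSoup (passing the file stream). The code is as follows: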

from bs4 import BeautifulSoup
from xml.etree import ElementTree
import time
from tqdm import trange

xml_path = "/Users/yangxiao/DATA/2019-ICD-10-CM/icd10cm_tabular_2019.xml"

start_time = time.time()
for _ in trange(10):
    with open(xml_path, "rb") as xml_file:
        ElementTree.parse(xml_file)
print("ElementTree", round(time.time() - start_time, 2))

start_time = time.time()
for _ in trange(10):
    with open(xml_path, "rb") as xml_file:
        BeautifulSoup(xml_file, "lxml-xml")
print("BeautifulSoup lxml-xml", round(time.time() - start_time, 2))

Test results:

Parser	Runs	Total time (s)	Average time (s)
ElementTree	10	6.2	0.62
BeautifulSoup lxml-xml	10	105.56	10.56
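
The two tests use different ElementTree entry points, and the return types differ: `ElementTree.fromstring` returns the root `Element`, while `ElementTree.parse` returns an `ElementTree` object whose root is reached through `getroot()`. The sketch below illustrates this on a tiny inline document written to a temporary file. The sample XML and the temporary file are assumptions made for illustration and are not part of the original benchmark.

```python
import os
import tempfile
from xml.etree import ElementTree

xml_bytes = b"<root><item id='1'/><item id='2'/></root>"

# fromstring: parses in-memory data and returns the root Element.
root_from_string = ElementTree.fromstring(xml_bytes)

# parse: reads from a path or file object and returns an ElementTree wrapper.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(xml_bytes)
    tmp_path = tmp.name
try:
    with open(tmp_path, "rb") as xml_file:
        tree = ElementTree.parse(xml_file)
finally:
    os.remove(tmp_path)

root_from_file = tree.getroot()
print(type(root_from_string).__name__, type(tree).__name__)
print(root_from_file.tag, len(root_from_file))
```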

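Conclusions:

ElementTree is clearly faster than BeautifulSoup with lxml-xml: more than 10 times faster, and roughly 17 times faster in both tests (149.3 s vs. 8.53 s, and 105.56 s vs. 6.2 s).
ElementTree.parse was faster than ElementTree.fromstring (6.2 s vs. 8.53 s for 10 runs).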
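
The speedup figures can be checked directly from the reported totals. The snippet below is a small worked calculation that uses only the numbers from the two result tables above. The dictionary layout and variable names are illustrative choices, not part of the original test code.

```python
# Totals (seconds for 10 runs) copied from the two result tables above.
results = {
    "test1": {"ElementTree": 8.53, "BeautifulSoup lxml-xml": 149.3},
    "test2": {"ElementTree": 6.2, "BeautifulSoup lxml-xml": 105.56},
}
runs = 10

speedups = {}
for name, totals in results.items():
    et = totals["ElementTree"]
    bs = totals["BeautifulSoup lxml-xml"]
    # Ratio of total times: how many times longer BeautifulSoup took.
    speedups[name] = bs / et
    print(
        f"{name}: ElementTree mean={et / runs:.2f}s, "
        f"BeautifulSoup mean={bs / runs:.2f}s, speedup={bs / et:.1f}x"
    )
```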

Article link: https://www.haomeiwen.com/subject/mhryyftx.html
