lxml Parses Web Pages Faster Than BeautifulSoup

Author: bbjoe | Published 2016-08-17 12:50

    My code:

    # -*- coding: utf-8 -*-
    import requests
    from time import ctime
    from lxml import etree
    from bs4 import BeautifulSoup
    
    url = 'http://www.cnblogs.com/descusr/archive/2012/06/20/2557075.html'
    tries = 300
    web_data = requests.get(url).text
    
    # step 1: parse the page `tries` times with lxml
    print('lxml start at:', ctime())
    for _ in range(tries):
        lxml_page = etree.HTML(web_data)
    print('lxml done at:', ctime())
    
    # step 2: parse the same page `tries` times with BeautifulSoup (lxml backend)
    print('soup start at:', ctime())
    for _ in range(tries):
        soup_page = BeautifulSoup(web_data, 'lxml')
    print('soup done at:', ctime())
    

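    I ran the two steps separately: first I commented out step 2 and ran step 1, then I commented out step 1 and ran step 2. I'm a beginner, so go easy on me.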
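Commenting the steps in and out can be avoided by timing both parsers in a single run. The following is only a minimal sketch of that idea, not the original measurement: `time_calls` and `compare_parsers` are hypothetical helper names that do not appear in the post, and `time.perf_counter` is swapped in for `ctime` because `ctime` only reports whole seconds.

```python
import time


def time_calls(func, arg, repeats=300):
    """Return the seconds spent calling func(arg) `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return time.perf_counter() - start


def compare_parsers(html, repeats=300):
    """Time lxml and BeautifulSoup on the same HTML string.

    The third-party imports are local so that time_calls stays usable
    even where lxml or bs4 is not installed.
    """
    from lxml import etree
    from bs4 import BeautifulSoup

    return {
        # lxml builds its own element tree directly
        'lxml': time_calls(etree.HTML, html, repeats),
        # BeautifulSoup is asked to use lxml as its underlying parser
        'soup': time_calls(lambda text: BeautifulSoup(text, 'lxml'), html, repeats),
    }
```

With the page already downloaded as in the code above, `compare_parsers(web_data, tries)` would return a dict holding the elapsed seconds for each parser, so both numbers come from the same process and the same input.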

    Results:

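    Parsing one blog page 300 times took about 8 seconds with BeautifulSoup and about 1 second with lxml, roughly an 8x difference. Note that BeautifulSoup is called with the 'lxml' parser here, so the extra time presumably comes from BeautifulSoup building its own tree on top of lxml rather than from a slower underlying parser.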

    [Screenshots of the two runs: BeautifulSoup.png, lxml.png]


          Article link: https://www.haomeiwen.com/subject/nkymsttx.html