python-BeautifulSoup

Author: 点点渔火 | Published: 2017-03-16 22:12

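Beautiful Soup is a Python library for extracting data from HTML or XML files, and it is very convenient to use. Here are two links to start with: the official tutorial is rather long, while the blog post at the second link is an easy way to get started.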

Official documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html

Blog tutorial: http://cuiqingcai.com/1319.html

What follows are my own brief notes for future reference.

Installation

****One-step install****:

easy_install beautifulsoup4

pip install beautifulsoup4

****Install from source****:

http://www.crummy.com/software/BeautifulSoup/download/4.x/

sudo python setup.py install

(If you do not have sudo privileges, see another blog post.)
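
Whichever installation route is used, a quick sanity check is to import the package and print its version. This is a minimal sketch that assumes only that the `bs4` package exposes a `__version__` string, which current releases do:

```python
# Confirm that Beautiful Soup is importable after installation.
import bs4

# Prints the installed version string, for example a 4.x release number.
print("Beautiful Soup version:", bs4.__version__)
```

If the import raises `ImportError`, the package was probably installed for a different Python interpreter than the one running the script.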

****Install a parser****:

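Beautiful Soup supports the HTML parser in the Python standard library, and it also supports several third-party parsers such as lxml and html5lib. Either one can be installed with pip:

pip install lxml  or  pip install html5lib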

![Notes][2]
[2]: https://img.haomeiwen.com/i5223866/35b53ffb567d03ff.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240
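
Since the third-party parsers are optional, a script can try them in order and fall back to the standard-library one. The sketch below is one possible way to do that; the helper name `make_soup` and the preference order are my own assumptions, not something Beautiful Soup prescribes. It relies on `bs4.FeatureNotFound`, the exception raised when a requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: parse markup with the first parser that is available."""
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
    raise RuntimeError("no usable parser found")


soup, parser_name = make_soup("<p>hi</p>")
print(parser_name, soup.p.get_text())
```

`html.parser` ships with Python, so the final fallback should always succeed.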

Usage:

    import re
    import bs4
    from bs4 import BeautifulSoup
    
    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters;  and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><!--   Elsie --></a>,
    <a href="http://example.com/lacie" class="sister"   id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">邮件    组</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    """
    
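    # Create a BeautifulSoup object; naming the parser explicitly avoids a warning
    soup = BeautifulSoup(html_doc, "html.parser")
    # Or open an HTML file: soup = BeautifulSoup(open('index.html'), "html.parser")

    # To parse XML, lxml must be installed first
    # (`markup` stands for your own XML string):
    # xml_soup = BeautifulSoup(markup, "xml")

    # Print the contents of the soup object, pretty-printed
    print(soup.prettify())

    # Print the text with the HTML tags stripped:
    print(soup.get_text())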
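
By default `get_text()` simply concatenates every text fragment, which can run words together. A small sketch of its `separator` and `strip` arguments, using a made-up HTML snippet rather than the document above:

```python
from bs4 import BeautifulSoup

# A tiny illustrative snippet (not part of the example document above).
snippet = "<ul><li> Elsie </li><li>Lacie</li><li>Tillie</li></ul>"
soup = BeautifulSoup(snippet, "html.parser")

# Plain concatenation keeps the original whitespace and adds no separators.
raw_text = soup.get_text()

# Strip each fragment and join the fragments with a separator.
clean_text = soup.get_text(", ", strip=True)

print(repr(raw_text))    # ' Elsie LacieTillie'
print(repr(clean_text))  # 'Elsie, Lacie, Tillie'
```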

Beautiful Soup converts a complex HTML document into a tree structure in which every node is a Python object. All of these objects fall into four kinds: Tag, NavigableString, BeautifulSoup, and Comment.
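
The four kinds of object can be seen by inspecting a few nodes. This is a minimal sketch on a shortened version of the example document; the variable names are only for illustration:

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

html = (
    '<p class="title"><b>The Dormouse\'s story</b></p>'
    '<a href="http://example.com/elsie" id="link1"><!-- Elsie --></a>'
)
soup = BeautifulSoup(html, "html.parser")

tag = soup.b             # Tag: an element such as <b>...</b>
text = soup.b.string     # NavigableString: the text inside a tag
comment = soup.a.string  # Comment: a special kind of NavigableString

print(type(soup).__name__)     # BeautifulSoup
print(type(tag).__name__)      # Tag
print(type(text).__name__)     # NavigableString
print(type(comment).__name__)  # Comment
```

Because `Comment` is a subclass of `NavigableString`, it is worth checking the type of `.string` before printing it; otherwise comment text can be mistaken for ordinary page text.
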
To be continued...

Article link: https://www.haomeiwen.com/subject/xkksnttx.html
