Beautiful Soup Study Notes for Python

Author: 横云乱雪 | Published: 2017-08-19 11:42

# Install Python first, then install bs4 and the lxml parser
# (run these in a shell, not at the Python >>> prompt)
pip install bs4
pip install lxml

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Parse the document held in html (a string of markup) with the lxml parser
soup = BeautifulSoup(html, 'lxml')
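
The notes never define html, so here is a minimal sketch of a complete parse. The sample markup is an assumption, modeled on the "three sisters" example from the Beautiful Soup documentation, and the sketch uses Python's built-in "html.parser" so that it runs without lxml; pass "lxml" instead to match the notes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not from the original notes).
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>
"""

# "html.parser" ships with Python; swap in "lxml" if it is installed.
soup = BeautifulSoup(html, "html.parser")

# soup.title is shorthand for the first <title> tag in the document.
print(soup.title.string)
```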

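# Find the first <a> tag (returns None if there is none)
soup.find("a")

# Find all <a> tags; the return value is a list
soup.find_all("a")

# Get all the text content of the document
soup.get_text()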
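
A small sketch of how the three calls above differ in what they return. The markup is made up for illustration, and "html.parser" is used only so the snippet has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = '<p>Hello <a href="/one">one</a> and <a href="/two">two</a></p>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")        # a single Tag: the first match
links = soup.find_all("a")    # a list-like result with every match
missing = soup.find("table")  # None, because nothing matches
text = soup.get_text()        # all text nodes joined together

print(first, len(links), missing, text)
```

Because find() can return None, it is usually worth checking the result before indexing into it.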

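# Get the class attribute of an <a> tag (class is multi-valued, so this is a list)
tag_a = soup.find("a")
tag_a["class"]

# Get the text inside the <a> tag
tag_a = soup.find("a")
tag_a.string
# It can be converted directly to a plain string
# (str() in Python 3; unicode() exists only in Python 2)
str(tag_a.string)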
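
The following sketch, on made-up markup, shows what attribute access and .string give back under Python 3. It also illustrates two easy-to-miss cases: tag.get() for an attribute that may be absent, and .string being None when a tag has more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = ('<a class="sister external" href="/elsie" id="link1">Elsie</a>'
        '<p>One <b>two</b></p>')
soup = BeautifulSoup(html, "html.parser")
tag_a = soup.find("a")

classes = tag_a["class"]    # multi-valued attribute, so a list of class names
href = tag_a["href"]        # single-valued attribute, so a plain string
title = tag_a.get("title")  # .get() returns None instead of raising KeyError
label = str(tag_a.string)   # NavigableString converted to a plain str

# <p> has two children (a text node and <b>), so .string is None here.
p_string = soup.find("p").string

print(classes, href, title, label, p_string)
```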

# Filtering with regular expressions
import re
# Find all tags whose name contains "a"
soup.find_all(re.compile("a"))

# Find all <a> and <b> tags
soup.find_all(["a", "b"])
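
To make the difference between the regex filter and the list filter concrete, here is a sketch on a tiny made-up document. A compiled pattern is matched against tag names with search(), so "a" also hits head; anchoring the pattern narrows it.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = ("<html><head><title>T</title></head>"
        "<body><p><a href='#'>x</a><b>y</b></p></body></html>")
soup = BeautifulSoup(html, "html.parser")

# Any tag whose name contains the letter "a".
contains_a = [t.name for t in soup.find_all(re.compile("a"))]

# Anchored pattern: only tag names that start with "b".
starts_with_b = [t.name for t in soup.find_all(re.compile("^b"))]

# A list matches any tag whose name equals one of the items.
a_or_b = [t.name for t in soup.find_all(["a", "b"])]

print(contains_a, starts_with_b, a_or_b)
```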


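# find_all() in detail
# Find all <p> tags whose CSS class is "title"
# (a string in the second position is matched against class, not against attribute names)
soup.find_all("p", "title")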
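
Since the second positional argument is easy to misread, this sketch contrasts it with a search for tags that actually carry a title attribute. The markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <p> with class "title", one with a title attribute.
html = ('<p class="title">Heading</p>'
        '<p title="tooltip">Body</p>'
        '<p>Plain</p>')
soup = BeautifulSoup(html, "html.parser")

# A string as the second positional argument filters on CSS class.
by_class = soup.find_all("p", "title")

# Equivalent spelling with the class_ keyword.
by_class_kw = soup.find_all("p", class_="title")

# To require that a title attribute exists, pass title=True.
has_title_attr = soup.find_all("p", title=True)

print(by_class, by_class_kw, has_title_attr)
```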

# Find all tags whose href matches this regular expression and whose id is "link1"
import re
soup.find_all(href=re.compile("elsie"), id="link1")
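
Keyword filters are combined with AND, which the sketch below tries to show by running the href filter alone and then together with id. The URLs and ids are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = ('<a href="http://example.com/elsie" id="link1">Elsie</a>'
        '<a href="http://example.com/elsie/photos" id="link2">Photos</a>'
        '<a href="http://example.com/lacie" id="link3">Lacie</a>')
soup = BeautifulSoup(html, "html.parser")

# href filter alone: every tag whose href contains "elsie".
by_href = soup.find_all(href=re.compile("elsie"))

# Both filters: a tag must satisfy href AND id to be returned.
by_href_and_id = soup.find_all(href=re.compile("elsie"), id="link1")

print([t["id"] for t in by_href], [t["id"] for t in by_href_and_id])
```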

# Find all <a> tags with class "sister"; class is a reserved word in Python,
# so the keyword argument is spelled class_
soup.find_all("a", class_="sister")

# Find <a> tags, limiting the returned list to at most 2 results
soup.find_all("a", limit=2)
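
A short sketch of class_ and limit on made-up markup. It also shows that class_ matches a tag when any one of its classes equals the given name, and that an attrs dictionary is an alternative way to avoid the reserved word.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = ('<a class="sister" id="link1">Elsie</a>'
        '<a class="sister big" id="link2">Lacie</a>'
        '<a class="brother" id="link3">Tom</a>')
soup = BeautifulSoup(html, "html.parser")

# Matches link2 as well: "sister" is one of its two classes.
sisters = soup.find_all("a", class_="sister")

# Same filter written with an attrs dictionary.
via_attrs = soup.find_all("a", attrs={"class": "sister"})

# Stop searching after the first two matches.
first_two = soup.find_all("a", limit=2)

print([t["id"] for t in sisters], [t["id"] for t in first_two])
```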

Article link: https://www.haomeiwen.com/subject/rfltdxtx.html
