BeautifulSoup Documentation Study Notes 4: Output

Author: JA_Cobra | Published: 2020-03-13 12:10

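Output

Pretty-printing

The prettify() method turns a Beautiful Soup parse tree into a nicely formatted Unicode string, with each XML/HTML tag and each string on its own line.

Example: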

>>> from bs4 import BeautifulSoup
>>> markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
>>> # "lxml" is assumed here because it matches the output shown below;
>>> # other parsers wrap the fragment differently (or not at all).
>>> soup = BeautifulSoup(markup, "lxml")
>>> soup.prettify()
'<html>\n <body>\n  <a href="http://example.com/">\n   I linked to\n   <i>\n    example.com\n   </i>\n  </a>\n </body>\n</html>'
>>> print(soup.prettify())
<html>
 <body>
  <a href="http://example.com/">
   I linked to
   <i>
    example.com
   </i>
  </a>
 </body>
</html>

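You can call prettify() on the top-level BeautifulSoup object or on any of its Tag objects. Because it inserts newlines and indentation, prettify() is meant for inspecting a document visually, not for reformatting one.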
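As a minimal sketch of the tag-level call, the snippet below pretty-prints a single element instead of the whole document. It assumes the built-in "html.parser", and the variable names are illustrative. Whether the result ends with a newline can differ between Beautiful Soup versions, so no exact output is shown.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
# "html.parser" is an assumption made for portability; it ships with Python.
soup = BeautifulSoup(markup, "html.parser")

# prettify() on a Tag formats only that tag and its descendants.
pretty_link = soup.a.prettify()
pretty_italic = soup.i.prettify()

print(pretty_link)
print(pretty_italic)
```

With "html.parser" the fragment is not wrapped in <html> or <body>, so the output for soup.a starts directly at the <a> tag.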

Compact output

If you just want the string, with no extra formatting, call Python's str() on a BeautifulSoup object or on a Tag. In Python 3, str() already returns a Unicode string:

>>> str(soup)
'<html><body><a href="http://example.com/">I linked to <i>example.com</i></a></body></html>'

>>> str(soup.a)
'<a href="http://example.com/">I linked to <i>example.com</i></a>'
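When bytes are needed instead of text, for example to write to a file opened in binary mode, encode() can be used in the same way. The following is a small sketch under the assumption that the built-in "html.parser" is used; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
# "html.parser" is an assumption; it keeps the fragment as-is, without <html>/<body>.
soup = BeautifulSoup(markup, "html.parser")

as_text = str(soup.a)               # Unicode string, no added whitespace
as_bytes = soup.a.encode("utf-8")   # the same markup as UTF-8 encoded bytes

print(as_text)
print(as_bytes)
```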

`get_text()`

If you only want the text inside a tag, use get_text(). It collects all the text in the tag, including the text of its descendant tags, and returns it as a single Unicode string:

>>> markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
>>> soup = BeautifulSoup(markup, "html.parser")

>>> soup.get_text()
'\nI linked to example.com\n'
>>> soup.i.get_text()
'example.com'

You can pass a string to be used as the separator between the pieces of text:

>>> soup.get_text("|")
'\nI linked to |example.com|\n'

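You can also strip the whitespace from the beginning and end of each piece of text; pieces that are empty after stripping are dropped: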

>>> soup.get_text("|", strip=True)
'I linked to|example.com'
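To work with the individual pieces of text rather than one joined string, Beautiful Soup also provides the stripped_strings generator. The sketch below is not part of the original notes; it assumes the built-in "html.parser" and uses illustrative variable names.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
# "html.parser" is an assumption made for portability; it ships with Python.
soup = BeautifulSoup(markup, "html.parser")

# stripped_strings yields each text fragment with surrounding whitespace
# removed and skips fragments that consist of whitespace only.
fragments = list(soup.stripped_strings)
joined = " ".join(fragments)

print(fragments)
print(joined)
```

Joining the fragments with a space gives the same string as soup.get_text(" ", strip=True), so the list form is mainly useful when each fragment needs separate processing.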

Article link: https://www.haomeiwen.com/subject/aziqshtx.html
