Python: <br/ showing up when parsing with BeautifulSoup

Author: Rokkia | Source: published 2017-01-06 11:01, read 1274 times

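Lately, whenever I read novels online, all sorts of odd ads keep popping up, so I decided to scrape the text and read it at my own pace. Here is what I started with: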

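import re
import requests
from bs4 import BeautifulSoup as bs

# fetch_chapter is a hypothetical name; the original shows only the function body.
# URL and headers are assumed to be defined elsewhere in the script.
def fetch_chapter(endPoint):
    resp = requests.post(URL + endPoint, headers=headers)
    soup = bs(resp.content, 'html.parser', from_encoding='utf-8')
    # Extract the chapter text
    tag = soup.find('div', id='nr')
    tagnext = soup.find('a', id='pb_next')
    nr = tag.get_text().encode('utf-8')
    # Get the address of the next chapter
    match = re.match('.*html$', tagnext['href'])
    if match is None:
        return (nr, None)
    return (nr, match.string)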

Once it was written I thought I could finally read my novel, so I happily opened the output file.

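The file was full of ^M characters (carriage returns). Looking at the HTML, I saw that every line ends with a <br/> tag as its line break.
So I went looking for a fix and eventually found one that works very well.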

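Only one small change is needed: pass '\n' as the separator to get_text(), so that the text on either side of each <br/> ends up on its own line. The second argument is get_text()'s strip flag; the string '<br/>' only acts as a truthy value there, so strip=True says the same thing more clearly.
print(tag.get_text('\n', strip=True))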

Problem solved.
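
To make the effect of the separator concrete, here is a minimal sketch on a made-up snippet. The sample markup and its text are assumptions for illustration, not the real novel page; only the div id 'nr' is taken from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the novel page:
# each line of text ends with a carriage return followed by <br/>.
html = '<div id="nr">First line\r<br/>Second line\r<br/>Third line</div>'

soup = BeautifulSoup(html, 'html.parser')
tag = soup.find('div', id='nr')

# Default call: the text pieces are concatenated with no separator,
# and whatever whitespace surrounds them is kept.
plain = tag.get_text()

# With a separator and strip=True: each text piece is stripped of
# surrounding whitespace (including '\r'), then joined with '\n'.
joined = tag.get_text('\n', strip=True)

print(repr(plain))
print(repr(joined))
```

Because strip=True trims each piece before joining, stray carriage returns at the ends of lines disappear along the way, which is presumably why the ^M characters went away.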

Scrapy XPath: removing '<br>'

response = response.replace(body=response.body.replace(b'<br>', b'\n'))
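
The one-liner above only matches the exact bytes <br>. If a page mixes several spellings of the tag, one possible approach is to normalize them with a regular expression first. The sketch below uses only the standard library; the names BR_PATTERN and br_to_newline are hypothetical helpers, not part of Scrapy, and tags that carry attributes are deliberately left alone.

```python
import re

# Matches <br>, <br/>, <br /> and the non-standard </br>, case-insensitively.
# Tags with attributes (e.g. <br class="x">) are not matched.
BR_PATTERN = re.compile(rb'<\s*/?\s*br\s*/?\s*>', re.IGNORECASE)


def br_to_newline(body: bytes) -> bytes:
    """Replace every matched line-break tag in raw HTML bytes with a newline."""
    return BR_PATTERN.sub(b'\n', body)


# Made-up sample bytes covering the common spellings.
sample = b'one<br>two<br/>three<BR />four</br>five'
converted = br_to_newline(sample)
print(converted)
```

In a Scrapy callback this could slot into the same call as before: response = response.replace(body=br_to_newline(response.body)).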

Thanks for reading; criticism is welcome.

Article link: https://www.haomeiwen.com/subject/xbltbttx.html
