2019-06-21-Web Scraping-FIRST TRY

Author: ElfACCC | Published 2019-06-21 14:53

Problem encountered:

With response.encoding = 'utf-8', the Chinese text comes out garbled.

Cause: the original site is encoded in gb2312, so setting response.encoding to 'gb2312' removes the garbling.
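
The effect can be reproduced offline, without touching the site. The sketch below is an illustration under assumptions: the sample string and the helper name decode_page are hypothetical, and errors='replace' is used to roughly mirror how requests builds response.text once an encoding is set.

```python
def decode_page(raw_bytes: bytes, encoding: str) -> str:
    """Decode a response body with the given codec, replacing undecodable bytes."""
    return raw_bytes.decode(encoding, errors='replace')


# Hypothetical sample: bytes as a gb2312 server would send them.
sample_text = '晋江文学城'
raw_bytes = sample_text.encode('gb2312')

# Wrong codec: the bytes are not valid UTF-8, so the result is garbled.
garbled = decode_page(raw_bytes, 'utf-8')

# Right codec: the original text comes back intact.
fixed = decode_page(raw_bytes, 'gb2312')

print(repr(garbled))
print(fixed)
```

The same bytes produce readable or unreadable text depending only on the codec, which is why changing response.encoding is enough and no second download is needed.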

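Re-encoding the text also works without errors (presumably the approach kept as the commented-out line in the full code below).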
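
The commented-out line in the full code, response.text.encode('ISO-8859-1').decode('gbk'), undoes a wrong decode after the fact. A minimal sketch of why that round trip is lossless follows; it assumes the text was first decoded as ISO-8859-1, and the helper name reencode and the sample string are hypothetical.

```python
def reencode(garbled_text: str, assumed: str = 'ISO-8859-1', actual: str = 'gbk') -> str:
    """Turn wrongly decoded text back into bytes, then decode with the real codec."""
    return garbled_text.encode(assumed).decode(actual)


# Hypothetical sample: bytes as a gbk/gb2312 server would send them.
raw_bytes = '第一章'.encode('gbk')

# ISO-8859-1 maps every byte to exactly one character, so nothing is lost here,
# the text just looks wrong.
garbled = raw_bytes.decode('ISO-8859-1')

# Encoding back with ISO-8859-1 recovers the original bytes exactly.
restored = reencode(garbled)

print(repr(garbled))
print(restored)
```

This only works when the first decode was lossless. Text already decoded as UTF-8 with replacement characters cannot be recovered this way, so setting response.encoding before reading response.text is the simpler option.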

Full code:

import re

import requests

url = 'http://www.jjwxc.net/onebook.php?novelid=379995'
response = requests.get(url)
# The site serves gb2312, so set the encoding before reading response.text.
# Alternative: html = response.text.encode('ISO-8859-1').decode('gbk')
response.encoding = 'gb2312'
html = response.text

# The novel title doubles as the output file name.
title = re.findall(r'<span itemprop="articleSection">(.*?)</span>', html)[0]

# Each match is a (chapter_url, chapter_title) tuple.
chapters = re.findall(r'<a itemprop="url" href="(.*?)">(.*?)</a>', html, re.S)

# The with block closes the file even if a request or a match fails.
with open('%s.txt' % title, 'w', encoding='utf-8') as fb:
    for chapter_url, chapter_title in chapters:
        chapter_response = requests.get(chapter_url)
        chapter_response.encoding = 'gb2312'
        chapter_html = chapter_response.text
        # The chapter body sits between these two markers in the page source.
        chapter_content = re.findall(r'<div style="clear:both;"></div>(.*?)<div id="favoriteshow_3"', chapter_html, re.S)[0]
        # Convert line breaks, full-width spaces, and the bullet entity to plain text.
        chapter_content = chapter_content.replace('<br>', '\n')
        chapter_content = chapter_content.replace('\u3000', ' ')
        chapter_content = chapter_content.replace('&#8226;', '•')

        fb.write(chapter_title)
        fb.write(chapter_content)
        fb.write('\n')

        print(chapter_title)
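
The loop cleans each chapter with three replace calls and handles a single entity, &#8226;, by hand. A possible generalization, assuming chapters may contain other entities or <br/> variants, is sketched below with the standard library. The helper name clean_chapter_text and the sample fragment are hypothetical and not part of the original script.

```python
import html
import re


def clean_chapter_text(fragment: str) -> str:
    """Convert a raw chapter HTML fragment into plain text."""
    # Treat <br>, <br/> and <br /> alike as line breaks.
    text = re.sub(r'<br\s*/?>', '\n', fragment, flags=re.IGNORECASE)
    # Replace full-width (ideographic) spaces with ordinary spaces.
    text = text.replace('\u3000', ' ')
    # Decode every HTML entity, including &#8226; (the bullet character).
    return html.unescape(text)


# Hypothetical fragment in the shape the chapter regex would capture.
sample = '\u3000\u3000第一章<br>他说&#8226;你好&amp;再见<br />'
print(clean_chapter_text(sample))
```

Inside the loop this would replace the three replace lines with chapter_content = clean_chapter_text(chapter_content).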

Article link: https://www.haomeiwen.com/subject/zbbyqctx.html