Python Web Scraping in Practice

Author: kaiiiu | Published: 2019-01-02 17:08

Scraping a novel from a fiction website and saving it locally

Importing the libraries

from urllib import request
import re

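request: the urllib submodule used to request web pages
re: Python's regular expression module

Choosing the URL of the novel page to scrape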

url="http://www.6mao.com/html/1/1052/736343.html"

Fetching the full content of the page

webpage=request.urlopen(url)
data=webpage.read().decode("gbk")

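webpage: the response object returned for the requested page
decode method: converts the raw bytes of the page into a string, here using the "gbk" encoding. The encoding passed to decode must match the one the page is served in.

Data cleaning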
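
A slightly more defensive version of the fetch step might look like the sketch below. The names fetch_html and decode_page, the User-Agent string, and the 10-second timeout are assumptions made for illustration and are not part of the original script. Decoding is split into its own function so it can be tried without network access.

```python
from urllib import request


def decode_page(raw: bytes, encoding: str = "gbk") -> str:
    """Decode raw page bytes; undecodable bytes become U+FFFD instead of raising."""
    return raw.decode(encoding, errors="replace")


def fetch_html(url: str, encoding: str = "gbk", timeout: float = 10.0) -> str:
    """Download a page and return it as text (hypothetical helper)."""
    # Some sites reject urllib's default User-Agent, so send a browser-like one.
    req = request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # The with block closes the connection even if read() fails.
    with request.urlopen(req, timeout=timeout) as webpage:
        return decode_page(webpage.read(), encoding)


# Offline check of the decoding step with GBK-encoded sample bytes.
sample = "第一章".encode("gbk")
print(decode_page(sample))
```

With these helpers, the fetch step would become data = fetch_html(url). Using errors="replace" trades strictness for robustness: a stray invalid byte no longer aborts the whole download.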

data=data.replace('&nbsp;',"")
data=data.replace('<br />',"")

The replace method removes every &nbsp; entity and <br /> tag from data.

Inspecting data shows that the novel text is wrapped in a <div id="neirong"></div> tag.
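
The two replace calls only match the exact spelling <br />. If a page mixes <br>, <br/>, or upper-case tags, a regular expression covers them all. The following is a minimal sketch; the clean_text helper and its br_replacement parameter are hypothetical names, and the sample string is made up.

```python
import re

# Matches <br>, <br/>, <br /> and their upper-case variants.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def clean_text(data: str, br_replacement: str = "") -> str:
    """Strip &nbsp; entities and replace <br> tags (hypothetical helper)."""
    data = data.replace("&nbsp;", "")
    return BR_TAG.sub(br_replacement, data)


sample = "&nbsp;&nbsp;Line one<br />Line two<BR>Line three<br/>"
print(clean_text(sample))                       # tags removed, as in the article
print(clean_text(sample, br_replacement="\n"))  # tags turned into line breaks
```

Passing br_replacement="\n" keeps the paragraph breaks of the novel instead of joining every line into one.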

txt=re.findall(r'<div id="neirong">(.*?)</div>',data,re.S)

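The regular expression <div id="neirong">(.*?)</div> captures everything inside <div id="neirong"></div>. The re.S flag lets . match newlines, so the match can span several lines. findall returns a list of the captured strings, which is stored in the variable txt.

At this point the novel text is ready to be written out.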
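
The effect of re.S and of the lazy quantifier .*? can be seen on a small example. The HTML string below is made up for illustration and is not taken from the real page.

```python
import re

# Made-up sample: the content div spans several lines and is followed by another div.
sample = '<div id="neirong">\nChapter 1\nIt was a dark night.\n</div><div id="footer">links</div>'

pattern = r'<div id="neirong">(.*?)</div>'

# Without re.S, "." does not match newlines, so the multi-line div is not found.
without_dotall = re.findall(pattern, sample)

# With re.S, "." also matches newlines; the lazy ".*?" stops at the first </div>.
with_dotall = re.findall(pattern, sample, re.S)

# A greedy ".*" would instead run on to the last </div> in the string.
greedy = re.findall(r'<div id="neirong">(.*)</div>', sample, re.S)

print(without_dotall)
print(with_dotall)
print(greedy)
```

Because the lazy match ends at the first </div>, this pattern would cut the text short if the content div itself contained nested div elements.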

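with open("book.txt","w",encoding="utf-8") as of:
    for t in txt:
        of.write(t)

open creates a text file named book.txt, and the for loop writes each item of txt into it. The with statement closes the file and releases its resources when the block ends, even if a write fails. Setting encoding="utf-8" explicitly avoids depending on the platform's default encoding, which may not be able to represent Chinese text.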
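
The write step can also be wrapped in a small function, which makes it easy to try out without touching a real book.txt. This is only a sketch: save_text is a hypothetical name, and the sample chunks are made up.

```python
import os
import tempfile


def save_text(chunks, path, encoding="utf-8"):
    """Write each extracted chunk to path and return the number of characters written."""
    written = 0
    with open(path, "w", encoding=encoding) as of:
        for t in chunks:
            # In text mode, write() returns the number of characters written.
            written += of.write(t)
    return written


# Try it in a temporary directory so no real file is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "book.txt")
    count = save_text(["第一章\n", "It was a dark night.\n"], target)
    with open(target, encoding="utf-8") as f:
        content = f.read()

print(count)
print(content)
```

In the script itself the call would be save_text(txt, "book.txt").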

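Complete code:

from urllib import request
import re

# Page of the novel to scrape
url="http://www.6mao.com/html/1/1052/736343.html"

# Download the page and decode its bytes as GBK text
webpage=request.urlopen(url)
data=webpage.read().decode("gbk")

# Remove &nbsp; entities and <br /> tags
data=data.replace('&nbsp;',"")
data=data.replace('<br />',"")

# Extract the novel text from the content div
txt=re.findall(r'<div id="neirong">(.*?)</div>',data,re.S)

# Write the extracted text to book.txt
with open("book.txt","w",encoding="utf-8") as of:
    for t in txt:
        of.write(t)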

Article link: https://www.haomeiwen.com/subject/lfwrlqtx.html
