Python Mini Crawler for Grabbing Funny Images, V2.0

Author: 无与童比 | Source: published 2014-06-04 13:24, read 557 times

As we can see, the previous little program did not use many techniques. Now I want to grab a few thousand images. What should I do?

Let me post the v2.0 code first.

It requires the third-party library bs4 (the beautifulsoup4 package on PyPI) to be installed.

from bs4 import BeautifulSoup
import os
import urllib.request

# Page number to start from.
beginNum = int(input("Please input (1111-3111)\n"))
for i in range(beginNum, 3111):
    pageUrl = 'http://www.76xh.com/tupian/' + str(i) + '.html'
    htmlDoc = urllib.request.urlopen(pageUrl).read()
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(htmlDoc, 'html.parser')
    print("Downloading page %d" % i)
    # The picture sits inside <div class="pic_text">.
    divHtml = soup.find_all("div", class_="pic_text")
    imgUrl = 'http://www.76xh.com' + divHtml[0].img.attrs['src']
    data = urllib.request.urlopen(imgUrl).read()
    # The page title becomes the file name; the folder C:/img must already exist.
    fileName = soup.title.contents[0] + '.jpg'
    filePath = os.path.join('C:/img', fileName)
    with open(filePath, 'wb') as image:
        image.write(data)

print('OK')

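Try running it, and you will find that it basically stops after downloading a few images.
Why?
This reminds me of a classic saying, which goes roughly like this: if the core code of a program is only a few dozen lines, turning it into a program that copes with the vast majority of situations takes a few hundred lines.
In this case one problem came up: what happens when an img link is no longer valid? urllib.request.urlopen raises an exception for such a link, and an uncaught exception ends the whole script.
I won't keep you in suspense: to handle all kinds of abnormal situations, we bring in exception handling.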

from bs4 import BeautifulSoup
import os
import urllib.request

# Page number to start from.
beginNum = int(input("Please input (1111-3111)\n"))
for i in range(beginNum, 3111):
    try:
        pageUrl = 'http://www.76xh.com/tupian/' + str(i) + '.html'
        htmlDoc = urllib.request.urlopen(pageUrl).read()
        soup = BeautifulSoup(htmlDoc, 'html.parser')
        print("Downloading page %d" % i)
        divHtml = soup.find_all("div", class_="pic_text")
        imgUrl = 'http://www.76xh.com' + divHtml[0].img.attrs['src']
        data = urllib.request.urlopen(imgUrl).read()
        fileName = soup.title.contents[0] + '.jpg'
        filePath = os.path.join('C:/img', fileName)
        with open(filePath, 'wb') as image:
            image.write(data)
    except Exception:
        # Any failure on this page (dead link, missing tag, bad file name):
        # skip it and move on to the next page.
        continue

print('OK')

With a simple continue statement, a failed download is just skipped. Isn't that nice?
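Skipping every exception silently also hides why a page was skipped. As a minimal sketch (not the author's code), the loop can catch only the errors that are expected here and remember which pages failed. The names `download_pages` and `download_one` are hypothetical; `download_one` stands for the body of the loop above, and the list of exception types is an assumption about what that body can raise.

```python
def download_pages(page_numbers, download_one):
    """Call download_one(n) for every page number and keep going on failure.

    download_one is a hypothetical callable standing in for the loop body
    (fetch the page, find the image, write the file). Returns a list of
    (page_number, error_text) pairs for the pages that were skipped.
    """
    failed = []
    for n in page_numbers:
        try:
            download_one(n)
        # OSError covers urllib.error.URLError/HTTPError and file-system errors.
        # IndexError, AttributeError and KeyError cover a page whose HTML lacks
        # the expected div, img tag, title or src attribute (assumed failure modes).
        except (OSError, IndexError, AttributeError, KeyError) as e:
            failed.append((n, repr(e)))
    return failed


if __name__ == "__main__":
    def fake_download(n):
        # Stand-in so the sketch runs without network access.
        if n % 2:
            raise IndexError("no pic_text div on page %d" % n)

    print(download_pages(range(1111, 1115), fake_download))
```

Anything not in the tuple, such as a typo in the code itself, still stops the script, which makes real bugs easier to notice than with a blanket `except Exception`.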
Run it.
Enter a number between 1111 and 3111, and it keeps downloading until it reaches page 3111. By a rough estimate, I downloaded more than 1,700 images.
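Not every page in the range ends up as a file. One possible reason besides dead links, offered here as an assumption rather than something the post verifies, is that a page title used as a file name may contain characters Windows does not allow in file names (`\ / : * ? " < > |`), in which case `open` fails. A small helper, hypothetically named `safe_file_name`, could replace those characters before the path is built:

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_file_name(title, extension='.jpg'):
    """Turn a page title into a file name that is safe to use on Windows."""
    cleaned = _ILLEGAL.sub('_', title.strip())
    # Fall back to a fixed name if nothing usable is left (a hypothetical choice).
    return (cleaned or 'untitled') + extension
```

In the listing it would be used as `fileName = safe_file_name(soup.title.contents[0])`. Two pages with the same title would still overwrite each other; adding the page number to the name is one way around that.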

This program still has a big shortcoming: downloading is too slow. What can be done about that? We'll find out next time.
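The follow-up post is not part of this text, so the author's own answer is unknown. One common way to speed up downloads that spend most of their time waiting on the network, shown here only as a sketch, is to run several at once with the standard library's `concurrent.futures.ThreadPoolExecutor`. `download_concurrently` and `download_one` are hypothetical names; `download_one` stands for the per-page work (fetch, parse, save), and the default of 8 workers is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_concurrently(page_numbers, download_one, max_workers=8):
    """Run download_one(n) for each page number in a pool of threads.

    Returns (succeeded, failed): a sorted list of page numbers that finished
    and a sorted list of page numbers whose download raised an exception.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the page number it was started for.
        futures = {pool.submit(download_one, n): n for n in page_numbers}
        for future in as_completed(futures):
            n = futures[future]
            try:
                future.result()  # re-raises whatever download_one raised
            except Exception:
                failed.append(n)
            else:
                succeeded.append(n)
    return sorted(succeeded), sorted(failed)
```

A call such as `download_concurrently(range(beginNum, 3111), download_one)` would replace the sequential loop. Keeping the worker count small is kinder to the site being crawled.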

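Reader comments

无与童比: @思齐2016 The images themselves are watermarked resources, so there is no way to remove the watermarks uniformly.
思齐2016: Your scraping works nicely, but the images carry watermarks. Can you find addresses for the images without watermarks?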

Article link: https://www.haomeiwen.com/subject/uagdtttx.html
