I've recently been learning Python web scraping and wrote a simple recursive crawler that downloads gallery images. Without further ado, here's what it collected:
More than three thousand images in total :)
The Python version is 3.5. The program uses urllib.request to fetch pages and BeautifulSoup to parse the returned HTML, extracting both the image links and the links to further pages from each page. After downloading all images on the current page, it visits each new link in turn, recursing until it reaches the maximum depth. A set named pages records the pages already crawled, so the same page is never visited twice. The source code follows:
import urllib.request
import re
import time
from threading import Semaphore
from bs4 import BeautifulSoup

# Semaphore used as a print lock (only matters if the crawler is ever threaded)
screenLock = Semaphore(value=1)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
main_url = 'http://www.chunmm.com'
num = 1
pages = set()        # pages already crawled, to avoid revisiting them
pages.add(main_url)

def downloadimg(url, depth):
    if depth == 0:   # maximum recursion depth reached
        return
    print(depth)
    print(url)
    req = urllib.request.Request(url, headers=headers)
    html = urllib.request.urlopen(req).read().decode('utf-8')
    soup = BeautifulSoup(html, 'html.parser')
    # image links and links to further pages on the current page
    imgurllist = soup.find_all('img', {'src': re.compile(r'http://.+\.jpg')})
    urllist = soup.find_all('a', {'href': re.compile(r'/.+?/.+?\.html')})
    local_path = 'd:/OOXXimg/'
    global num
    for item in imgurllist:
        print(item["src"])
        imgurl = item["src"]
        path = local_path + str(num) + '.jpg'
        urllib.request.urlretrieve(imgurl, path)
        num += 1
        screenLock.acquire()
        print(str(num) + ' img was downloaded\n')
        screenLock.release()
    for link in urllist:
        newurl = main_url + link["href"]
        if newurl not in pages:   # skip pages that were already visited
            pages.add(newurl)
            downloadimg(newurl, depth - 1)
            time.sleep(1)         # pause between requests

def main():
    downloadimg(main_url, 3)

if __name__ == '__main__':
    main()
Note: it's best to wrap the page requests in exception handling, so that a bad URL doesn't make the program exit with an error. This example recurses three levels deep and downloads more than 3,000 images.
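As a minimal sketch of that exception handling (the helper name fetch_html and the 10-second timeout are my own choices, not part of the original script; headers is the dict defined above):

import urllib.error

def fetch_html(url):
    # Return the page's HTML, or None if the request or decode fails.
    req = urllib.request.Request(url, headers=headers)
    try:
        return urllib.request.urlopen(req, timeout=10).read().decode('utf-8')
    except (urllib.error.URLError, UnicodeDecodeError) as e:
        print('skipping ' + url + ': ' + str(e))
        return None

In downloadimg, the direct urlopen call would then become html = fetch_html(url), returning early when the result is None; the urllib.request.urlretrieve call can be guarded the same way so a single broken image link doesn't abort the whole crawl.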
Thanks for reading!