Batch-download free high-resolution images with 20 lines of Python!

Author: 14e61d025165 | Source: published 2019-07-20 15:32, read 1 time


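Preface

You probably use PowerPoint often at work, and while building slides you may have wondered where to find images that are both high-resolution and free of copyright disputes. I strongly recommend ColorHub, a free image site that permits both personal and commercial use. Its home page alone may win you over.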


[Image: ColorHub home page]

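So how do you save the site's images to local disk (for example, if you mostly care about data-related material)? Once you can, you can pick good images for your slides without a network connection and browse your own image library anywhere. This article shares a solution to exactly that problem.

Crawler approach

Scraping an image site usually means following three layers of page links. The figure below makes the three layers easier to picture: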

[Image: diagram of the three link layers]

Top-level page: the image list page you reach by typing a topic of interest into the search bar on the site's home page. It looks like this:

[Image: image list page]

Second-level page: the image detail page you reach by clicking one of the images on the list page. It looks like this:

[Image: image detail page]

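Target page: the high-resolution image on the detail page, which is what we actually want to download. In the page source it is simply an image link, and it looks like this: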

[Image: image link in the page source]

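The crawler's real goal, then, is to find the link that points to each high-resolution image. The code below walks through how the three layers of links are located and requested. Every step of the code carries an explanatory comment. If you still have questions, leave them in the comments and I will reply as soon as I can.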
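Before touching the network, it can help to see the "find the link in the markup" step on its own. The following is a minimal offline sketch using only the standard library. The HTML snippet is hypothetical and only mimics the structure the article relies on (a `div` with class `card` whose first `a` tag carries `href` and `title`); `CardLinkParser` and `SAMPLE_LIST_PAGE` are illustrative names, not part of the original script.

```python
from html.parser import HTMLParser

# Hypothetical markup (an assumption): each image sits in a <div class="card">
# whose first <a> carries the detail-page link and the image title.
SAMPLE_LIST_PAGE = """
<div class="card"><a href="https://example.com/photo/1" title="Chart on a desk"><img src="//img.example.com/t1.jpg"></a></div>
<div class="card"><a href="https://example.com/photo/2" title="Laptop and notes"><img src="//img.example.com/t2.jpg"></a></div>
<div class="other"><a href="https://example.com/about" title="About">About</a></div>
"""


class CardLinkParser(HTMLParser):
    """Collect (href, title) from the first <a> inside every <div class="card">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._div_stack = []  # one bool per open <div>: is it a card?
        self._taken = False   # has the current card already yielded a link?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            is_card = "card" in (attrs.get("class") or "").split()
            self._div_stack.append(is_card)
            if is_card:
                self._taken = False
        elif tag == "a" and any(self._div_stack) and not self._taken:
            self.links.append((attrs.get("href"), attrs.get("title")))
            self._taken = True

    def handle_endtag(self, tag):
        if tag == "div" and self._div_stack:
            self._div_stack.pop()


parser = CardLinkParser()
parser.feed(SAMPLE_LIST_PAGE)
detail_links = parser.links
print(detail_links)
```

The third `div` is skipped because its class is not `card`, which is the same filtering the article's script does with BeautifulSoup on the real page.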

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># Import the required packages
import random
import time

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# Loop over several result pages
for page in range(1, 11):
    # Build the URL of the top-level image list page
    fst_url = r'https://colorhub.me/search?tag=data&page={}'.format(page)
    # Create a UA generator, used to set the request header
    UA = UserAgent()
    # Send a request to the top-level URL
    fst_response = requests.get(fst_url, headers={'User-Agent': UA.random})
    # Parse the source of the top-level page (parser named explicitly to avoid a bs4 warning)
    fst_soup = BeautifulSoup(fst_response.text, 'html.parser')
    # Following the page's HTML markup, collect the detail-page links and image names
    cards = fst_soup.find_all(name='div', attrs={'class': 'card'})
    sec_urls = [i.find('a')['href'] for i in cards]
    pic_names = [i.find('a')['title'] for i in cards]
    # Loop over every second-level link
    for sec_url, pic_name in zip(sec_urls, pic_names):
        # Pick a random UA for the request header
        UA = UserAgent()
        ua = UA.random
        # Send a request to the second-level URL
        sec_response = requests.get(sec_url, headers={'User-Agent': ua})
        # Parse the source of the second-level page
        sec_soup = BeautifulSoup(sec_response.text, 'html.parser')
        # Following the page's HTML markup, build the image URL
        pic_url = 'https:' + sec_soup.find('img', {'class': 'card-img-top'})['src']
        # Send a request to the image URL
        pic_response = requests.get(pic_url, headers={'User-Agent': ua})
        # Write the binary image data to disk (that is, save the image locally)
        with open(pic_name + '.jpg', mode='wb') as fn:
            fn.write(pic_response.content)
        # Sleep for a random number of seconds between pages
        seconds = random.uniform(1, 3)
        time.sleep(seconds)
</pre>
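Two steps in the script are worth hardening: prefixing the image `src` with `'https:'`, and using the page title directly as a file name. The sketch below shows one possible way to handle both with the standard library. `absolute_url` and `safe_filename` are hypothetical helper names, the example URLs are made up, and the set of characters replaced is an assumption chosen to cover the characters Windows rejects in file names.

```python
import re
from urllib.parse import urljoin


def absolute_url(page_url, src):
    """Resolve src against the page it came from.

    Handles protocol-relative ('//host/a.jpg'), root-relative ('/a.jpg')
    and already-absolute links alike.
    """
    return urljoin(page_url, src)


def safe_filename(title, default="untitled", ext=".jpg"):
    """Turn an image title into a file name that is safe to pass to open()."""
    # Replace path separators, reserved characters and whitespace runs with "_"
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title or "").strip("_")
    return (cleaned or default) + ext


# Made-up inputs, for illustration only
print(absolute_url("https://colorhub.me/photo/abc", "//img.example.com/a.jpg"))
print(safe_filename("Data / charts: Q1?"))
```

In the script, `pic_url` could then be computed as `absolute_url(sec_url, src)` and the file opened with `safe_filename(pic_name)`, so a title containing `/` would no longer make `open()` fail.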

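As you can see, the core of the code is only 16 lines, so it is quite simple. Go ahead and try it out. If you are interested in another topic, such as business, architecture, or plants, search for it on the site, find the URL of the top-level page, and replace the value of fst_url in the code.

Running the code above scrapes 10 pages of images from ColorHub, 325 high-resolution images in total, shown below: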
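Swapping the topic can also be done in code rather than by editing the string by hand. This is a small sketch, assuming that other topics use the same `search?tag=...&page=...` pattern as the `data` URL in the script. `build_list_urls` is a hypothetical helper, not part of the original code.

```python
from urllib.parse import urlencode

# Same search endpoint as fst_url in the script
BASE_SEARCH_URL = "https://colorhub.me/search"


def build_list_urls(tag, first_page=1, last_page=10):
    """Return the top-level list-page URLs for one search tag."""
    return [
        "{}?{}".format(BASE_SEARCH_URL, urlencode({"tag": tag, "page": page}))
        for page in range(first_page, last_page + 1)
    ]


for url in build_list_urls("data", 1, 3):
    print(url)
```

`urlencode` also escapes spaces and other special characters, so a tag such as `business office` yields a valid query string without manual quoting.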

[Image: the downloaded images]