20 Lines of Crawler Code Fetched 20,000 Animated GIFs! Let's Have a Meme Battle

Author: 途途途途 | Published 2021-08-29 09:12


Results

The image quality is superb!

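Page analysis

First, open the target site and pick the image styles and designs you like.

Press F12 to open the browser's developer tools and locate the actual URL that the page sends its request to (it is the URL used in the code below).

The data is returned as a JSON document.

Opening it with a JSON viewer browser plugin makes the structure easier to read.

All the image information we want sits in a list called nodes.

As usual, we fetch the JSON data first.

Sending the request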

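import requests
from fake_useragent import UserAgent  # third-party package that supplies random User-Agent strings

page = 1  # page number to request (not defined in the original snippet; 1 is an assumed starting value)

headers = {
    'referer': 'https://www.gaoding.com/templates/pn4-f1612599',
    'user-agent': str(UserAgent().random)
}
url = f'https://www.gaoding.com/api/aggregate/search?q=&page_size=120&page_num={page}&design_cid=&channel_cid=&industry_cid=&filter_id=1612599&type_filter_id=1612599&channel_filter_id=&channel_children_filter_id=&sort=&styles=&colors=&ratios='
resp = requests.get(url, headers=headers)
print(resp.json())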
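The long query string above is easier to read and adjust when it is built from a dictionary. The following is a minimal sketch using only the standard library; the helper name `build_search_url` and the constant `BASE_URL` are hypothetical, and the parameter names and values are simply copied from the URL in the article.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.gaoding.com/api/aggregate/search'


def build_search_url(page, page_size=120):
    """Rebuild the article's search URL for a given page (hypothetical helper)."""
    # Keys and values mirror the query string used in the article;
    # empty strings are kept so the result matches the original URL.
    params = {
        'q': '',
        'page_size': page_size,
        'page_num': page,
        'design_cid': '',
        'channel_cid': '',
        'industry_cid': '',
        'filter_id': 1612599,
        'type_filter_id': 1612599,
        'channel_filter_id': '',
        'channel_children_filter_id': '',
        'sort': '',
        'styles': '',
        'colors': '',
        'ratios': '',
    }
    return f'{BASE_URL}?{urlencode(params)}'


print(build_search_url(1))
```

The returned string can be passed to `requests.get` in place of the hand-written f-string, so changing the page or page size becomes a single argument.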

Once the request succeeds, we extract the link to each animated GIF:

if resp.status_code == requests.codes.ok:
    # The image entries live under searchMaterials -> nodes
    pic_list = resp.json()['searchMaterials']['nodes']
    for item in pic_list:
        # Each entry carries one preview URL
        url = item['preview']['url']
        print(url)

'''
https://st0.dancf.com/csc/8/templets/92753/20190422-204832-12f4.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20200923-175642-6d7e.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210617-150001-d68f.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20200610-180443-a451.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20201026-111338-a51f.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210412-231317-fe5e.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20200408-190042-c176.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210401-173334-47a2.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210710-101533-5402.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210710-095505-b023.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210630-134525-e3ca.png
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210710-103608-d915.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210629-150025-1374.png
https://st-gdx.dancf.com/gaodingx/46/design/20190911-170537-3713.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210525-141136-6d24.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20201022-100011-7671.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210630-175847-f89d.png
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210710-102746-0f3e.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210708-165519-7324.png
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20200629-153449-454e.gif
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210706-172524-4448.png
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210629-152749-be41.png
https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210630-142652-34a2.png
...............
'''
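The loop above only prints the links, while the download step needs them as a list, ideally gathered across several pages. Here is a rough sketch of that idea. The function names `extract_preview_urls` and `collect_urls` are hypothetical, the `sample` payload is a hand-written, trimmed imitation of the response shape described above (only the two URLs are taken from the output listing), and stopping at the first empty page is an assumption about how the API behaves past the last page.

```python
def extract_preview_urls(payload):
    """Return the preview URLs under searchMaterials -> nodes, skipping incomplete items."""
    nodes = payload.get('searchMaterials', {}).get('nodes', [])
    urls = []
    for item in nodes:
        # Some entries might lack a preview; treat a missing or None value as empty
        url = (item.get('preview') or {}).get('url')
        if url:
            urls.append(url)
    return urls


def collect_urls(fetch_page, max_pages):
    """Gather URLs page by page; fetch_page(page) must return the parsed JSON dict."""
    all_urls = []
    for page in range(1, max_pages + 1):
        urls = extract_preview_urls(fetch_page(page))
        if not urls:
            # Assumption: an empty page means there is nothing further to fetch
            break
        all_urls.extend(urls)
    return all_urls


# Hand-written sample data standing in for real responses
sample = {
    'searchMaterials': {
        'nodes': [
            {'preview': {'url': 'https://st0.dancf.com/csc/8/templets/92753/20190422-204832-12f4.gif'}},
            {'preview': None},
            {'preview': {'url': 'https://st-gdx.dancf.com/gaodingx/0/uxms/design/20200923-175642-6d7e.gif'}},
        ]
    }
}
pages = {1: sample, 2: {'searchMaterials': {'nodes': []}}}

print(collect_urls(lambda page: pages[page], max_pages=5))
```

In real use, `fetch_page` would wrap the `requests.get(...).json()` call from the article. Passing the fetcher in as an argument keeps the parsing logic testable without touching the network.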

Saving the images

Next, we write a function that downloads and saves the GIFs, as follows.

Each file gets a unique name generated with uuid.

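import os
import shutil
import uuid

import requests

# Directory where the images are saved
pic_path = './pictures'

def down_pic(urls):
    # Start from an empty folder: delete any existing one, then recreate it
    if os.path.exists(pic_path):
        shutil.rmtree(pic_path)
    os.mkdir(pic_path)
    for count, url in enumerate(urls, start=1):
        # Generate the name once so the saved file and the log message match
        name = f'{uuid.uuid4()}.gif'
        print(f'Downloading image {count}')
        try:
            # headers is the dict defined in the request step above
            r = requests.get(url, headers=headers)
            with open(os.path.join(pic_path, name), 'wb') as fin:
                fin.write(r.content)
            print(f'{name}----downloaded successfully')
        except (requests.RequestException, OSError):
            print('Download failed!')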
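The sample output earlier contains some `.png` links, so saving every file with a `.gif` suffix mislabels those images. One possible refinement is to take the extension from the URL itself. This is only a sketch: the helper name `unique_filename` is hypothetical, and falling back to `.gif` when the URL has no extension is an assumption.

```python
import os
import uuid
from urllib.parse import urlparse


def unique_filename(url, default_ext='.gif'):
    """Build a uuid-based file name that keeps the extension found in the URL path."""
    # splitext on the URL path ignores any query string or fragment
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return f'{uuid.uuid4()}{ext or default_ext}'


print(unique_filename('https://st-gdx.dancf.com/gaodingx/0/uxms/design/20210630-134525-e3ca.png'))
```

The result can replace the hard-coded `.gif` name when opening the output file, so PNG previews keep their correct suffix.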