A High-Quality "Scraper" Naturally Has to Scrape Some "High-Quality" Wallpapers

Author: 查理不是猹 | Published: 2021-12-23 09:44

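I. Foreword

My wallpaper has been the sky blue that ships with Windows day after day, and it's honestly boring. Is it interesting? No, it is not~

So, as everyone knows, I'm a blogger with a taste for high quality, which means I obviously need some high-quality wallpapers. Nothing more to it than that.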

Right, enough chatter. Let's kick off today's high-quality journey~

II. Preparation

Get all of these set up:

Python 3.6
PyCharm
requests
parsel

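III. Scraping Workflow

1) Finding the data source:

1. Pin down the goal: scrape HD wallpaper images (from 彼岸, netbian.com)

Use the browser developer tools (F12, or right-click and choose Inspect) to find where the image URL comes from;
Request a wallpaper's detail page and read its page source to get the image URL (one image per page);
Request the list page to get each wallpaper's detail page URL and title;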

2) Implementing it in code:

1. Send the request

Wallpaper list page URL: http://www.netbian.com/1920x1080/index.htm
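
A quick sketch of how the list page URLs fit together. Going by the URL above and the `index_{page}.htm` pattern used in the full script further down, page 1 appears to be plain `index.htm` while later pages carry a page number. The helper name `list_page_url` is my own invention, purely for illustration.

```python
BASE = 'http://www.netbian.com/1920x1080/'


def list_page_url(page: int) -> str:
    """Build the list page URL for a page number (hypothetical helper)."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    # Page 1 has no numeric suffix; later pages use index_{page}.htm
    return BASE + ('index.htm' if page == 1 else f'index_{page}.htm')


for page in range(1, 4):
    print(list_page_url(page))
```

Handling page 1 separately matters only if you want the very first page too; looping from page 2 sidesteps the special case.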

2. Get the data

The page source, i.e. response.text (the page's text data)
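
If `response.text` comes back garbled, the bytes were most likely decoded with the wrong charset. Here's a small offline sketch of that, assuming the page is GBK-encoded (which the commented-out `decode('gbk')` line in the full script hints at). The sample string is made up and no network is involved.

```python
# Pretend these are the raw bytes the server sent (assumption: a GBK-encoded page)
raw = '高清壁纸'.encode('gbk')

# Decoding with the wrong charset does not raise, it just produces mojibake
garbled = raw.decode('iso-8859-1')
# Decoding with the right charset recovers the original text
fixed = raw.decode('gbk')

print(ascii(garbled))        # escaped view of the garbled characters
print(fixed == '高清壁纸')    # did we get the original text back?
```

With requests, the equivalent fix is either `response.content.decode('gbk')` or setting `response.encoding = response.apparent_encoding` before reading `response.text`.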

3. Parse the data

Parsing options: CSS selectors, XPath, bs4, re
What to extract:
1. The wallpaper detail page URL: /desk/23397.htm
2. The wallpaper title
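
Of the parsing options, `re` needs no install, so here's a rough sketch using it on a made-up HTML snippet. The markup is an assumption modelled on the `.list li`, `a` and `b` selectors in the full script, not a copy of the real page, and the titles are invented.

```python
import re
from urllib.parse import urljoin

# Made-up snippet for illustration; the real page markup may differ
html = '''
<div class="list"><ul>
<li><a href="/desk/23397.htm" target="_blank"><img src="/small.jpg" alt=""><b>Sample wallpaper</b></a></li>
<li><a href="/desk/23398.htm" target="_blank"><img src="/small2.jpg" alt=""><b>Another wallpaper</b></a></li>
</ul></div>
'''

# Capture the relative detail page URL and the title inside <b>...</b>
pattern = re.compile(r'<a href="(/desk/\d+\.htm)"[^>]*>.*?<b>(.*?)</b>', re.S)

wallpapers = [
    # The href is relative, so join it with the site root to get a full URL
    (urljoin('http://www.netbian.com', href), title)
    for href, title in pattern.findall(html)
]
print(wallpapers)
```

Regexes like this break easily when the markup changes, which is a fair reason to prefer CSS selectors for the real thing.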

4. Save the data

Images are binary data, so they have to be written to disk as bytes
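
A minimal sketch of the saving step, run against stand-in bytes rather than a real download. The helper `save_image` and the filename clean-up are my own additions (wallpaper titles may contain characters that Windows rejects in file names), so treat them as one possible approach rather than the only way to do it.

```python
import re
import tempfile
from pathlib import Path


def save_image(content: bytes, title: str, folder: Path) -> Path:
    """Write image bytes to folder/<title>.jpg and return the path (hypothetical helper)."""
    # Create the target folder if it is missing
    folder.mkdir(parents=True, exist_ok=True)
    # Replace characters that Windows does not allow in file names
    safe_title = re.sub(r'[\\/:*?"<>|]', '_', title).strip() or 'untitled'
    path = folder / f'{safe_title}.jpg'
    # 'wb' = write binary: the bytes go to disk untouched, no text encoding involved
    with open(path, mode='wb') as f:
        f.write(content)
    return path


# Stand-in bytes; in the real script this would be requests.get(...).content
fake_image = b'\xff\xd8\xff\xe0 not a real JPEG'
saved = save_image(fake_image, 'Sky: blue/green?', Path(tempfile.mkdtemp()) / 'img')
print(saved.name)
```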

The audience: Is that it? Where's the full code? What's the idea, not posting the whole thing?

Don't panic, here it comes

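IV. The Code

I won't take it apart line by line. Between the comments and section III, a smart reader like you should be able to follow along. If it really doesn't click, I'll put up a video walkthrough at the end.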

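import os  # path handling, built-in module
import re  # regular expressions, built-in module
import time  # timing, built-in module

import parsel  # data parsing module, third-party: pip install parsel
import requests  # HTTP request module, third-party: pip install requests

# Before using a module, first know what the module is for
# Request headers: make the Python script look like a browser to the server
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36'
}
# Make sure the output folder exists before saving anything into it
os.makedirs('img', exist_ok=True)

time_1 = time.time()
# Page 1 is index.htm; the index_{page}.htm pattern starts at page 2
for page in range(2, 12):
    print(f'====================Scraping page {page}====================')
    url = f'http://www.netbian.com/1920x1080/index_{page}.htm'
    response = requests.get(url=url, headers=headers)
    # Garbled text? The encoding needs fixing
    # html_data = response.content.decode('gbk')
    response.encoding = response.apparent_encoding  # detect the encoding automatically
    # response.text is the page source as text
    # print(response.text)
    # Parse the data
    selector = parsel.Selector(response.text)
    # CSS selectors extract data based on the page's tags
    # First pass: grab every li tag in the list
    lis = selector.css('.list li')
    for li in lis:
        # Detail page, e.g. http://www.netbian.com/desk/23397.htm
        title = li.css('b::text').get()
        href = li.css('a::attr(href)').get()
        if title and href:
            href = 'http://www.netbian.com' + href
            response_1 = requests.get(url=href, headers=headers)
            selector_1 = parsel.Selector(response_1.text)
            img_url = selector_1.css('.pic img::attr(src)').get()
            if not img_url:
                continue  # no image found on this detail page, skip it

            img_content = requests.get(url=img_url, headers=headers).content
            # Swap out characters that are not allowed in file names
            file_name = re.sub(r'[\\/:*?"<>|]', '_', title) + '.jpg'
            # Images are binary data, so open the file in 'wb' mode
            with open(os.path.join('img', file_name), mode='wb') as f:
                f.write(img_content)
            print('Saved: ', title)

time_2 = time.time()
use_time = int(time_2 - time_1)
print(f'Total time: {use_time} seconds')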

Give it a run yourselves, and don't forget to like, favorite and share!

Article link: https://www.haomeiwen.com/subject/sphhqrtx.html
