[Python Web Scraping in Practice] Why So Obsessed with Python? Because I Love Browsing


Author: 悦悦学Python | Published 2021-08-05 14:10


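Scraping target

Site: 绝对领域

Tools

Development environment: Windows 10, Python 3.7

Development tools: PyCharm, Chrome

Packages: requests, lxml

Project walkthrough

Pick the image category you want.

The category listing page only holds thumbnails, not the full images. From it, extract each post's hyperlink: the jump address of the a tag, plus the title of the image set.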

def get_url(start_url):
    response = requests.get(start_url, headers=headers).text
    data = etree.HTML(response)
    new_url = data.xpath('//div[@class="post-module-thumb"]/a/@href')
    for url in new_url:
        yield url
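The snippet above collects only the links, while the step also mentions the title. A minimal sketch of pulling both, run against a made-up HTML fragment: the `SAMPLE_LISTING` markup, the `extract_posts` helper, and the idea that the title sits in the a tag's `title` attribute are all assumptions for illustration, so check the real page in Chrome's developer tools first.

```python
from lxml import etree

# Hypothetical listing-page fragment; the real page's markup may differ.
SAMPLE_LISTING = """
<div class="post-module-thumb">
  <a href="https://example.com/post/1.html" title="First set"></a>
</div>
<div class="post-module-thumb">
  <a href="https://example.com/post/2.html" title="Second set"></a>
</div>
"""


def extract_posts(html_text):
    """Return (href, title) pairs for every thumbnail link in the HTML."""
    tree = etree.HTML(html_text)
    posts = []
    # Select the a elements themselves so both attributes can be read.
    for a in tree.xpath('//div[@class="post-module-thumb"]/a'):
        href = a.get("href")
        # Assumption: the title is stored in the a tag's title attribute.
        title = a.get("title", "")
        if href:
            posts.append((href, title.strip()))
    return posts


posts = extract_posts(SAMPLE_LISTING)
for href, title in posts:
    print(title, href)
```

Selecting the element rather than `@href` directly keeps the link and its title paired, which is handy if the title is later used as a folder name.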

Open each detail page and use XPath to extract the addresses of all images on it.
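As a rough illustration of this step, the sketch below runs the extraction on a made-up detail-page fragment. `SAMPLE_DETAIL`, `extract_image_urls`, and the example.com addresses are hypothetical. It also shows one thing worth checking on the real page: `/img` matches only images that are direct children of the container, while `//img` matches images nested at any depth.

```python
from urllib.parse import urljoin

from lxml import etree

# Hypothetical detail-page fragment: one direct-child image, one nested in <p>.
SAMPLE_DETAIL = """
<div class="entry-content">
  <img src="https://img.example.com/2021/08/a.jpg">
  <p><img src="/uploads/2021/08/b.jpg"></p>
</div>
"""


def extract_image_urls(html_text, page_url):
    """Return absolute URLs of every image inside the entry-content div."""
    tree = etree.HTML(html_text)
    # "//img" also finds images wrapped in other tags such as <p>.
    srcs = tree.xpath('//div[@class="entry-content"]//img/@src')
    # Resolve relative src values against the page they came from.
    return [urljoin(page_url, src) for src in srcs]


page = "https://example.com/post/1.html"
urls = extract_image_urls(SAMPLE_DETAIL, page)

# For comparison: direct children only, as in '/img/@src'.
direct_only = etree.HTML(SAMPLE_DETAIL).xpath(
    '//div[@class="entry-content"]/img/@src'
)
print(len(urls), len(direct_only))
```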

Request each image and save the returned binary data to disk.
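The saving half of this step can be sketched with the standard library alone. The helper names `filename_from_url` and `save_image` are made up for this example, and the bytes are a stand-in for what `requests.get(img_url).content` would return. The file name joins the last two path segments of the URL, similar to the naming used in the source code of this article, with an underscore added for readability.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlsplit


def filename_from_url(img_url):
    """Build a file name from the last two path segments of an image URL."""
    # urlsplit drops the query string, so "?size=large" never reaches the name.
    parts = [p for p in urlsplit(img_url).path.split("/") if p]
    return "_".join(parts[-2:])


def save_image(data, img_url, out_dir):
    """Write image bytes into out_dir, creating the folder if needed."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / filename_from_url(img_url)
    target.write_bytes(data)
    return target


# Demonstration with placeholder bytes in a temporary folder (no network).
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(
        b"fake image bytes",
        "https://img.example.com/2021/08/a.jpg?size=large",
        Path(tmp) / "images",
    )
    saved_name = saved.name
    saved_size = saved.stat().st_size
print(saved_name, saved_size)
```

Creating the folder inside the helper avoids the `FileNotFoundError` that `open()` raises when the output directory does not exist yet.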

Simple source code:

import os

import requests
from lxml import etree

# Browser-like User-Agent sent with the page requests
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/87.0.4280.88 Safari/537.36"
}


def get_url(start_url):
    """Yield the detail-page link of every post on one listing page."""
    response = requests.get(start_url, headers=headers).text
    data = etree.HTML(response)
    new_url = data.xpath('//div[@class="post-module-thumb"]/a/@href')
    for url in new_url:
        yield url


def get_img(url):
    """Download every image found on one detail page."""
    response = requests.get(url, headers=headers).text
    img_data = etree.HTML(response)
    img_urls = img_data.xpath('//div[@class="entry-content"]/img/@src')
    # "图片" means "images"; the folder must exist before files are written
    os.makedirs("图片", exist_ok=True)
    for img_url in img_urls:
        # File name = parent folder segment + original file name
        name = img_url.split("/")[-2] + img_url.split("/")[-1]
        result = requests.get(img_url).content
        with open("图片/" + name, "wb") as f:
            f.write(result)
        print("Downloading", name)


if __name__ == '__main__':
    # Listing pages 1 and 2
    for i in range(1, 3):
        start_url = "https://www.jdlingyu.com/tuji/hentai/gctt/page/{}".format(i)
        html_url = get_url(start_url)
        for url in html_url:
            get_img(url)
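The script above sends every request back to back, with no timeout or retry. One possible hardening, not part of the original code, is a shared `requests.Session` with automatic retries plus a short pause between pages. The names `build_session`, `page_urls`, and `fetch_pages`, the example.com template, and the retry and delay values are all assumptions chosen for illustration.

```python
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(user_agent, retries=3):
    """Create a session that reuses connections and retries failed requests."""
    session = requests.Session()
    session.headers.update({"user-agent": user_agent})
    retry = Retry(
        total=retries,
        backoff_factor=0.5,  # wait a little longer after each failure
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def page_urls(template, first, last):
    """Expand a '{}' page template into listing URLs, first to last inclusive."""
    return [template.format(n) for n in range(first, last + 1)]


def fetch_pages(session, urls, delay=1.0):
    """Yield the HTML of each page, pausing between requests."""
    for url in urls:
        resp = session.get(url, timeout=10)
        resp.raise_for_status()  # stop early on 4xx/5xx instead of parsing an error page
        yield resp.text
        time.sleep(delay)


session = build_session("Mozilla/5.0 (example user agent)")
pages = page_urls("https://example.com/list/page/{}", 1, 2)
print(pages)
# fetch_pages(session, pages) would perform the actual requests.
```

The HTML yielded by `fetch_pages` could then be passed to the same `etree.HTML` and XPath calls used in `get_url` and `get_img`.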

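Summary

I'm 悦悦, a programmer who enjoys sharing what I learn. If this interests you, follow me! If anything is unclear or you see things differently, feel free to leave a comment.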


Article link: https://www.haomeiwen.com/subject/avfevltx.html
