2019-01-14 Image Scraping

Author: 化石0305 | Published 2019-01-14 03:16

import os

import requests
from lxml import etree


class Spider(object):
    def __init__(self):
        # Send a browser user-agent and a referer with every request.
        self.headers = {
            "user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36",
            "referer": "https://www.mzitu.com/"
        }

    def request_data(self):
        # 1. Fetch the home page.
        response = requests.get("https://www.mzitu.com/", headers=self.headers)
        html = etree.HTML(response.text)
        # 2. Get each gallery's title and link.
        class_tit = html.xpath('//ul[@id="pins"]/li/span/a/text()')
        class_href = html.xpath('//ul[@id="pins"]/li/span/a/@href')
        # Create one folder per gallery; a gallery whose folder already exists is skipped.
        for tit, src in zip(class_tit, class_href):
            if not os.path.exists(tit):
                os.mkdir(tit)
                self.download_img_data(src, tit)

    def download_img_data(self, src, tit):
        # 3. Fetch the gallery's first page and read the page count.
        response = requests.get(src, headers=self.headers)
        html = etree.HTML(response.text)
        img_num = html.xpath('//div[@class="pagenavi"]/a[5]/span/text()')
        for i in range(1, int(img_num[0]) + 1):
            # 4. Fetch page i of the gallery and get the link to its full-size image.
            #    A separate variable keeps the first page's tree from being overwritten.
            page = requests.get(src + "/" + str(i), headers=self.headers)
            page_html = etree.HTML(page.text)
            img_href = page_html.xpath('//div[@class="main-image"]/p/a/img/@src')
            for imgsrc in img_href:
                # os.path.join instead of a hard-coded "\\" so the path works on any OS.
                jpg_name = os.path.join(tit, tit + str(i) + ".jpg")
                img_bytes = requests.get(imgsrc, headers=self.headers).content
                print("Downloading image...")
                # 5. Save the image.
                with open(jpg_name, "wb") as f:
                    f.write(img_bytes)


if __name__ == "__main__":
    spider = Spider()
    spider.request_data()
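
Two spots in the script above are fragile: gallery titles are used directly as folder names, and the page count is read from the fifth link of the pagenavi block. The sketch below shows one possible way to harden both. The helper names (safe_dirname, last_page_number, image_path) are hypothetical and not part of the original script, and the set of replaced characters assumes the Windows naming rules.

```python
import os
import re

# Characters that Windows does not allow in file or folder names.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_dirname(title):
    """Return a version of `title` usable as a folder name (hypothetical helper)."""
    # Replace forbidden characters, then drop surrounding spaces and trailing dots.
    cleaned = _INVALID_CHARS.sub("_", title).strip().rstrip(" .")
    return cleaned or "untitled"


def last_page_number(nav_texts):
    """Return the largest number among the pagination link texts, or 1 if none is numeric."""
    numbers = [int(text) for text in nav_texts if text.strip().isdecimal()]
    return max(numbers, default=1)


def image_path(folder, index):
    """Build <folder>/<folder><index>.jpg, the naming scheme the script uses."""
    return os.path.join(folder, folder + str(index) + ".jpg")
```

In download_img_data, the list passed to last_page_number could come from an XPath such as '//div[@class="pagenavi"]/a/span/text()', which is an assumption derived from the script's own selector and not something checked against the site. Taking the largest number avoids depending on the link's position in the navigation bar.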

Article link: https://www.haomeiwen.com/subject/sbbrdqtx.html
