Scraping Images with Scrapy

Author: whong736 | Published: 2018-03-13 21:44

    Goal: scrape the images from the image site http://hunter-its.com

    1. Create the project beauty

    scrapy startproject beauty
    
    

    2. cd into the project directory and generate a new spider from the basic template

    cd beauty
    
    scrapy genspider hunter hunter-its.com
    
    

    3. Open the project in PyCharm and write the item first

    Open items.py and define the name and address fields

    import scrapy
    
    class BeautyItem(scrapy.Item):
    
        name = scrapy.Field()
        address = scrapy.Field()
    
    

    4. Write the spider file

    Import the BeautyItem class defined earlier, along with the Request class

    from beauty.items import BeautyItem
    from scrapy.http import Request
    
    

    Use XPath to select all of the image nodes
    pics = response.xpath('//div[@class="pic"]/ul/li')
    Loop over the li nodes and extract each image's name and address

            for pic in pics:
                item = BeautyItem()
                name = pic.xpath('./a/img/@alt').extract()[0]
                address = pic.xpath('./a/img/@src').extract()[0]
    
                item['name'] = name
                item['address'] = address
    
                yield item
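
    The two XPath expressions above can be tried outside Scrapy. The sketch below is only an illustration: it uses the standard library's ElementTree (which supports a small XPath subset) on hypothetical markup that is assumed to resemble one listing page, and the helper name extract_images is made up for this example. It also shows two cases the spider code does not handle: an li without an image, where extract()[0] would raise IndexError, and a relative src, which has to be joined with the page URL before it can be downloaded.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical markup, assumed to resemble one listing page of the site
SAMPLE_HTML = """
<html><body>
  <div class="pic">
    <ul>
      <li><a href="/p/1.html"><img alt="first" src="/uploads/a.jpg"/></a></li>
      <li><a href="/p/2.html"><img alt="second" src="http://hunter-its.com/uploads/b.jpg"/></a></li>
      <li><a href="/p/3.html">no image here</a></li>
    </ul>
  </div>
</body></html>
"""


def extract_images(html, page_url):
    """Return (name, absolute address) pairs for every li that contains an image."""
    root = ET.fromstring(html)
    results = []
    # Same idea as response.xpath('//div[@class="pic"]/ul/li')
    for li in root.findall(".//div[@class='pic']/ul/li"):
        img = li.find("./a/img")
        if img is None:
            # extract()[0] would raise IndexError here; skip the node instead
            continue
        name = img.get("alt", "")
        # Resolve a relative src against the URL of the page it came from
        address = urljoin(page_url, img.get("src", ""))
        results.append((name, address))
    return results


images = extract_images(SAMPLE_HTML, "http://hunter-its.com/m/1.html")
```

    Inside a real spider, the equivalent guards would be get() or extract_first(), which return None when nothing matches, and response.urljoin(src) for relative addresses.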
    

    Call parse recursively to crawl the remaining pages

            # Runs once per page, after every item on the page has been yielded
            for i in range(2, 8):
                url = 'http://hunter-its.com/m/' + str(i) + '.html'
                print(url)
                yield Request(url, callback=self.parse)
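
    Note that range(2, 8) stops before 8, so the loop requests pages 2 through 7. Every parsed page yields the same requests again, but Scrapy's default duplicate filter drops URLs it has already scheduled, which is why the recursion ends. A minimal sketch of the URL construction on its own, where the helper page_urls and its parameters are hypothetical names introduced for illustration:

```python
def page_urls(first_page, last_page, base="http://hunter-its.com/m/"):
    """Build the listing-page URLs for first_page..last_page, inclusive."""
    # range() excludes its upper bound, hence the + 1
    return [f"{base}{n}.html" for n in range(first_page, last_page + 1)]


urls = page_urls(2, 7)
```

    Taking the last page as an explicit argument makes the upper bound harder to get wrong than a bare range(2, 8).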
    

    Full code

    # -*- coding: utf-8 -*-
    import scrapy
    from beauty.items import BeautyItem
    from scrapy.http import Request
    
    
    class HunterSpider(scrapy.Spider):
        name = 'hunter'
        allowed_domains = ['hunter-its.com']
        start_urls = ['http://hunter-its.com/m/1.html']
    
        def parse(self, response):
            # Select all of the image nodes
            pics = response.xpath('//div[@class="pic"]/ul/li')
    
            for pic in pics:
                item = BeautyItem()
                name = pic.xpath('./a/img/@alt').extract()[0]
                address = pic.xpath('./a/img/@src').extract()[0]
    
                item['name'] = name
                item['address'] = address
    
                yield item
    
            # Runs once per page, after every item on the page has been yielded
            for i in range(2, 8):
                url = 'http://hunter-its.com/m/' + str(i) + '.html'
                print(url)
                yield Request(url, callback=self.parse)
    
    

    5. Write the item-processing script pipelines.py and import the requests module

    import requests
    
    class BeautyPipeline(object):
        def process_item(self, item, spider):
    
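            # Send a browser-like User-Agent
            headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
            # Use the requests module to send a GET request
            r = requests.get(url=item['address'], headers=headers, timeout=4)
    
            print(item['address'])
            # Download the image and save it locally (the directory must already exist)
            with open(r'/Users/vincentwen/Downloads/hunter/'+ item['name'] + '.jpg', 'wb') as f:
                f.write(r.content)
    
            # Return the item so that any later pipeline still receives it
            return item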
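
    The file name above comes straight from the image's alt text, which may be empty or contain characters such as "/" that are not valid in a file name, and open() fails if the target directory is missing. The following is a rough sketch of one way to guard against both; image_path and save_image are hypothetical helpers, not part of Scrapy or of the pipeline above.

```python
import os
import re


def image_path(directory, name, extension=".jpg"):
    """Build a file path for an image named after its alt text."""
    # Replace path separators, reserved characters and whitespace with "_"
    safe = re.sub(r'[\\/:*?"<>|\s]+', "_", name).strip("_")
    if not safe:
        safe = "unnamed"  # the alt text can be empty
    return os.path.join(directory, safe + extension)


def save_image(directory, name, content):
    """Write raw image bytes to disk, creating the directory if needed."""
    os.makedirs(directory, exist_ok=True)
    path = image_path(directory, name)
    with open(path, "wb") as f:
        f.write(content)
    return path
```

    In the pipeline, the with open(...) block could then become a call such as save_image(target_dir, item['name'], r.content), where target_dir is whatever download directory you choose.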
    
    

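    6. Enable the pipeline in settings.py through ITEM_PIPELINES (the number, conventionally 0-1000, sets the run order; lower values run first)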

    ITEM_PIPELINES = {
       'beauty.pipelines.BeautyPipeline': 100,
    }
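
    As an aside, Scrapy also ships a built-in ImagesPipeline that downloads images itself, so a hand-written requests call is not the only option. The fragment below is a sketch of that alternative, not the configuration used in this article: it assumes the item declares image_urls (a list of URLs) and images fields instead of name and address, that Pillow is installed, and the IMAGES_STORE path is a made-up example.

```python
# Alternative settings.py sketch (an assumption, not the setup used in this article)
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
# Hypothetical download directory; point it wherever the images should be stored
IMAGES_STORE = 'downloads/hunter'
```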
    

    7. Run the spider

    scrapy crawl hunter 
    
