美文网首页Python
python scrapy 教程 示范案例- 抓取图文信息

python scrapy 教程 示范案例- 抓取图文信息

作者: shellinglin | 来源:发表于2019-06-29 00:05 被阅读0次

    更多干活分享可访问博主个人网站
    https://www.fzg5.com/blog/

    scrapy_projects

    可以作为 scrapy 学习项目

    项目一

    爬取一生必须知道的50幅中国名画,每一幅你都不容错过 这篇文章中的50幅名画

    1. items创建
    painter = scrapy.Field()
    pic_name = scrapy.Field()
    picture = scrapy.Field()
    
    1. scrapy配置
    allowed_domains = ['sohu.com']
        start_urls = ['http://www.sohu.com/a/157709282_661623']
    
        def parse(self, response):
            pic_list = response.xpath('//article[@class="article"]/p')
            items = []
            for pic in pic_list[2:]:
                if len(pic.extract().split('/'))>1: 
                    item = FamouspicspiderItem()
                    item['painter'] = pic.xpath('span/text()')[0].extract().split('/')[1]
                    item['pic_name'] = pic.extract().split('/')[0].split('、')[1]
                    items.append(item)
                if pic.xpath('img/@src').extract(): 
                    items[-1]['picture'] = pic.xpath('img/@src').extract()[0]
                
            return items
    
    1. pipelines自定义存储
    with open(picPath, 'wb') as fp:
        response = urlopen(item['picture'])
        fp.write(response.read())
    

    4.修改配置文件,注册自定义存储文件

    # Configure item pipelines
    # See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
    ITEM_PIPELINES = {
       'famousPicSpider.pipelines.FamouspicspiderPipeline': 300,
    }
    

    5、scrapy crawl famousPic

    GitHub

    https://github.com/shellingshord/scrapy_projects#%E9%A1%B9%E7%9B%AE%E4%B8%80

    更多干活分享可访问博主个人网站
    https://www.fzg5.com/blog/

    相关文章

      网友评论

        本文标题:python scrapy 教程 示范案例- 抓取图文信息

        本文链接:https://www.haomeiwen.com/subject/cvzycctx.html