Scraping Umei Images with Scrapy


Author: 我的袜子都是洞 | Published: 2018-11-12 23:50

    Target site: http://www.umei.cc

    [Screenshot: crawl results]

    Spider code:

    import scrapy
    from douban_movie.items import DoubanMovieItem
    
    class MovieSpider(scrapy.Spider):
        # Spider name, used with `scrapy crawl meinv`
        name = 'meinv'
        # Start URL
        start_urls = [
            'http://www.umei.cc/',
        ]
    
        def parse(self, response):
            # Follow the gallery links listed on the home page
            urls = response.xpath("//div[@class='PicListTxt']/ul/li/a/@href").extract()
            for url in urls:
                # urljoin leaves absolute hrefs unchanged and resolves relative ones
                yield scrapy.Request(response.urljoin(url), callback=self.parse_item)

        # Handle one photo page of a gallery
        def parse_item(self, response):
            next_pic = response.xpath("//div[@class='NewPages']/ul/li/a[contains(text(),'下一页')]/@href").extract_first()
            # The last page links to '#'; extract_first() returns None when nothing matched
            if next_pic and next_pic != '#':
                # Resolve the relative href against the current page rather than a hard-coded directory
                yield scrapy.Request(response.urljoin(next_pic), callback=self.parse_item)
            item = DoubanMovieItem()
            item['name'] = response.xpath("//div[@class='ArticleTitle']/strong/text()").extract_first()
            item['imgurl'] = response.xpath("//div[@class='ImageBody']//p/a/img/@src").extract_first()
            yield item
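
The "next page" link on a photo page is a relative href, and it is `'#'` on the last page of a gallery. Below is a minimal sketch of resolving such a link with the standard library's `urljoin`, the same kind of resolution Scrapy's `response.urljoin` performs. The helper name `next_page_url` and the sample URLs are hypothetical and only serve the illustration.

```python
from urllib.parse import urljoin


def next_page_url(page_url, href):
    """Return the absolute URL of the next page, or None when there is none."""
    # The last page links to '#'; a missing link is represented by None.
    if not href or href == '#':
        return None
    # Resolve the relative href against the URL of the page it was found on.
    return urljoin(page_url, href)


# Hypothetical photo-page URL and hrefs, for illustration only.
page = 'http://www.umei.cc/p/gaoqing/cn/1000.htm'
print(next_page_url(page, '1000_2.htm'))
print(next_page_url(page, '#'))
```

Resolving against the current page avoids tying the spider to one directory such as `/p/gaoqing/cn/`, which would only be correct for galleries in that category.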
    

    Pipeline code:

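    import os
    import urllib.request
    
    class DoubanMoviePipeline(object):
        def process_item(self, item, spider):
            # Make sure the output directory exists before writing
            os.makedirs('download', exist_ok=True)
            # Note: urlopen blocks and bypasses Scrapy's downloader and settings
            with urllib.request.urlopen(item['imgurl']) as conn:
                data = conn.read()
            with open(os.path.join('download', item['name'] + '.jpeg'), 'wb') as file:
                file.write(data)
            # Return the item so any later pipelines still receive it
            return item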
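
The pipeline names each file after the item's title. Below is a minimal sketch of a helper (the name `image_path` is hypothetical) that sanitizes the title and appends a short hash of the image URL. It assumes that titles may contain characters that are unsafe in file names and that several pages of one gallery may share a title; the original post confirms neither.

```python
import hashlib
import os
import re


def image_path(name, imgurl, folder='download'):
    """Build an output path from an item's title and image URL."""
    # Replace path separators, reserved characters, and whitespace with '_'.
    safe = re.sub(r'[\\/:*?"<>|\s]+', '_', name or 'untitled').strip('_')
    # A short hash of the URL keeps pages with the same title from colliding.
    digest = hashlib.sha1(imgurl.encode('utf-8')).hexdigest()[:8]
    return os.path.join(folder, f'{safe}_{digest}.jpeg')


# Hypothetical title and image URL, for illustration only.
print(image_path('sample title/1', 'http://example.com/a.jpg'))
```

Inside `process_item`, the result could replace the hand-built `"download/" + item['name'] + '.jpeg'` path.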
    

    Settings file:

    USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
    DEFAULT_REQUEST_HEADERS = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en',
    }
    ITEM_PIPELINES = {
        'douban_movie.pipelines.DoubanMoviePipeline': 300,
    }
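
The settings shown here cover the user agent, default headers, and the pipeline. A crawl that downloads many images may also benefit from throttling, so here is a sketch of options one might add to `settings.py`. The setting names are standard Scrapy settings; the values are illustrative assumptions and do not come from the original post.

```python
# Possible additions to settings.py (values are assumptions for illustration)
ROBOTSTXT_OBEY = True                # respect the site's robots.txt
DOWNLOAD_DELAY = 1                   # seconds to wait between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 4   # cap on parallel requests per domain
AUTOTHROTTLE_ENABLED = True          # adjust the delay based on server response times
```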
    
    [Screenshot: run output]
