Scrapy Crawler Framework (2): Scraping Maoyan Movie Titles and Ratings

Author: 千喜Ya | Published: 2019-08-06 11:03

items.py:

import scrapy

class MovieItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    score = scrapy.Field()

MaoyanSpider:

# -*- coding: utf-8 -*-
import scrapy
from demo1.items import MovieItem


class MaoyanSpider(scrapy.Spider):
    name = 'maoyan'
    allowed_domains = ['maoyan.com']
    start_urls = ['http://maoyan.com/films?offset=30']

    def parse(self, response):
        # Movie titles are stored in the title attribute of each title <div>.
        names = response.xpath('//div[@class="channel-detail movie-item-title"]/@title').extract()
        # string(.) concatenates all text inside the score <div>, including text in nested tags.
        scores = [
            score.xpath('string(.)').extract_first()
            for score in response.xpath('//div[@class="channel-detail channel-detail-orange"]')
        ]

        for name, score in zip(names, scores):
            # Create a new item for each movie. Reusing one instance means every
            # yielded item is the same object, so later assignments can overwrite
            # earlier results.
            item = MovieItem()
            item['name'] = name
            item['score'] = score
            yield item

        # A callback can yield plain dicts or Item objects (and Request objects
        # for follow-up pages); the pipeline receives the dict or item that was yielded.
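
One thing to watch in a generator callback such as parse: if a single item object is created once and then mutated on each iteration, every yielded reference points to that same object. The following is a minimal sketch in plain Python, not Scrapy code. Ordinary dicts stand in for MovieItem, and the function names and sample data are made up for illustration.

```python
def reuse_one_object(names, scores):
    # Hypothetical stand-in for parse(): one dict is created once
    # and mutated on every iteration.
    item = {}
    for name, score in zip(names, scores):
        item['name'] = name
        item['score'] = score
        yield item


def fresh_object_per_row(names, scores):
    # A new dict is created for every pair, so each result is independent.
    for name, score in zip(names, scores):
        yield {'name': name, 'score': score}


# Made-up sample data, not real Maoyan results.
names = ['Movie A', 'Movie B']
scores = ['9.1', '8.7']

shared = list(reuse_one_object(names, scores))
fresh = list(fresh_object_per_row(names, scores))

print(shared)  # both entries are the same dict and show the last movie
print(fresh)   # each entry keeps its own name and score
```

Whether the shared object causes visible problems depends on when the consumer reads each item, which is why creating one item per movie is the safer habit.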

pipelines.py:

import json


class Demo1Pipeline(object):
    def open_spider(self, spider):
        # Open the output file once, when the spider starts.
        self.file = open('movie.txt', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # Convert the item to a dict first: json.dumps cannot serialize an Item directly.
        # ensure_ascii=False writes Chinese titles as readable text instead of \u escapes.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item  # return the item so that later pipelines can process it too

    def close_spider(self, spider):
        # Close the file when the spider finishes.
        self.file.close()
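
The pipeline writes one JSON object per line, so the output can be read back line by line. Below is a rough sketch using only the standard library. The sample records and the temporary file path are assumptions for illustration, and no Scrapy objects are involved.

```python
import json
import os
import tempfile

# Made-up sample records standing in for scraped items.
records = [
    {'name': '电影甲', 'score': '9.1'},
    {'name': '电影乙', 'score': '8.7'},
]

# Write to a temporary directory so the sketch does not touch a real movie.txt.
path = os.path.join(tempfile.mkdtemp(), 'movie.txt')

# Write one JSON object per line, in the same way as the pipeline.
with open(path, 'w', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

# Read the file back: each non-empty line parses independently.
with open(path, encoding='utf-8') as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == records)

# ensure_ascii=False keeps non-ASCII text readable; the default escapes it.
readable = json.dumps({'name': '电影甲'}, ensure_ascii=False)
escaped = json.dumps({'name': '电影甲'})
print(readable)
print(escaped)
```

Both forms load back to the same data; the readable form is simply easier to inspect in a text editor.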

settings.py:

ITEM_PIPELINES = {
    'demo1.pipelines.Demo1Pipeline': 300,  # key: import path of the pipeline class; value: run order, lower values run first
}
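
The ordering only matters once more than one pipeline is enabled. As a sketch, suppose a second pipeline were added; the name DropMissingScorePipeline is hypothetical and does not exist in this project. Sorting the entries by value shows the order in which an item would pass through them.

```python
# Sketch of a settings.py entry with two pipelines.
# 'demo1.pipelines.DropMissingScorePipeline' is a hypothetical name used only for illustration.
ITEM_PIPELINES = {
    'demo1.pipelines.Demo1Pipeline': 300,             # higher value: runs second
    'demo1.pipelines.DropMissingScorePipeline': 200,  # lower value: runs first
}

# Items pass through the pipelines in ascending order of these values.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(run_order)
```

Each pipeline receives the item returned by the one before it, which is why process_item should return the item.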
