Scrapy Crawler Framework (2) ------ Scraping Maoyan Movies and Their Ratings

Author: 千喜Ya | Published 2019-08-06 11:03

    items.py :

    import scrapy
    
    class MovieItem(scrapy.Item):
        # define the fields for your item here like:
        name = scrapy.Field()
        score = scrapy.Field()
    
    

    MaoyanSpider :

    # -*- coding: utf-8 -*-
    import scrapy
    from demo1.items import MovieItem
    
    
    class MaoyanSpider(scrapy.Spider):
        name = 'maoyan'
        allowed_domains = ['maoyan.com']
        start_urls = ['http://maoyan.com/films?offset=30']
    
        def parse(self, response):
            names = response.xpath('//div[@class="channel-detail movie-item-title"]/@title').extract()
            # string(.) concatenates all descendant text, because each score is
            # split across several child nodes
            scores = [score.xpath('string(.)').extract_first()
                      for score in response.xpath('//div[@class="channel-detail channel-detail-orange"]')]

            # Alternatively, plain dicts could be yielded instead of Items:
            # for name, score in zip(names, scores):
            #     yield {"name": name, "score": score}

            for name, score in zip(names, scores):
                item = MovieItem()  # create a fresh Item per movie, not one shared instance
                item['name'] = name
                item['score'] = score
                yield item
                # yield may only return dicts or defined Items; the pipeline
                # receives the corresponding dict or Item
    
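The `string(.)` XPath call above is needed because Maoyan renders each rating split across child elements (an integer part and a fraction part), so a plain `text()` would return only a fragment. The same concatenation can be sketched with the standard library alone; the sample markup below is an assumption modeled on Maoyan's pages, not copied from them:

```python
import xml.etree.ElementTree as ET

# Hypothetical score markup: the rating is split into an integer part and a
# fraction part held in separate <i> children.
html = ('<div class="channel-detail channel-detail-orange">'
        '<i class="integer">9.</i><i class="fraction">0</i></div>')

div = ET.fromstring(html)
# itertext() walks every descendant text node in document order,
# which is what XPath's string(.) does for an element
score = ''.join(div.itertext())
print(score)  # -> 9.0
```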

    pipelines.py :

    import json
    
    
    class Demo1Pipeline(object):
        def open_spider(self, spider):
            # open the output file once, instead of reopening it per item
            self.file = open('movie.txt', 'w', encoding='utf-8')

        def process_item(self, item, spider):
            # json.dumps cannot serialize an Item directly; convert it to a dict first
            self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
            return item  # return the item so later pipelines can also process it

        def close_spider(self, spider):
            self.file.close()
    
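The `dict(item)` conversion matters because `json.dumps` cannot serialize a `scrapy.Item` directly, while the converted dict serializes cleanly. The serialization step can be sketched with plain stdlib code; the movie data below is purely illustrative:

```python
import json

# Illustrative item data; dict(item) on a scrapy.Item yields exactly this shape
item = {'name': '哪吒之魔童降世', 'score': '9.6'}

# ensure_ascii=False keeps Chinese characters readable in movie.txt
# instead of escaping them as \uXXXX sequences
line = json.dumps(item, ensure_ascii=False)
print(line)  # -> {"name": "哪吒之魔童降世", "score": "9.6"}
```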

    settings.py :

    ITEM_PIPELINES = {
       'demo1.pipelines.Demo1Pipeline': 300,  # dotted path to the pipeline; 300 is its priority (lower numbers run first)
    }
    
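When several pipelines are registered, these priority numbers decide the order in which each item passes through them, which is why `process_item` must `return item`. A sketch assuming a hypothetical second pipeline class `JsonPipeline` in the same module:

```python
ITEM_PIPELINES = {
    'demo1.pipelines.Demo1Pipeline': 300,  # runs first (lower number = earlier)
    'demo1.pipelines.JsonPipeline': 400,   # hypothetical; receives the item returned by Demo1Pipeline
}
```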

Link: https://www.haomeiwen.com/subject/ryuvdctx.html