On process_ite in Scrapy's ItemProcess

Author: 单名一个冲 | Published 2019-05-29 21:41

Check that ITEM_PIPELINES in settings.py actually registers your item pipeline, for example:

# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
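ITEM_PIPELINES = {
    # Dotted path to the pipeline class: its order value
    # (customarily 0-1000; lower values run first)
    'worm.pipelines.WormPipeline': 100,
}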
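If the dotted path is wrong, the pipeline never runs. A quick sanity check is to import every class named in the dict yourself. The sketch below is a hypothetical helper (`resolve_pipelines` is not part of Scrapy). It only uses `importlib` and assumes your project package is importable from where you run it. Following the convention Scrapy uses for component dicts, it treats a `None` value as "disabled", and it returns the classes lowest value first.

```python
import importlib


def resolve_pipelines(item_pipelines):
    """Hypothetical helper: import each class named in an ITEM_PIPELINES-style dict.

    Entries whose value is None are skipped (treated as disabled).
    Classes are returned in ascending order of their value (lower runs first).
    """
    enabled = [(path, order) for path, order in item_pipelines.items() if order is not None]
    enabled.sort(key=lambda entry: entry[1])
    classes = []
    for path, _order in enabled:
        module_name, _, class_name = path.rpartition('.')
        module = importlib.import_module(module_name)  # ImportError if the module path is wrong
        classes.append(getattr(module, class_name))  # AttributeError if the class name is wrong
    return classes
```

For example, from the directory containing scrapy.cfg you might run `from worm.settings import ITEM_PIPELINES` and then `resolve_pipelines(ITEM_PIPELINES)`. An ImportError or AttributeError there points straight at a typo in the setting.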

If your Item subclass defines its own constructor, it must explicitly call the parent class constructor:

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy
from scrapy import Field

class TestSpiderItem(scrapy.Item):

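    def __init__(self, *args, **kwargs):
        # If the subclass defines its own constructor, it must call the parent
        # constructor; otherwise the item's field storage is never initialized,
        # assigning a field raises an error, and the item never reaches the
        # pipeline's process_item method.
        # Forwarding *args/**kwargs keeps TestSpiderItem(name=...) and item.copy() working.
        super().__init__(*args, **kwargs)
        print('<INFO> TestSpiderItem is instancing.')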

    name = Field()
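The following illustrates why the `super().__init__()` call matters. It uses a simplified, hypothetical stand-in for a dict-backed item class. `MiniItem`, `BrokenItem` and `FixedItem` are illustrative names only, and `MiniItem` is not Scrapy's real `Item` implementation. It merely mimics the pattern of creating per-instance storage inside the base constructor.

```python
class MiniItem:
    """Hypothetical stand-in for a dict-backed item; NOT Scrapy's actual Item class."""

    def __init__(self, *args, **kwargs):
        self._values = {}  # per-instance field storage, created only here
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


class BrokenItem(MiniItem):
    def __init__(self):
        # Forgets super().__init__(), so self._values is never created
        print('BrokenItem is instancing.')


class FixedItem(MiniItem):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # sets up self._values first
        print('FixedItem is instancing.')
```

`FixedItem(name='a')['name']` works. `BrokenItem()['name'] = 'x'` raises AttributeError because `_values` was never created. In a real spider, an error like that stops the callback before the item is yielded, so no pipeline ever sees it.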

Check that process_item(self, item, spider) returns an Item or dict object; whatever it returns is what the next pipeline component receives:

class WormPipeline(object):
    # This method is called for every item pipeline component.
    # process_item() must either: return a dict with data,
    # return an Item (or any descendant class) object,
    # return a Twisted Deferred or raise DropItem exception.
    # Dropped items are no longer processed by further pipeline components.
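    def process_item(self, item, spider):
        # Append each author to a local text file; the encoding is set explicitly
        # so non-ASCII (e.g. Chinese) text is written consistently on Windows
        with open('F:\\text1.txt', 'a', encoding='utf-8') as f:
            f.write(item['author'] + '\n')
        # Return the item so the next pipeline component receives it
        return item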
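The sketch below is a variant of the same pipeline that uses Scrapy's `open_spider`/`close_spider` hooks. This opens the file once per crawl instead of once per item. It also includes a tiny simulation of how return values flow from one pipeline component to the next. `AuthorFilePipeline`, `ForgetfulPipeline`, `run_chain` and the default path `'authors.txt'` are hypothetical. `run_chain` is a simplified model of the chaining, not Scrapy's engine code.

```python
class AuthorFilePipeline:
    """Hypothetical variant of WormPipeline that opens its file once per crawl."""

    def __init__(self, path='authors.txt'):  # default path is an assumption
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider opens
        self.file = open(self.path, 'a', encoding='utf-8')

    def close_spider(self, spider):
        # Called once when the spider closes
        self.file.close()

    def process_item(self, item, spider):
        self.file.write(item['author'] + '\n')
        return item  # hand the item on to the next pipeline component


class ForgetfulPipeline:
    def process_item(self, item, spider):
        item['seen'] = True  # no return statement, so this implicitly returns None


def run_chain(item, pipelines, spider=None):
    """Simplified simulation: each stage receives what the previous stage returned."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item
```

If `ForgetfulPipeline` ran before `AuthorFilePipeline`, the latter would receive `None`, and `item['author']` would fail. This is why every `process_item` should end with `return item` (or raise `DropItem`).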