美文网首页
关于Scrapy中ItemProcess的process_ite

关于Scrapy中ItemProcess的process_ite

作者: 单名一个冲 | 来源:发表于2019-05-29 21:41 被阅读0次
  1. 检查settings.py中ITEM_PIPELINES是否指定Item管道,例如:
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   'worm.pipelines.WormPipeline': 100,
}
  1. 如果Item实现了子类的构造,则父类必须显示声明父类构造:
# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy
from scrapy import Field

class TestSpiderItem(scrapy.Item):

    def __init__(self):
        # 如果实现了子类的构造,则必须声明父类构造,
        # 否则无法执行ItemProcess的process_item方法
        super().__init__()
        print('<INFO> TestSpiderItem is instancing.')

    name = Field()
  1. 检查process_item(self, item, spider)方法是否返回一个item或dict对象:
class WormPipeline(object):
    # This method is called for every item pipeline component.
    # process_item() must either: return a dict with data,
    # return an Item (or any descendant class) object,
    # return a Twisted Deferred or raise DropItem exception.
    # Dropped items are no longer processed by further pipeline components.
    def process_item(self, item, spider):
        with open('F:\\text1.txt', 'a') as f:
            f.write(item['author'] + '\n')
        return item

相关文章

网友评论

      本文标题:关于Scrapy中ItemProcess的process_ite

      本文链接:https://www.haomeiwen.com/subject/jgwutctx.html