About Scrapy: How to Get settings from the Spider, Middleware, and Pipeline

Author: ArtioL | Published 2019-06-17 13:47

Priority of settings
According to the official documentation, Scrapy settings take effect at four levels of precedence:

Command line options (highest precedence)
Project settings module
Default settings per-command
Default global settings (lowest precedence)
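
For illustration, here is a hedged sketch of how a custom key moves through these levels. CONFIG_KEY is an assumed example name used throughout this post, and myspider is a hypothetical spider name:

# settings.py (project settings module)
CONFIG_KEY = 'value-from-settings-py'

# Launching the crawl with -s overrides the key at the highest precedence:
#   scrapy crawl myspider -s CONFIG_KEY=value-from-command-line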

Spider
Getting settings inside the parse() method:

def parse(self, response):
    print(self.settings.get('CONFIG_KEY'))
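
Beyond plain get(), the Settings object also exposes typed accessors. A minimal sketch; MY_FEATURE, MY_LIMIT, and MY_HOSTS are assumed example keys:

def parse(self, response):
    # self.settings is a shortcut to the crawler's Settings object
    value = self.settings.get('CONFIG_KEY')         # raw value, None if missing
    enabled = self.settings.getbool('MY_FEATURE')   # string values coerced to bool
    limit = self.settings.getint('MY_LIMIT', 10)    # coerced to int, with a default
    hosts = self.settings.getlist('MY_HOSTS')       # comma-separated string coerced to a list
    self.logger.info('CONFIG_KEY=%s', value)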

Getting settings when the spider is instantiated:

class MySpider(scrapy.Spider):
    def __init__(self, settings, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        print(settings.get('CONFIG_KEY'))

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Pass the crawler's settings into __init__ explicitly
        spider = cls(crawler.settings, *args, **kwargs)
        spider._set_crawler(crawler)
        return spider
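
As an alternative sketch, you can rely on the base from_crawler() instead of passing settings through __init__; after the super() call the spider is already bound to the crawler, so self.settings works:

class MySpider(scrapy.Spider):
    name = 'myspider'  # hypothetical spider name

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # The base implementation creates the spider and attaches the crawler,
        # so spider.settings is usable right after this call
        spider = super(MySpider, cls).from_crawler(crawler, *args, **kwargs)
        print(spider.settings.get('CONFIG_KEY'))
        return spider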

Middleware & Pipeline
Through the spider argument passed into the processing methods.
For example, the process_spider_input() method of a spider middleware:

def process_spider_input(self, response, spider):
    print(spider.settings.get('CONFIG_KEY'))
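
The same spider argument is available in an item pipeline's process_item(), so a pipeline can read settings the same way. A minimal sketch; MyPipeline is an assumed class name:

class MyPipeline:
    def process_item(self, item, spider):
        # spider.settings is the same Settings object the crawler uses
        print(spider.settings.get('CONFIG_KEY'))
        return item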

Getting settings when the middleware or pipeline is instantiated:

class MyMiddleware:
    def __init__(self, settings):
        print(settings.get('CONFIG_KEY'))

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds middlewares and pipelines through from_crawler,
        # so the crawler's settings can be handed to __init__ here
        return cls(crawler.settings)
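
For either component to be instantiated at all, it has to be enabled in the project settings. The module paths below are hypothetical; adjust them to your project layout:

# settings.py
SPIDER_MIDDLEWARES = {
    'myproject.middlewares.MyMiddleware': 543,
}
ITEM_PIPELINES = {
    'myproject.pipelines.MyPipeline': 300,
}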

A clear and simple, but risky, approach: get_project_settings()
from scrapy.utils.project import get_project_settings

def parse(self, response):
    settings = get_project_settings()
    print(settings.get('CONFIG_KEY'))

Pros: simple and straightforward.
Cons: it does not see settings passed on the command line, which have the highest precedence.
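
A hedged sketch of the pitfall, assuming the crawl is launched with an override such as scrapy crawl myspider -s CONFIG_KEY=from-cli:

from scrapy.utils.project import get_project_settings

def parse(self, response):
    # Goes through the crawler's settings, so the -s override is visible
    print(self.settings.get('CONFIG_KEY'))            # 'from-cli'
    # Re-reads the project settings module only, so the override is missed
    print(get_project_settings().get('CONFIG_KEY'))   # value from settings.py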

Reposted from https://blog.csdn.net/weixin_40841752/article/details/82900326
