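The [settings] section of scrapy.cfg decides which settings module a project loads; by default it points to default = [projectname].settings. To have the project pick a different configuration automatically depending on the environment it runs in, a little environment setup is needed: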
[settings]
default = cprice.settings
prd = cprice.settings
Let's dig through the Scrapy source code that handles this configuration:
def init_env(project='default', set_syspath=True):
    """Initialize environment to use command-line tool from inside a project
    dir. This sets the Scrapy settings module and modifies the Python path to
    be able to locate the project module.
    """
    cfg = get_config()
    if cfg.has_option('settings', project):
        os.environ['SCRAPY_SETTINGS_MODULE'] = cfg.get('settings', project)
    closest = closest_scrapy_cfg()
    if closest:
        projdir = os.path.dirname(closest)
        if set_syspath and projdir not in sys.path:
            sys.path.append(projdir)

def get_project_settings():
    if ENVVAR not in os.environ:
        project = os.environ.get('SCRAPY_PROJECT', 'default')
        init_env(project)
    settings = Settings()
    settings_module_path = os.environ.get(ENVVAR)
    if settings_module_path:
        settings.setmodule(settings_module_path, priority='project')
    # XXX: remove this hack
    pickled_settings = os.environ.get("SCRAPY_PICKLED_SETTINGS_TO_OVERRIDE")
    if pickled_settings:
        settings.setdict(pickle.loads(pickled_settings), priority='project')
    # XXX: deprecate and remove this functionality
    env_overrides = {k[7:]: v for k, v in os.environ.items() if
                     k.startswith('SCRAPY_')}
    if env_overrides:
        settings.setdict(env_overrides, priority='project')
    return settings
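From this we can see that when neither SCRAPY_PROJECT nor SCRAPY_SETTINGS_MODULE is set in the environment, the default entry under [settings] is read. If SCRAPY_SETTINGS_MODULE is set, it is used directly and init_env is not called at all.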
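To make the lookup order easier to follow, here is a minimal sketch that mirrors the logic quoted above using only the standard library. It is a simplification, not Scrapy's own API: the function name resolve_settings_module, the CFG string, and the module path cprice.prd_settings are assumptions made up for illustration.

```python
import configparser

ENVVAR = "SCRAPY_SETTINGS_MODULE"  # same constant name as in the quoted source

# Hypothetical scrapy.cfg contents; cprice.prd_settings is an assumed module.
CFG = """
[settings]
default = cprice.settings
prd = cprice.prd_settings
"""


def resolve_settings_module(environ, cfg_text):
    """Return the settings module path the quoted logic would end up with.

    environ:  a dict standing in for os.environ
    cfg_text: the contents of scrapy.cfg as a string
    """
    # An explicit SCRAPY_SETTINGS_MODULE wins; scrapy.cfg is not consulted.
    if ENVVAR in environ:
        return environ[ENVVAR] or None

    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)

    # Otherwise SCRAPY_PROJECT names a key under [settings] ("default" if unset).
    project = environ.get("SCRAPY_PROJECT", "default")
    if cfg.has_option("settings", project):
        return cfg.get("settings", project)

    # Unknown key: nothing is set, so no settings module would be loaded.
    return None


if __name__ == "__main__":
    print(resolve_settings_module({}, CFG))
    print(resolve_settings_module({"SCRAPY_PROJECT": "prd"}, CFG))
```

Passing a plain dict instead of os.environ keeps the sketch free of side effects, so each case can be tried without touching the real environment.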
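Let's try setting SCRAPY_PROJECT and SCRAPY_SETTINGS_MODULE in turn:
With export SCRAPY_PROJECT=xxxx, xxxx is a key defined under [settings] in scrapy.cfg;
export SCRAPY_SETTINGS_MODULE is more awkward to set up: it takes a module path directly and, per the code above, init_env is skipped, so the project directory is not added to sys.path for you. Try it if you are curious.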
=============================================================
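So, to have the settings selected automatically per environment, three steps are needed:
- Create a new settings file, for example prd_settings.py;
- In scrapy.cfg, add the new settings module under [settings], for example: prd = [projectname].prd_settings;
- Run export SCRAPY_PROJECT=prd; scrapy crawl spidername.
That's it.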
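As a sketch of step one, a prd_settings.py can start from the default settings and override only what differs. The demo below builds a throwaway package in a temporary directory so it runs without a real project. The package name, the setting values, and the helper names are all hypothetical. Picking up only UPPERCASE names is meant to resemble how a settings module is read, and is an assumption here rather than a copy of Scrapy's code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name and setting values, used only for this demo.
PACKAGE = "demo_project_settings_sketch"

BASE_SETTINGS = '''
BOT_NAME = "demo"
DOWNLOAD_DELAY = 0
LOG_LEVEL = "DEBUG"
'''

# prd_settings.py reuses everything from settings.py and overrides two values.
PRD_SETTINGS = f'''
from {PACKAGE}.settings import *  # start from the default settings

DOWNLOAD_DELAY = 2
LOG_LEVEL = "INFO"
'''


def build_demo_package(root: Path) -> None:
    """Write settings.py and prd_settings.py into a package under root."""
    pkg = root / PACKAGE
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "settings.py").write_text(BASE_SETTINGS)
    (pkg / "prd_settings.py").write_text(PRD_SETTINGS)


def load_uppercase_names(module_path: str) -> dict:
    """Import a module by dotted path and keep its UPPERCASE attributes."""
    module = importlib.import_module(module_path)
    return {k: getattr(module, k) for k in dir(module) if k.isupper()}


def demo():
    """Return (default_settings, prd_settings) as plain dicts."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)  # make the demo package importable
        importlib.invalidate_caches()
        try:
            default = load_uppercase_names(f"{PACKAGE}.settings")
            prd = load_uppercase_names(f"{PACKAGE}.prd_settings")
        finally:
            sys.path.remove(tmp)
    return default, prd


if __name__ == "__main__":
    default, prd = demo()
    print("default:", default)
    print("prd:", prd)
```

Importing the base module with a star import keeps the production file short: anything not overridden, such as BOT_NAME here, carries over unchanged.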