Installation
pip install scrapyd
pip install scrapydweb
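Configuring scrapyd
Editing the configuration file
Taking a virtualenv install as an example, open default_scrapyd.conf in the package's install directory and edit it: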
> vim ~/.virtualenvs/scrapyd/lib/python3.6/site-packages/scrapyd/default_scrapyd.conf
[scrapyd]
eggs_dir = /home/scrapyd/eggs
logs_dir = /home/scrapyd/logs
items_dir = /home/scrapyd/items
jobs_to_keep = 5
dbs_dir = /home/scrapyd/dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port = 6800
...
- For remote access, set bind_address = 0.0.0.0.
- Point eggs_dir, logs_dir, items_dir, and dbs_dir at your working directory.
- Set username and password to non-empty values to enable HTTP basic auth.
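The three changes above can also be kept in a separate override file rather than edited into the packaged default. The sketch below is a minimal illustration using Python's standard configparser. The /home/scrapyd base directory comes from the sample config; the function names and the credential values are placeholders assumed for this example.

```python
import configparser
import io


def build_scrapyd_overrides(base_dir, username, password):
    """Return a ConfigParser holding the [scrapyd] options discussed above."""
    if not username or not password:
        # Basic auth is only enabled when both values are non-empty.
        raise ValueError("username and password must both be non-empty")
    config = configparser.ConfigParser()
    config["scrapyd"] = {
        "bind_address": "0.0.0.0",  # listen on all interfaces for remote access
        "eggs_dir": f"{base_dir}/eggs",
        "logs_dir": f"{base_dir}/logs",
        "items_dir": f"{base_dir}/items",
        "dbs_dir": f"{base_dir}/dbs",
        "username": username,
        "password": password,
    }
    return config


def render(config):
    """Serialize the config to INI text."""
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder credentials; replace them before use.
    print(render(build_scrapyd_overrides("/home/scrapyd", "admin", "change-me")))
```

Saving the rendered text under one of the scrapyd.conf locations listed in this article should let it take effect without touching the file inside site-packages.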
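I installed scrapyd inside a virtualenv, and after startup it kept failing to pick up my configuration. It worked after I reinstalled scrapyd and deleted stray configuration files that might exist at other paths.
The paths scrapyd searches for scrapyd.conf at startup include: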
- /etc/scrapyd/scrapyd.conf
- /etc/scrapyd/conf.d/*
- scrapyd.conf
- ~/.scrapyd.conf
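When the configuration does not seem to be picked up, it can help to see which of these files actually exist and what they set once merged. The following is a rough diagnostic sketch, not scrapyd's own loading code. It assumes the files are merged in the listed order with later files overriding earlier ones (which is how configparser treats a list of files), and it only looks at the four locations named above.

```python
import configparser
import glob
import os

# Candidate locations, in the order given in the list above.
CANDIDATES = [
    "/etc/scrapyd/scrapyd.conf",
    "/etc/scrapyd/conf.d/*",
    "scrapyd.conf",
    "~/.scrapyd.conf",
]


def expand_candidates(patterns):
    """Expand '~' and glob patterns into the config files that actually exist."""
    found = []
    for pattern in patterns:
        expanded = os.path.expanduser(pattern)
        # sorted() keeps the order of conf.d entries deterministic.
        found.extend(p for p in sorted(glob.glob(expanded)) if os.path.isfile(p))
    return found


def effective_options(paths):
    """Merge the files in order; options in later files override earlier ones."""
    parser = configparser.ConfigParser()
    parser.read(paths)
    if not parser.has_section("scrapyd"):
        return {}
    return dict(parser["scrapyd"])


if __name__ == "__main__":
    files = expand_candidates(CANDIDATES)
    print("config files found:", files or "none")
    for key, value in sorted(effective_options(files).items()):
        print(f"{key} = {value}")
```

Run it from the directory where scrapyd is started, since the bare scrapyd.conf entry is relative to the current working directory.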
Running scrapyd
> nohup scrapyd>spider.log &
<Enter>
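Once scrapyd is running in the background, one way to confirm that it is reachable and that basic auth works is to request its daemonstatus.json endpoint. The sketch below uses only the standard library. The host, port, and credentials are placeholders assumed for this example, and the helper names are hypothetical.

```python
import base64
import json
import urllib.request


def build_status_request(host, port=6800, username="", password=""):
    """Build a GET request for scrapyd's daemonstatus.json endpoint."""
    request = urllib.request.Request(f"http://{host}:{port}/daemonstatus.json")
    if username and password:
        # HTTP basic auth: base64("username:password") in the Authorization header.
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_status(request, timeout=5):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder host and credentials; adjust to match scrapyd.conf.
    try:
        print(fetch_status(build_status_request("127.0.0.1", 6800, "admin", "change-me")))
    except OSError as error:
        # Covers connection failures as well as HTTP errors such as 401.
        print("scrapyd is not reachable or rejected the request:", error)
```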
Configuring scrapydweb
- Set SCRAPYDWEB_BIND = '0.0.0.0' to allow access from external IPs.
- Set ENABLE_AUTH = True to enable basic auth for the web UI.
- Set USERNAME and PASSWORD to non-empty strings.
############################## QUICK SETUP start ##############################
# Setting SCRAPYDWEB_BIND to '0.0.0.0' or IP-OF-THE-CURRENT-HOST would make
# ScrapydWeb server visible externally; Otherwise, set it to '127.0.0.1'.
# The default is '0.0.0.0'.
SCRAPYDWEB_BIND = '0.0.0.0'
# Accept connections on the specified port, the default is 5000.
SCRAPYDWEB_PORT = 5000
# The default is False, set it to True to enable basic auth for the web UI.
ENABLE_AUTH = True
# In order to enable basic auth, both USERNAME and PASSWORD should be non-empty strings.
USERNAME = 'username-for-scrapydweb-login'
PASSWORD = 'xxxxxxxx'
Connecting scrapydweb to scrapyd
In the same configuration file, find the SCRAPYD_SERVERS entry and add the host, port, username, and password of each running scrapyd instance.
# - the string format: username:password@ip:port#group
# - The default port would be 6800 if not provided,
# - Both basic auth and group are optional.
# - e.g. '127.0.0.1:6800' or 'username:password@localhost:6801#group'
# - the tuple format: (username, password, ip, port, group)
# - When the username, password, or group is too complicated (e.g. contains ':@#'),
# - or if ScrapydWeb fails to parse the string format passed in,
# - it's recommended to pass in a tuple of 5 elements.
# - e.g. ('', '', '127.0.0.1', '6800', '') or ('username', 'password', 'localhost', '6801', 'group')
SCRAPYD_SERVERS = [
#'127.0.0.1:6800',
# 'username:password@localhost:6801#group',
('username', 'password', 'localhost', '6800', ''),
]
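To make the relationship between the two formats concrete, the sketch below converts a server string into the equivalent 5-element tuple. It is an illustration of the format described in the comments, not ScrapydWeb's own parser. The pattern and the function name are assumptions made for this example, and the default port of 6800 follows the comment above.

```python
import re

# Illustrative pattern for 'username:password@ip:port#group'.
# Auth, port, and group are all optional.
SERVER_PATTERN = re.compile(
    r"^(?:(?P<username>[^:@#]+):(?P<password>[^:@#]+)@)?"
    r"(?P<ip>[^:@#]+)"
    r"(?::(?P<port>\d+))?"
    r"(?:#(?P<group>.+))?$"
)


def to_server_tuple(entry):
    """Convert a server string into (username, password, ip, port, group)."""
    match = SERVER_PATTERN.match(entry.strip())
    if match is None:
        # Credentials containing ':', '@' or '#' cannot be split reliably.
        raise ValueError(f"cannot parse {entry!r}; pass a 5-element tuple instead")
    parts = match.groupdict(default="")
    return (
        parts["username"],
        parts["password"],
        parts["ip"],
        parts["port"] or "6800",  # default port when none is given
        parts["group"],
    )


if __name__ == "__main__":
    for entry in ("127.0.0.1:6800", "username:password@localhost:6801#group"):
        print(entry, "->", to_server_tuple(entry))
```

A password such as 'p@ss' makes this sketch raise ValueError, which mirrors the advice in the comments to use the tuple form whenever the credentials contain ':', '@', or '#'.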