scrapyd + scrapydweb


Author: claylee | Published: 2022-02-18 07:03

Installation

pip install scrapyd
pip install scrapydweb
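
To confirm that both packages landed in the active environment, a minimal sketch along these lines can help. It assumes Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report on the two packages installed above.
    for name in ("scrapyd", "scrapydweb"):
        print(name, installed_version(name) or "not installed")
```

Run it with the virtualenv's interpreter, so that it checks the same site-packages directory that pip installed into.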

Configuring scrapyd

Editing the configuration file

Taking a virtualenv as an example, open the default_scrapyd.conf file in the installation directory and edit it:

> vim ~/.virtualenvs/scrapyd/lib/python3.6/site-packages/scrapyd/default_scrapyd.conf
> 
[scrapyd]
eggs_dir    = /home/scrapyd/eggs
logs_dir    = /home/scrapyd/logs
items_dir   = /home/scrapyd/items
jobs_to_keep = 5
dbs_dir     = /home/scrapyd/dbs
max_proc    = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port   = 6800
...
  • For remote access, set bind_address = 0.0.0.0.
  • Point eggs_dir / logs_dir / items_dir / dbs_dir at your working directory.
  • Set username and password to non-empty values to enable basic auth.
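
The three points above can be checked with a small sketch based on the standard-library `configparser`. The sample text is a shortened stand-in for the real file, and the function name `check_scrapyd_conf` and its result keys are invented for this example. It also assumes that 127.0.0.1 is the bind address when none is configured.

```python
import configparser

# Shortened sample; the values are placeholders, not a recommended setup.
SAMPLE = """
[scrapyd]
eggs_dir     = /home/scrapyd/eggs
logs_dir     = /home/scrapyd/logs
bind_address = 0.0.0.0
http_port    = 6800
username     =
password     =
"""


def check_scrapyd_conf(text):
    """Summarize the settings that matter for remote access and basic auth."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["scrapyd"]
    username = section.get("username", "")
    password = section.get("password", "")
    return {
        # Simplification: only the wildcard address counts as remote access here.
        "remote_access": section.get("bind_address", "127.0.0.1") == "0.0.0.0",
        # Basic auth needs both values to be non-empty.
        "basic_auth": bool(username) and bool(password),
        "http_port": section.getint("http_port", fallback=6800),
    }


if __name__ == "__main__":
    print(check_scrapyd_conf(SAMPLE))
```

To inspect a real file, read its contents and pass the text to `check_scrapyd_conf`.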

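I installed scrapyd inside a virtualenv, and after startup it never read the configuration file correctly. It worked after I reinstalled scrapyd and deleted any configuration files that might exist in other locations.

The paths where scrapyd looks for scrapyd.conf at startup include: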

  • /etc/scrapyd/scrapyd.conf
  • /etc/scrapyd/conf.d/*
  • scrapyd.conf
  • ~/.scrapyd.conf
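
When the configuration does not seem to take effect, it can help to see which of these candidate files actually exist. The following is a rough sketch and not scrapyd's own lookup code; the function name `existing_config_files` is made up for this example. The relative entry `scrapyd.conf` is resolved against the current working directory.

```python
import glob
import os

# The search locations listed above, in the same order.
CANDIDATES = [
    "/etc/scrapyd/scrapyd.conf",
    "/etc/scrapyd/conf.d/*",
    "scrapyd.conf",
    "~/.scrapyd.conf",
]


def existing_config_files(patterns=CANDIDATES):
    """Return every existing file matched by the given paths or glob patterns."""
    found = []
    for pattern in patterns:
        expanded = os.path.expanduser(pattern)  # resolve a leading "~"
        for path in sorted(glob.glob(expanded)):
            if os.path.isfile(path):
                found.append(path)
    return found


if __name__ == "__main__":
    for path in existing_config_files():
        print(path)
```

Run it from the directory where scrapyd is started. Any unexpected file in the output is a candidate for the stale configuration described above.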

Running scrapyd

> nohup scrapyd>spider.log &
<Enter>
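
Once the process is in the background, one way to confirm that it is answering is to query scrapyd's `daemonstatus.json` endpoint. This is a minimal sketch using only the standard library. The helper names, the default host and port, and the credentials in the usage note are assumptions for illustration.

```python
import base64
import json
import urllib.request


def build_status_request(host="127.0.0.1", port=6800, username="", password=""):
    """Build a GET request for daemonstatus.json, with basic auth if credentials are given."""
    request = urllib.request.Request(f"http://{host}:{port}/daemonstatus.json")
    if username and password:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


def daemon_status(request, timeout=5):
    """Send the request and return the decoded JSON body as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `daemon_status(build_status_request("localhost", 6800, "username", "password"))` should return a dict when scrapyd is reachable, and raise `urllib.error.URLError` when it is not.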

Configuring scrapydweb

  • Set SCRAPYDWEB_BIND = '0.0.0.0' to allow access from external IPs.
  • Set ENABLE_AUTH = True to enable basic auth.
  • Set USERNAME and PASSWORD.
############################## QUICK SETUP start ##############################
# Setting SCRAPYDWEB_BIND to '0.0.0.0' or IP-OF-THE-CURRENT-HOST would make
# ScrapydWeb server visible externally; Otherwise, set it to '127.0.0.1'.
# The default is '0.0.0.0'.
SCRAPYDWEB_BIND = '0.0.0.0'
# Accept connections on the specified port, the default is 5000.
SCRAPYDWEB_PORT = 5000

# The default is False, set it to True to enable basic auth for the web UI.
ENABLE_AUTH = True
# In order to enable basic auth, both USERNAME and PASSWORD should be non-empty strings.
USERNAME = 'username-for-scrapydweb-login'
PASSWORD = 'xxxxxxxx'

Binding scrapyd

In the same configuration file, find the SCRAPYD_SERVERS entry and add the host, port, username, and password of the running scrapyd instance.

# - the string format: username:password@ip:port#group
#   - The default port would be 6800 if not provided,
#   - Both basic auth and group are optional.
#   - e.g. '127.0.0.1:6800' or 'username:password@localhost:6801#group'
# - the tuple format: (username, password, ip, port, group)
#   - When the username, password, or group is too complicated (e.g. contains ':@#'),
#   - or if ScrapydWeb fails to parse the string format passed in,
#   - it's recommended to pass in a tuple of 5 elements.
#   - e.g. ('', '', '127.0.0.1', '6800', '') or ('username', 'password', 'localhost', '6801', 'group')
SCRAPYD_SERVERS = [
    #'127.0.0.1:6800',
    # 'username:password@localhost:6801#group',
    ('username', 'password', 'localhost', '6800', ''),
]
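
To make the relationship between the two formats concrete, here is a sketch that turns the string form into the 5-element tuple form described in the comments. It is an illustration only and not ScrapydWeb's own parser. The names `SERVER_PATTERN` and `parse_server` are made up, and the regular expression rejects the complicated cases (for example ':@#' in the username or group) for which the comments recommend the tuple form.

```python
import re

# username:password@ip:port#group, where auth, port and group are optional.
SERVER_PATTERN = re.compile(
    r"^(?:(?P<username>[^:@#]+):(?P<password>[^@#]+)@)?"
    r"(?P<ip>[^:@#]+)"
    r"(?::(?P<port>\d+))?"
    r"(?:#(?P<group>.+))?$"
)


def parse_server(entry):
    """Normalize one SCRAPYD_SERVERS entry to (username, password, ip, port, group)."""
    if isinstance(entry, tuple):
        if len(entry) != 5:
            raise ValueError("tuple entries need exactly 5 elements")
        return tuple(str(item) for item in entry)
    match = SERVER_PATTERN.match(entry.strip())
    if match is None:
        raise ValueError(f"cannot parse server entry: {entry!r}")
    parts = match.groupdict()
    return (
        parts["username"] or "",
        parts["password"] or "",
        parts["ip"],
        parts["port"] or "6800",  # default port when none is given
        parts["group"] or "",
    )


if __name__ == "__main__":
    print(parse_server("username:password@localhost:6801#group"))
```

Tuple entries pass through unchanged apart from being converted to strings, which mirrors the advice to use the tuple form whenever the string form is ambiguous.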

