scrapyd + scrapydweb

Author: claylee | Published: 2022-02-18 07:03

    Installation

    pip install scrapyd
    pip install scrapydweb
    

    Configuring scrapyd

    Edit the configuration file

    Taking a virtualenv install as an example, open default_scrapyd.conf in the scrapyd package directory and edit it:

    > vim ~/.virtualenvs/scrapyd/lib/python3.6/site-packages/scrapyd/default_scrapyd.conf
    
    [scrapyd]
    eggs_dir    = /home/scrapyd/eggs
    logs_dir    = /home/scrapyd/logs
    items_dir   = /home/scrapyd/items
    jobs_to_keep = 5
    dbs_dir     = /home/scrapyd/dbs
    max_proc    = 0
    max_proc_per_cpu = 4
    finished_to_keep = 100
    poll_interval = 5.0
    bind_address = 0.0.0.0
    http_port   = 6800
    ...
    
    • For remote access, set bind_address = 0.0.0.0.
    • Point eggs_dir, logs_dir, items_dir, and dbs_dir at your working directory.
    • Set username and password to non-empty values to enable HTTP basic auth.
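
    The three changes above can also be generated rather than typed by hand. The following is a minimal sketch, not part of scrapyd itself: the helper name build_scrapyd_conf and the example credentials are hypothetical, and it only covers the keys discussed above.

```python
import configparser
import io


def build_scrapyd_conf(work_dir, username, password):
    """Return the text of a [scrapyd] section with the settings listed above.

    Hypothetical helper for illustration; scrapyd does not ship this function.
    """
    # interpolation=None so a '%' in the password is stored literally.
    config = configparser.ConfigParser(interpolation=None)
    config["scrapyd"] = {
        # Keep all data directories under one working directory.
        "eggs_dir": f"{work_dir}/eggs",
        "logs_dir": f"{work_dir}/logs",
        "items_dir": f"{work_dir}/items",
        "dbs_dir": f"{work_dir}/dbs",
        # Listen on all interfaces so remote hosts can connect.
        "bind_address": "0.0.0.0",
        "http_port": "6800",
        # Both values must be non-empty to enable basic auth.
        "username": username,
        "password": password,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Example values only; replace them with your own.
    print(build_scrapyd_conf("/home/scrapyd", "scrapyd-user", "change-me"))
```

    The printed text can be pasted into whichever scrapyd.conf you decide to use.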

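    I installed scrapyd inside a virtualenv, and after starting it, scrapyd kept failing to pick up my configuration. It worked once I reinstalled scrapyd and deleted configuration files that might have been lingering in other paths.

    On startup, scrapyd looks for scrapyd.conf in the following locations: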

    • /etc/scrapyd/scrapyd.conf
    • /etc/scrapyd/conf.d/*
    • scrapyd.conf
    • ~/.scrapyd.conf
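
    When the configuration does not seem to take effect, it helps to see which of these files actually exist on the machine. The sketch below is a rough diagnostic, not scrapyd's own loader: it assumes that files are read in the order listed and that a later file overrides an earlier one (which is how configparser behaves and, as far as I understand, how scrapyd treats them), and it ignores the packaged default_scrapyd.conf. The function names are hypothetical.

```python
import configparser
import glob
import os

# The locations listed above, in the same order.
SEARCH_PATHS = [
    "/etc/scrapyd/scrapyd.conf",
    "/etc/scrapyd/conf.d/*",
    "scrapyd.conf",
    "~/.scrapyd.conf",
]


def find_config_files(patterns=SEARCH_PATHS):
    """Return the existing files matching the patterns, keeping pattern order."""
    found = []
    for pattern in patterns:
        # expanduser resolves '~'; glob drops paths that do not exist.
        found.extend(sorted(glob.glob(os.path.expanduser(pattern))))
    return found


def effective_settings(files):
    """Merge the [scrapyd] sections of the files; later files win."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(files)
    if not parser.has_section("scrapyd"):
        return {}
    return dict(parser["scrapyd"])


if __name__ == "__main__":
    files = find_config_files()
    print("config files found:", files)
    for key, value in effective_settings(files).items():
        print(f"{key} = {value}")
```

    Run it from the directory where scrapyd is started, since the bare scrapyd.conf entry is resolved relative to the current working directory.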

    Running scrapyd

    > nohup scrapyd > spider.log 2>&1 &
    <Enter>
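
    To confirm that the daemon is up and that basic auth works, you can query scrapyd's daemonstatus.json endpoint. This is a minimal sketch using only the standard library; the helper names, host, and credentials are placeholders, not values from the setup above.

```python
import base64
import json
import urllib.request


def build_status_request(host, port=6800, username="", password=""):
    """Build a request for scrapyd's daemonstatus.json endpoint."""
    url = f"http://{host}:{port}/daemonstatus.json"
    request = urllib.request.Request(url)
    if username and password:
        # HTTP basic auth: base64 of "username:password".
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


def daemon_status(request, timeout=5):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder host and credentials; use the ones from your scrapyd.conf.
    req = build_status_request("127.0.0.1", 6800, "username", "password")
    print(daemon_status(req))
```

    If the credentials are wrong, urlopen raises urllib.error.HTTPError with status 401, which is a quick way to tell an auth problem from a daemon that is not running.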
    

    Configuring scrapydweb

    • Set SCRAPYDWEB_BIND = '0.0.0.0' to allow access from external IPs.
    • Set ENABLE_AUTH = True to enable basic auth.
    • Set USERNAME and PASSWORD.

    ############################## QUICK SETUP start ##############################
    ############################## 快速设置 开始 ###################################
    # Setting SCRAPYDWEB_BIND to '0.0.0.0' or IP-OF-THE-CURRENT-HOST would make
    # ScrapydWeb server visible externally; Otherwise, set it to '127.0.0.1'.
    # The default is '0.0.0.0'.
    SCRAPYDWEB_BIND = '0.0.0.0'
    # Accept connections on the specified port, the default is 5000.
    SCRAPYDWEB_PORT = 5000
    
    # The default is False, set it to True to enable basic auth for the web UI.
    ENABLE_AUTH = True
    # In order to enable basic auth, both USERNAME and PASSWORD should be non-empty strings.
    USERNAME = 'username-for-scrapydweb-login'
    PASSWORD = 'xxxxxxxx'
    
    

    Connecting to scrapyd

    In the same configuration file, find the SCRAPYD_SERVERS entry and add the host, port, username, and password of each running scrapyd instance.

    # - the string format: username:password@ip:port#group
    #   - The default port would be 6800 if not provided,
    #   - Both basic auth and group are optional.
    #   - e.g. '127.0.0.1:6800' or 'username:password@localhost:6801#group'
    # - the tuple format: (username, password, ip, port, group)
    #   - When the username, password, or group is too complicated (e.g. contains ':@#'),
    #   - or if ScrapydWeb fails to parse the string format passed in,
    #   - it's recommended to pass in a tuple of 5 elements.
    #   - e.g. ('', '', '127.0.0.1', '6800', '') or ('username', 'password', 'localhost', '6801', 'group')
    SCRAPYD_SERVERS = [
        #'127.0.0.1:6800',
        # 'username:password@localhost:6801#group',
        ('username', 'password', 'localhost', '6800', ''),
    ]
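
    To make the relationship between the two formats concrete, here is a small sketch that converts the string format into the 5-element tuple. It is an illustration of the format described in the comments above, not ScrapydWeb's actual parser, and parse_server is a hypothetical name. Like the string format itself, it breaks down when the username, password, or group contains ':', '@', or '#', which is exactly the case where the tuple format is recommended.

```python
def parse_server(entry):
    """Split 'username:password@ip:port#group' into a 5-tuple of strings.

    Illustrative only; auth and group are optional, and the port
    falls back to '6800' when it is not given.
    """
    # Everything after '#' is the group (empty if absent).
    rest, _, group = entry.partition("#")
    # Everything before the last '@' is the auth part (empty if absent).
    auth, _, host_port = rest.rpartition("@")
    username, _, password = auth.partition(":")
    host, _, port = host_port.partition(":")
    return (username, password, host, port or "6800", group)


if __name__ == "__main__":
    for entry in ("127.0.0.1:6800", "username:password@localhost:6801#group"):
        print(entry, "->", parse_server(entry))
```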
    

    Article link: https://www.haomeiwen.com/subject/elmelrtx.html