Scrapy spider deployment on CentOS 7 (with scrapy_splash)


Author: _好孩子 | Published: 2019-03-22 14:00

    1. Set up the Python environment. For details, see "Python 3 Installation (CentOS)".

    2. Install Docker:

    yum install -y docker

    3. Configure domestic (China) registry mirrors:

    Edit daemon.json in the Docker configuration directory (default: /etc/docker/):

    vim /etc/docker/daemon.json

    Write the following content:

    {
        "registry-mirrors": [
            "https://kfwkfulq.mirror.aliyuncs.com",
            "https://2lqq34jg.mirror.aliyuncs.com",
            "https://pee6w651.mirror.aliyuncs.com",
            "https://registry.docker-cn.com",
            "http://hub-mirror.c.163.com"
        ],
        "dns": ["8.8.8.8", "8.8.4.4"]
    }

    4. Start Docker:

    systemctl start docker

    5. Pull the Splash image:

    docker pull scrapinghub/splash

    6. Run Splash:

    docker run -d -p 8050:8050 scrapinghub/splash

    (On an Alibaba Cloud server, remember to open the port in the security group.)
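To confirm Splash is up, you can hit its render endpoint with curl. This is a usage sketch; replace localhost with your server's address, and the target URL (example.com here) is just a placeholder:

```shell
# Render a page through Splash; a healthy service returns the page HTML.
curl "http://localhost:8050/render.html?url=http://example.com&timeout=10"
```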

    7. Tune MySQL to avoid "Too many connections" errors (skip if already done; for details, see "How to fix the MySQL 'Too many connections' error"):

    vim /etc/my.cnf

    Add under the [mysqld] section:

    max_connections=1000

    wait_timeout=100

    interactive_timeout=100

    max_allowed_packet=15M
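After restarting MySQL, you can verify the new limits took effect. A usage sketch (adjust the user and authentication to your setup):

```shell
# Print the effective connection limit; should show 1000 after the change.
mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';"
```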

    8. Install (or upgrade) scrapyd:

    pip install scrapyd --upgrade

    9. Create symlinks so the commands are on PATH:

    ln -s /usr/local/python3/bin/scrapy /usr/bin/scrapy

    ln -s /usr/local/python3/bin/scrapyd /usr/bin/scrapyd

    ln -s /usr/local/python3/bin/twist /usr/bin/twist

    ln -s /usr/local/python3/bin/twistd /usr/bin/twistd

    10. Edit the scrapyd configuration file to allow remote connections:

    vim /usr/local/python3/lib/python3.6/site-packages/scrapyd/default_scrapyd.conf

    Change the bind address to:

    bind_address = 0.0.0.0

    11. Start scrapyd (port 6800; remember to open it in the Alibaba Cloud security group):

    nohup scrapyd &
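To check that scrapyd is running, query its status endpoint. A usage sketch (replace localhost with your server's address when testing remotely):

```shell
# A running scrapyd answers with JSON, e.g. {"status": "ok", ...}.
curl http://localhost:6800/daemonstatus.json
```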

    12. Create a directory to store the scrapy spider logs (this path is configured in the spider's settings.py):

    mkdir -p /var/log/spider/log


    (The following steps are done on your local machine.)

    13. Install scrapyd-client:

    pip install scrapyd-client

    14. Edit scrapy.cfg and set the remote scrapyd address.
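A minimal scrapy.cfg sketch; the project name myproject and the server address are placeholders, not taken from the original post:

```ini
[settings]
default = myproject.settings

[deploy]
url = http://your-server:6800/
project = myproject
```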

    15. In the spider's settings.py, configure sys_evn and LOG_FILE_DIR (the LOG_FILE variable in settings.py must be hard-coded to the path created in step 12).
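A hypothetical settings.py fragment for the log configuration; the per-day filename pattern and the spider name myspider are assumptions for illustration, not from the original post:

```python
# Sketch of the log settings; names other than LOG_FILE are assumptions.
import os
from datetime import datetime

LOG_FILE_DIR = "/var/log/spider/log"  # the directory created in step 12

# LOG_FILE must be an absolute, hard-coded path so that scrapyd writes
# logs into the directory prepared on the server.
LOG_FILE = os.path.join(
    LOG_FILE_DIR,
    "myspider-{}.log".format(datetime.now().strftime("%Y-%m-%d")),
)
```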

    16. In the same directory as scrapy.cfg, run the scrapyd-deploy command:

    scrapyd-deploy

    If the command is not found (common on Windows), create a scrapyd-deploy.bat file in the environment's Scripts\ directory with the following content:

      @echo off

    "D:\anaconda3-5.0.1\envs\py36\python.exe" "D:\anaconda3-5.0.1\envs\py36\Scripts\scrapyd-deploy" %1 %2 %3 %4 %5 %6 %7 %8 %9

    (Adjust the paths to your actual installation.)

    17. Use curl to call the scrapyd API to start and stop spiders.
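A usage sketch of the two calls; myproject, myspider, and JOBID are placeholders that must match your deployment:

```shell
SCRAPYD=http://your-server:6800

# Start a spider; the JSON response contains a jobid.
curl "$SCRAPYD/schedule.json" -d project=myproject -d spider=myspider

# Stop a running job, using the jobid from the schedule response.
curl "$SCRAPYD/cancel.json" -d project=myproject -d job=JOBID
```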


    Original link: https://www.haomeiwen.com/subject/yzlppqtx.html