Python Scrapy crawler framework


Author: proud2008 | Published 2018-02-28 12:18

A summary of publishing Scrapy spiders with scrapyd

Prerequisite: install the two tools scrapyd and scrapyd-client with pip.

1. Run the server
 PS C:\WINDOWS\system32> scrapyd
2018-03-01T15:35:58+0800 [-] Loading c:\users\administrator\appdata\local\programs\python\python36-32\lib\site-packages\scrapyd\txapp.py...
2018-03-01T15:35:59+0800 [-] Scrapyd web console available at http://127.0.0.1:6800/
2018-03-01T15:35:59+0800 [-] Loaded.
2018-03-01T15:35:59+0800 [twisted.application.app.AppLogger#info] twistd 17.9.0 (c:\users\administrator\appdata\local\programs\python\python36-32\python.exe 3.6.4) starting up.
2018-03-01T15:35:59+0800 [twisted.application.app.AppLogger#info] reactor class: twisted.internet.selectreactor.SelectReactor.
2018-03-01T15:35:59+0800 [-] Site starting on 6800
2018-03-01T15:35:59+0800 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site object at 0x060E07F0>
2018-03-01T15:35:59+0800 [Launcher] Scrapyd 1.2.0 started: max_proc=16, runner='scrapyd.runner'
2018-03-01T15:36:11+0800 [twisted.python.log#info] "127.0.0.1" - - [01/Mar/2018:07:36:10 +0000] "GET / HTTP/1.1" 200 699 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"
2018-03-01T15:36:11+0800 [twisted.python.log#info] "127.0.0.1" - - [01/Mar/2018:07:36:10 +0000] "GET /favicon.ico HTTP/1.1" 404 153 "http://127.0.0.1:6800/" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"

If http://127.0.0.1:6800/ opens in a browser, the server is running correctly.
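The same health check can be scripted instead of opening a browser. A minimal sketch using only the standard library, assuming the default host and port from the log above (adjust the base URL if yours differ):

```python
# Sketch: query scrapyd's daemonstatus.json endpoint to confirm the server is up.
import json
from urllib.request import urlopen

SCRAPYD_URL = "http://127.0.0.1:6800"

def status_url(base_url=SCRAPYD_URL):
    """Build the daemonstatus.json endpoint URL."""
    return f"{base_url}/daemonstatus.json"

def daemon_status(base_url=SCRAPYD_URL):
    """Fetch and parse the daemon status (requires a running scrapyd)."""
    with urlopen(status_url(base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With the server running, daemon_status() returns a dict whose
# "status" key is "ok", plus pending/running/finished job counts.
```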


2. Package the project on the client and upload it to the server

The scrapyd-deploy command-line tool packages the project into an egg (via setuptools) and publishes it to the server.
Note: on Windows, scrapyd-deploy is not found on the command line by default. Create a file named scrapyd-deploy.bat in the Python\Scripts folder under the install directory.

Its content is as follows:
    @echo off
    "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\python.exe" "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\Scripts\scrapyd-deploy" %1 %2 %3 %4 %5 %6 %7 %8 %9

In the project's scrapy.cfg, uncomment the url line under [deploy] (remove the leading #), like so:

[settings]
default = scrapy1.settings

[deploy]
url = http://localhost:6800/
project = scrapy1

List the deploy targets, i.e. the url configured under [deploy]:

>scrapyd-deploy -l
    default              http://localhost:6800/

Publish the client package to the server:
scrapyd-deploy <target> -p <project> --version <version>

>scrapyd-deploy default -p scrapy1
Packing version 1519890216
Deploying to project "scrapy1" in http://localhost:6800/addversion.json
Server response (200):
{"node_name": "SC-201711261536", "status": "ok", "project": "scrapy1", "version": "1519890216", "spiders": 8}
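The JSON body returned by addversion.json can also be checked programmatically before scheduling anything. A small sketch that parses the exact response shown above (the helper name check_deploy is illustrative, not part of scrapyd):

```python
# Sketch: validate the addversion.json response from a deploy.
import json

# The response body copied verbatim from the deploy output above.
response_body = (
    '{"node_name": "SC-201711261536", "status": "ok", '
    '"project": "scrapy1", "version": "1519890216", "spiders": 8}'
)

def check_deploy(body):
    """Raise if the deploy did not succeed; return (project, version, spider count)."""
    data = json.loads(body)
    if data.get("status") != "ok":
        raise RuntimeError(f"deploy failed: {data}")
    return data["project"], data["version"], data["spiders"]

print(check_deploy(response_body))  # ('scrapy1', '1519890216', 8)
```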

3. Test

Spiders are invoked through HTTP requests to scrapyd's JSON API; for details see
http://scrapyd.readthedocs.io/en/latest/api.html

To run a crawl job once:
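A crawl is scheduled by POSTing to the schedule.json endpoint. A standard-library sketch, assuming the default server URL; "myspider" is a placeholder for one of the project's 8 deployed spider names:

```python
# Sketch: schedule one crawl run via scrapyd's schedule.json endpoint.
from urllib.parse import urlencode
from urllib.request import urlopen

SCRAPYD_URL = "http://127.0.0.1:6800"

def schedule_payload(project, spider, **settings):
    """Build the form-encoded POST body that schedule.json expects."""
    params = {"project": project, "spider": spider}
    params.update(settings)
    return urlencode(params).encode("utf-8")

def schedule(project, spider):
    """POST to schedule.json; returns the raw JSON response (requires a running scrapyd)."""
    body = schedule_payload(project, spider)
    with urlopen(f"{SCRAPYD_URL}/schedule.json", data=body) as resp:
        return resp.read().decode("utf-8")

# With the server running:
# schedule("scrapy1", "myspider")
# returns a JSON string with "status": "ok" and a "jobid" for the new run.
```

The equivalent one-liner from a shell is `curl http://localhost:6800/schedule.json -d project=scrapy1 -d spider=myspider`.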

Source: https://www.haomeiwen.com/subject/mdubxftx.html