CentOS scrapy-splash Quick Tutorial


Author: AlastairYuan | Published 2018-11-17 15:44

    I. Environment Setup

    1. Install scrapy-splash

    pip install scrapy-splash

    2. Install Docker

    On CentOS, install Docker with yum (the command below, `apt install docker.io`, is the Debian/Ubuntu equivalent):

    yum install docker

    3. Run Splash in Docker

    Download the scrapy-splash code: https://github.com/scrapy-plugins/scrapy-splash.git

    cd scrapy-splash

    Run

    docker run -p 8050:8050 scrapinghub/splash

    or, to specify a longer maximum timeout:

    docker run -it -p 8050:8050 scrapinghub/splash --max-timeout 300

    4. In settings.py, configure SPLASH_URL = 'http://172.17.0.1:8050/'

    5. Start the spider: scrapy crawl getdata
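Before starting the spider it is worth confirming that the Splash container is reachable. A minimal sketch of how the render.html endpoint URL is built (the SPLASH_URL value comes from the settings above; the helper function name is my own, not part of scrapy-splash):

```python
from urllib.parse import urlencode

SPLASH_URL = "http://172.17.0.1:8050"  # docker0 bridge address from the setup above

def render_html_url(target_url, wait=0.5):
    """Build a URL for Splash's render.html endpoint, e.g. for a curl smoke test."""
    qs = urlencode({"url": target_url, "wait": wait})
    return f"{SPLASH_URL}/render.html?{qs}"

print(render_html_url("http://example.com"))
```

Opening the printed URL in a browser (or fetching it with curl) should return the rendered HTML of the target page if Splash is running.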


    References: API docs and tutorials

    https://splash-cn-doc.readthedocs.io/zh_CN/latest/scrapy-splash-toturial.html

    https://splash-cn-doc.readthedocs.io/zh_CN/latest/api.html#render-html

    https://github.com/scrapy-plugins/scrapy-splash

    https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.common.touch_actions


    II. Creating a scrapy-splash Project

    1. Configure settings.py

    Set SPLASH_URL to the docker0 bridge address: run `ifconfig docker0` and use its inet addr (e.g. 172.17.0.1).

    DOWNLOADER_MIDDLEWARES = {

        # Engine side

        'scrapy_splash.SplashCookiesMiddleware': 723,

        'scrapy_splash.SplashMiddleware': 725,

        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,

        # Downloader side

    }

    SPIDER_MIDDLEWARES = {

        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,

    }

    SPLASH_URL = 'http://172.17.0.1:8050/'

    # SPLASH_URL = 'http://192.168.59.103:8050/'

    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

    2. Use yield SplashRequest() instead of yield scrapy.Request


    Original article: https://www.haomeiwen.com/subject/flfifqtx.html