Scrapy-9. Common Issues

Author: 王南北丶 | Published 2018-10-29 15:16

Original article: https://www.jianshu.com/p/779c793cabee

CrawlerProcess

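Sometimes we want to start a Scrapy crawl from our own code, or to run several spiders at the same time. For both cases Scrapy provides CrawlerProcess.

With CrawlerProcess you no longer start spiders with the scrapy crawl command; you simply run the Python script that creates the process. CrawlerProcess starts a Twisted reactor for you, configures logging, and installs shutdown handlers.

Here is an example that runs a single spider: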

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition goes here
    ...

# Create a CrawlerProcess; settings can be passed in as a dict or a Settings object
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

# Schedule the spider class on the process
process.crawl(MySpider)
# Start the process and begin crawling;
# this call blocks until the spider has finished
process.start()

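Scrapy also offers a convenient way to run a spider that is defined in another module: load the project settings and refer to the spider by its name, and Scrapy's spider loader finds the spider class in the project's SPIDER_MODULES. This only works inside a Scrapy project, because get_project_settings() locates the settings through the project's scrapy.cfg (or the SCRAPY_SETTINGS_MODULE environment variable).

This makes it easy to keep the spider definitions and the code that runs them in separate modules.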

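from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Pass the project settings returned by get_project_settings() when creating the CrawlerProcess
process = CrawlerProcess(get_project_settings())

# crawl() can then take the spider's name directly; 'followall' is the name of a spider in the project,
# and the extra keyword argument (domain=...) is passed on to that spider
process.crawl('followall', domain='scrapinghub.com')
process.start()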

Here is an example that runs several spiders at the same time:

import scrapy
from scrapy.crawler import CrawlerProcess

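class MySpider1(scrapy.Spider):
    # Your first spider
    ...

class MySpider2(scrapy.Spider):
    # Your second spider
    ...

process = CrawlerProcess()
# Both spiders are scheduled on the same process and run concurrently
process.crawl(MySpider1)
process.crawl(MySpider2)
# Blocks until every scheduled spider has finished
process.start()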
