Looping task pulls with start_requests in Scrapy

Author: 佑岷 | Published 2019-07-10 17:59

The requirement: a Scrapy spider that loops forever, pulling tasks from Redis or an HTTP API, and that must never close. A first version put the loop directly in start_requests:

def start_requests(self):
    # ... task-fetching setup elided in the original ...
    while True:
        # dont_filter=True so repeated task URLs are not deduplicated
        yield scrapy.Request(url, dont_filter=True)
    # ... (unreachable: the loop above never exits)

But with this pattern tasks are fetched continuously, faster than they are passed on to the next step for processing. A second version uses Scrapy's signals instead: the spider_idle signal fires whenever the engine runs out of work, and the handler schedules the next task, raising DontCloseSpider so the spider is never closed:

import time

import scrapy
from scrapy import signals
from scrapy.exceptions import DontCloseSpider


class AutoengSpider(scrapy.Spider):
    name = 'autoeng'  # assumed; the original snippet omits the class header

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(AutoengSpider, cls).from_crawler(crawler, *args, **kwargs)
        # Call spider_idle() every time the engine reports it has no work left.
        crawler.signals.connect(spider.spider_idle, signal=signals.spider_idle)
        return spider

    def start_requests(self):
        # Seed the engine with the first task only; the rest are fed in
        # one at a time from spider_idle().
        request = self.next_req()
        if request:
            yield request

    def spider_idle(self, spider):
        # The scheduler is empty: pull the next task and push it back
        # into the engine so the crawl keeps going.
        request = self.next_req()
        if request:
            self.crawler.engine.schedule(request, self)
        else:
            time.sleep(2)  # no task available yet, back off briefly
        # Without this exception Scrapy would close the idle spider.
        raise DontCloseSpider()
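
The post never shows next_req(). Below is a minimal sketch, assuming tasks are plain URLs pushed onto a Redis list; the redis-py client, the tasks:pending key, and the connection settings are all illustrative assumptions, not part of the original code:

import redis

class AutoengSpider(scrapy.Spider):
    # ... from_crawler / start_requests / spider_idle as above ...

    def __init__(self, *args, **kwargs):
        super(AutoengSpider, self).__init__(*args, **kwargs)
        # Hypothetical task source: a Redis list fed elsewhere with LPUSH.
        self.redis = redis.Redis(host='localhost', port=6379, db=0)

    def next_req(self):
        # Non-blocking pop; returns None when the list is empty, which is
        # what makes spider_idle() sleep for 2 s and retry on the next
        # idle signal.
        raw = self.redis.rpop('tasks:pending')
        if raw is None:
            return None
        return scrapy.Request(raw.decode('utf-8'), dont_filter=True)

Note that dont_filter=True matters here as well: the same URL may legitimately be queued more than once, and Scrapy's duplicate filter would otherwise drop the repeat requests.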
