Architecture overview

Author: hobit | Published: 2016-12-12 09:40

<h3>Architecture overview</h3>
This document describes the architecture of Scrapy and how its components interact.


<h3>Data flow</h3>

The data flow in Scrapy is controlled by the execution engine, and goes like this:

  1. The Engine gets the initial requests to crawl from the Spider.
  2. The Engine schedules the requests in the Scheduler and asks for the next requests to crawl.
  3. The Scheduler returns the next requests to the Engine.
  4. The Engine sends the requests to the Downloader, passing through the Downloader Middlewares (see process_request()).
  5. Once the page finishes downloading, the Downloader generates a response (with that page) and sends it to the Engine, passing through the Downloader Middlewares (see process_response()).
  6. The Engine receives the response from the Downloader and sends it to the Spider for processing, passing through the Spider Middleware (see process_spider_input()).
  7. The Spider processes the response and returns the scraped items and new requests to the Engine, passing through the Spider Middleware (see process_spider_output()).
  8. The Engine sends the processed items to the Item Pipelines, then sends the processed requests to the Scheduler and asks for the next requests to crawl, if any.
  9. The process repeats (from step 1) until there are no more requests from the Scheduler.
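The loop described in these steps can be sketched in plain Python. This is a toy model, not Scrapy's implementation: the real engine is asynchronous, and the middleware hooks from steps 4 to 7 are left out here. All class names, the `FAKE_WEB` dictionary, and `run_engine` are assumptions made for illustration.

```python
from collections import deque


class Request:
    def __init__(self, url):
        self.url = url


class Response:
    def __init__(self, url, body):
        self.url = url
        self.body = body


# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB = {
    "page-a": ("A", ["page-b", "page-c"]),
    "page-b": ("B", []),
    "page-c": ("C", []),
}


class Scheduler:
    def __init__(self):
        self.queue = deque()

    def enqueue(self, request):
        self.queue.append(request)

    def next_request(self):
        # Returns None when nothing is left to crawl.
        return self.queue.popleft() if self.queue else None


class Downloader:
    def fetch(self, request):
        body, _links = FAKE_WEB[request.url]
        return Response(request.url, body)


class Spider:
    start_urls = ["page-a"]

    def start_requests(self):
        return [Request(url) for url in self.start_urls]

    def parse(self, response):
        # Yield one item per page, then one request per outgoing link.
        yield {"url": response.url, "text": response.body}
        for link in FAKE_WEB[response.url][1]:
            yield Request(link)


class Pipeline:
    def __init__(self):
        self.items = []

    def process_item(self, item):
        self.items.append(item)


def run_engine(spider, scheduler, downloader, pipeline):
    for request in spider.start_requests():      # steps 1-2
        scheduler.enqueue(request)
    while True:
        request = scheduler.next_request()       # step 3
        if request is None:                      # step 9: nothing left
            break
        response = downloader.fetch(request)     # steps 4-5
        for result in spider.parse(response):    # steps 6-7
            if isinstance(result, Request):
                scheduler.enqueue(result)        # step 8: new requests
            else:
                pipeline.process_item(result)    # step 8: scraped items
    return pipeline.items


items = run_engine(Spider(), Scheduler(), Downloader(), Pipeline())
```

The point of the sketch is that the components never call each other directly: every request, response, and item passes through the engine function.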

<h3>Components</h3>
<h4>Scrapy Engine</h4>
The Engine is responsible for controlling the data flow between all components of the system, and for triggering events when certain actions occur. See the data flow above for more details.
<h4>Scheduler</h4>
The Scheduler receives requests from the Engine and enqueues them, then feeds them back to the Engine later when the Engine asks for them.
<h4>Downloader</h4>
The Downloader is responsible for fetching web pages and feeding them to the Engine, which, in turn, feeds them to the Spiders.
<h4>Spiders</h4>
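As a rough illustration of how requests and responses can pass through downloader middlewares on the way to and from the Downloader, here is a toy chain in plain Python. It is a sketch under stated assumptions: the hook names mirror process_request() and process_response(), but the real Scrapy hooks take additional arguments and support more return values. The `Downloader` class, both middleware classes, and `fake_fetch` are hypothetical.

```python
class Request:
    def __init__(self, url, headers=None):
        self.url = url
        self.headers = dict(headers or {})


class Response:
    def __init__(self, url, body):
        self.url = url
        self.body = body


class UserAgentMiddleware:
    """Adds a header to every outgoing request."""

    def process_request(self, request):
        request.headers.setdefault("User-Agent", "toy-crawler")
        return None  # None means: continue towards the downloader

    def process_response(self, request, response):
        return response


class CacheMiddleware:
    """Answers repeated requests from memory instead of downloading again."""

    def __init__(self):
        self.cache = {}

    def process_request(self, request):
        # Returning a Response here short-circuits the download.
        return self.cache.get(request.url)

    def process_response(self, request, response):
        self.cache[request.url] = response
        return response


class Downloader:
    def __init__(self, middlewares, fetch):
        self.middlewares = middlewares
        self.fetch = fetch  # callable that performs the actual download

    def download(self, request):
        response = None
        for mw in self.middlewares:               # outbound: first to last
            response = mw.process_request(request)
            if response is not None:
                break
        if response is None:
            response = self.fetch(request)
        for mw in reversed(self.middlewares):     # inbound: last to first
            response = mw.process_response(request, response)
        return response


calls = []


def fake_fetch(request):
    # Stand-in for a network call; records which URLs were really fetched.
    calls.append(request.url)
    return Response(request.url, "<html>%s</html>" % request.url)


downloader = Downloader([UserAgentMiddleware(), CacheMiddleware()], fake_fetch)
first = downloader.download(Request("http://example.com/"))
second = downloader.download(Request("http://example.com/"))
```

In this sketch the second call never reaches `fake_fetch`, because the cache middleware answers it from memory.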
Spiders are custom classes written by Scrapy users to parse responses and extract either items or additional requests to follow.
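A minimal sketch of that idea, written without Scrapy so it runs on its own: a parse callback receives a response and yields both scraped items and follow-up requests. The `Request` and `Response` tuples, the `TitleSpider` class, and the regex-based extraction are assumptions for illustration. A real spider would subclass Scrapy's spider base class and use its selectors instead.

```python
import re
from collections import namedtuple

# Hypothetical stand-ins for the framework's request and response objects.
Request = namedtuple("Request", ["url", "callback"])
Response = namedtuple("Response", ["url", "body"])


class TitleSpider:
    name = "titles"
    start_urls = ["http://example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            yield Request(url, self.parse)

    def parse(self, response):
        # Extract one item: the page title, if there is one.
        match = re.search(r"<title>(.*?)</title>", response.body, re.S)
        title = match.group(1).strip() if match else None
        yield {"url": response.url, "title": title}
        # Extract additional requests to follow: every href on the page.
        for href in re.findall(r'href="([^"]+)"', response.body):
            yield Request(href, self.parse)


page = Response(
    "http://example.com/",
    '<html><title> Home </title><a href="http://example.com/about">About</a></html>',
)
results = list(TitleSpider().parse(page))
```

Here `results` holds one dictionary item followed by one `Request` for the linked page, which is the mix of outputs the engine has to sort into pipelines and the scheduler.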
<h4>Item Pipeline</h4>
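The Item Pipeline processes the items once the Spiders have extracted them. Typical tasks include cleansing, validation, and persistence.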
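Cleansing, validation, and persistence can be pictured as three stages chained one after another. The sketch below is plain Python, not Scrapy's pipeline machinery: the `DropItem` exception is a local stand-in, `process_item` takes only the item here, and `run_pipelines` plus the in-memory `lines` list are hypothetical helpers.

```python
import json


class DropItem(Exception):
    """Local stand-in: raised to discard an item."""


class CleanPipeline:
    def process_item(self, item):
        # Cleansing: strip surrounding whitespace from string fields.
        return {k: v.strip() if isinstance(v, str) else v for k, v in item.items()}


class ValidatePipeline:
    def process_item(self, item):
        # Validation: an item without a title is rejected.
        if not item.get("title"):
            raise DropItem("missing title: %r" % (item,))
        return item


class JsonLinesPipeline:
    def __init__(self):
        self.lines = []  # kept in memory for the sketch; a real stage might write a file

    def process_item(self, item):
        # Persistence: serialise each surviving item as one JSON line.
        self.lines.append(json.dumps(item, sort_keys=True))
        return item


def run_pipelines(items, pipelines):
    kept = []
    for item in items:
        try:
            for stage in pipelines:
                item = stage.process_item(item)
        except DropItem:
            continue  # dropped items skip all later stages
        kept.append(item)
    return kept


storage = JsonLinesPipeline()
kept = run_pipelines(
    [{"title": "  Home ", "url": "u1"}, {"title": "   ", "url": "u2"}],
    [CleanPipeline(), ValidatePipeline(), storage],
)
```

Ordering matters in this design: validation runs after cleansing, so a title made only of whitespace is stripped to an empty string and then dropped before it reaches storage.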

Article link: https://www.haomeiwen.com/subject/dekmmttx.html