Simple Crawler with Scrapy

Author: Zihowe | Published: 2017-07-30 05:04

Installation (Python 3.4)
We need the Scrapy library (v1.3.3) along with PyMongo (v3.4.0) for storing the data in MongoDB; both were the latest versions when this post was written. You also need to install MongoDB itself (not covered here).

$ pip install Scrapy==1.3.3
$ pip freeze > requirements.txt

$ pip install pymongo==3.4.0
$ pip freeze > requirements.txt

Start the Project

$ scrapy startproject stack

Specify Data
Those familiar with Django will notice that Scrapy Items are declared much like Django Models, except that Scrapy Items are simpler: every field is a plain Field(), with no distinction between field types.

In the items.py file:

# stack/items.py
from scrapy.item import Item, Field

class StackItem(Item):
    title = Field()
    url = Field()

Create the Spider
Create a file called stack_spider.py in the “spiders” directory.
Use Chrome's Inspect tool (right-click the element -> Inspect, then Copy -> Copy XPath) to get the XPath of the element to be scraped.

# stack/spiders/stack_spider.py
from scrapy import Spider

from stack.items import StackItem


class StackSpider(Spider):
    # The name is what "scrapy crawl <name>" refers to.
    name = "stack"
    # Requests to other domains are filtered out.
    allowed_domains = ["stackoverflow.com"]
    start_urls = [
        "http://stackoverflow.com/questions?pagesize=50&sort=newest",
    ]

    def parse(self, response):
        # Select the <h3> inside every <div class="summary">.
        questions = response.xpath('//div[@class="summary"]/h3')

        for question in questions:
            item = StackItem()
            # extract() returns a list of matches; [0] takes the first one.
            item['title'] = question.xpath(
                'a[@class="question-hyperlink"]/text()').extract()[0]
            item['url'] = question.xpath(
                'a[@class="question-hyperlink"]/@href').extract()[0]
            yield item

Test

$ scrapy crawl stack -o items.json -t json
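
One way to sanity-check the export is to load items.json and confirm that every record carries the two fields declared in StackItem. The following is a minimal sketch using only the standard library. The load_items helper and the sample record are hypothetical, and the sample is written to a temporary file so the snippet runs without a real crawl.

```python
import json
import os
import tempfile


def load_items(path):
    """Load a JSON export and check each record has 'title' and 'url'."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    for item in items:
        missing = {"title", "url"} - set(item)
        if missing:
            raise ValueError("item is missing fields: %s" % sorted(missing))
    return items


# Hypothetical sample standing in for the file produced by the crawl.
sample = [{"title": "Example question", "url": "/questions/1/example"}]
sample_path = os.path.join(tempfile.mkdtemp(), "items.json")
with open(sample_path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

loaded = load_items(sample_path)
print(len(loaded), "item(s) loaded; first title:", loaded[0]["title"])
```

To check a real run, call load_items("items.json") from the project directory after the crawl finishes.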
