3 Scrapy Crawling (2)

Author: 法号无涯 | Published: 2017-11-10 18:02

    With what we have covered so far, we can write a simple spider and then improve it step by step.

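    # -*- coding: utf-8 -*-
    import scrapy
    
    
    class QuotesSpider(scrapy.Spider):
        name = 'quotes'
        allowed_domains = ['quotes.toscrape.com']
        start_urls = ['http://quotes.toscrape.com/']
    
        def parse(self, response):
            # Select every element that wraps a single quote.
            quotes = response.xpath('//*[@class="quote"]')
            for quote in quotes:
                # The leading "." makes each XPath relative to the current quote.
                text = quote.xpath('.//*[@class="text"]/text()').extract_first()
                # A quote has a single author, so take the first match only.
                author = quote.xpath('.//*[@itemprop="author"]/text()').extract_first()
                # extract() returns a list of every match.
                tags = quote.xpath('.//*[@itemprop="keywords"]/@content').extract()
    
                # print() with one argument works on both Python 2 and Python 3.
                print('\n')
                print(text)
                print(author)
                print(tags)
                print('\n')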
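
To see how the relative XPath lookups in `parse` behave without running a crawl, the sketch below applies the same predicates to a small hand-written fragment. It is only an approximation: it uses the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's selectors, the `SAMPLE` markup and the `parse_quotes` helper are hypothetical, and splitting the keywords on commas is an extra step the spider above does not perform.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on a quote block; not copied from the real site.
SAMPLE = """
<div>
  <div class="quote">
    <span class="text">Sample quote one.</span>
    <small itemprop="author">Author A</small>
    <meta itemprop="keywords" content="tag1,tag2" />
  </div>
  <div class="quote">
    <span class="text">Sample quote two.</span>
    <small itemprop="author">Author B</small>
    <meta itemprop="keywords" content="tag3" />
  </div>
</div>
"""


def parse_quotes(markup):
    """Return one dict per quote block found in the markup."""
    root = ET.fromstring(markup)
    results = []
    # Same predicate as the spider: any element whose class is exactly "quote".
    for quote in root.findall('.//*[@class="quote"]'):
        # The leading "." keeps each lookup inside the current quote block.
        text = quote.find('.//*[@class="text"]').text
        author = quote.find('.//*[@itemprop="author"]').text
        # ElementTree has no /@content step, so read the attribute directly.
        keywords = quote.find('.//*[@itemprop="keywords"]').get('content')
        results.append({
            'text': text,
            'author': author,
            # Assumption for illustration: keywords are comma-separated.
            'tags': keywords.split(','),
        })
    return results


for item in parse_quotes(SAMPLE):
    print(item)
```

Because every inner lookup starts from `quote` rather than from the document root, each dict only contains values from its own block. In the spider itself, Scrapy's `extract_first()` and `extract()` play the role that `.text` and `.get('content')` play here.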
    

    Run the following command from the root directory of the Scrapy project:
    scrapy crawl quotes
