Selectors: Common Syntax

Author: 我一不小心就 | Published 2018-04-17 10:24

    # Run the following command to get a shell variable named response
    scrapy shell http://doc.scrapy.org/en/latest/_static/selectors-sample1.html
    
    # Get the base URL and some image links, using both XPath and CSS
    >>> response.xpath('//base/@href').extract()
    [u'http://example.com/']
    
    >>> response.css('base::attr(href)').extract()
    [u'http://example.com/']
    
    >>> response.xpath('//a[contains(@href, "image")]/@href').extract()
    [u'image1.html',
     u'image2.html',
     u'image3.html',
     u'image4.html',
     u'image5.html']
    
    >>> response.css('a[href*=image]::attr(href)').extract()
    [u'image1.html',
     u'image2.html',
     u'image3.html',
     u'image4.html',
     u'image5.html']
    
    >>> response.xpath('//a[contains(@href, "image")]/img/@src').extract()
    [u'image1_thumb.jpg',
     u'image2_thumb.jpg',
     u'image3_thumb.jpg',
     u'image4_thumb.jpg',
     u'image5_thumb.jpg']
    
    >>> response.css('a[href*=image] img::attr(src)').extract()
    [u'image1_thumb.jpg',
     u'image2_thumb.jpg',
     u'image3_thumb.jpg',
     u'image4_thumb.jpg',
     u'image5_thumb.jpg']
    
    # Nesting selectors
    # First select all links whose href contains "image"
    >>> links = response.xpath('//a[contains(@href, "image")]')
    >>> links.extract()
    [u'<a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>',
     u'<a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>',
     u'<a href="image3.html">Name: My image 3 <br><img src="image3_thumb.jpg"></a>',
     u'<a href="image4.html">Name: My image 4 <br><img src="image4_thumb.jpg"></a>',
     u'<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']
    
    >>> for index, link in enumerate(links):
    ...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
    ...     print('Link number %d points to url %s and image %s' % args)
    ...
    
    Link number 0 points to url [u'image1.html'] and image [u'image1_thumb.jpg']
    Link number 1 points to url [u'image2.html'] and image [u'image2_thumb.jpg']
    Link number 2 points to url [u'image3.html'] and image [u'image3_thumb.jpg']
    Link number 3 points to url [u'image4.html'] and image [u'image4_thumb.jpg']
    Link number 4 points to url [u'image5.html'] and image [u'image5_thumb.jpg']
    
    
    

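The `<base href>` value extracted above is the URL that relative links such as `image1.html` resolve against. A minimal sketch using only the standard library's `urllib.parse.urljoin`, with the values copied from the session output, turns them into absolute URLs:

```python
from urllib.parse import urljoin

# Values copied from the shell output above.
base_url = "http://example.com/"
hrefs = ["image1.html", "image2.html", "image3.html", "image4.html", "image5.html"]

# Resolve each relative href against the document's <base> URL.
absolute = [urljoin(base_url, href) for href in hrefs]
for url in absolute:
    print(url)
```

Inside a spider, `response.urljoin()` provides the same kind of resolution; check the Scrapy documentation for how your version handles the `<base>` tag.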
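The examples above cover only part of the selector API. For the full details, see the Chinese translation of the Scrapy 0.24 selectors documentation: http://scrapy-chs.readthedocs.io/zh_CN/0.24/topics/selectors.html#topics-selectors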

Article link: https://www.haomeiwen.com/subject/ywkukftx.html