Learning Scrapy, Part 2: Selectors

Author: 狂奔的胖蜗牛 | Published: 2021-06-28 18:29

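1. What are Selectors

In Part 1, the objects returned by response.css() were Selectors (more precisely, a SelectorList of Selector objects). Selectors are what we use to pick out the data we want to extract from a page.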

2. Using Selectors

2.1 The response's selector instance

The response object has a selector attribute, which is a Selector instance.

response.selector.xpath('//span/text()').get()

2.2 The xpath and css methods

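In Scrapy's source code, the response's xpath() and css() methods are shortcuts: each one simply calls the method of the same name on response.selector.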

response.xpath('//span/text()').get()
response.css('span::text').get()

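2.3 Building your own Selector

You can also construct a Selector yourself, for example from a string of HTML.

from scrapy.selector import Selector
body = '<html><body><span>good</span></body></html>'
Selector(text=body).xpath('//span/text()').get()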

2.4 Extracting data

Use xpath() or css() to obtain selectors:

response.xpath('//title/text()')
response.css('title::text')

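Use get() and getall() to pull the data out of them:

# get() returns a single result: the first match when there are several, or None when there is none.
# A default can be supplied instead, e.g. get(default="").
response.xpath('//title/text()').get()
# getall() returns every match as a list.
response.css('title::text').getall()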

xpath() and css() calls can be chained together.

response.css('img').xpath('@src').getall()

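Selector also provides an attrib property, which exposes the attributes of the matched element as a dict. When several elements match, it behaves like get() and only uses the first one.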

response.css('img').attrib['src']

2.5 CSS extensions

Standard CSS selectors cannot select text nodes or attribute values, so Scrapy adds two non-standard pseudo-elements:
1. ::text, to get text

response.css('title::text').get()

2. ::attr(name), to get the value of an attribute

response.css('a::attr(href)').getall()

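2.6 Ways to get attributes

There are three options: an XPath attribute step, the CSS ::attr() pseudo-element, or the Selector's attrib property.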

response.xpath('//img/@src').get()
response.css('img::attr(src)').get()
response.css('img').attrib['src']

If you read attrib without indexing it by attribute name, you get all attribute names and values together as a dict.

response.css('img').attrib
# {'src': 'image1_thumb.jpg'}; returns {} when nothing matches

2.7 Using the re() method

The re() method applies the given regular expression to the selected content and returns whatever it matches.

response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')

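The call returns a list of strings rather than selectors, so there is no need to call get() or getall() afterwards.
Use re_first() to get only the first match.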

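2.8 extract() and extract_first()

These two older methods still work, but the official documentation now prefers get() and getall(). On a SelectorList, get() is equivalent to extract_first(), and getall() is equivalent to extract().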
