XPath Basics

Author: CNSTT | Published 2018-12-17 22:23

1. Getting started

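When writing a crawler, you need to extract specific elements from a page. This article covers two ways to run XPath queries. The first is the browser console: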

$x('/html/head/title')

To get the XPath of an element, right-click it in the Elements panel of the developer tools and choose Copy > Copy XPath.

The second is the Scrapy shell. Start it with:

scrapy shell https://www.gumtree.com/

XPath query template

response.xpath('***').extract()

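.extract() takes no XPath argument; it returns the matched nodes as a list of Unicode strings
.re('[.0-9]+') applies a regular expression to the matches and returns the matching strings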

Get the title element

response.xpath('/html/head/title').extract()

Get the text content of the title

response.xpath('/html/head/title/text()').extract()

Get the URLs (href attributes) of a tags

response.xpath('/html//div/p/a/@href').extract()

2. Practicing on `https://www.gumtree.com/`

Get the image path under the third div whose class is grid-list-item (the index counts matching divs that share the same parent)

response.xpath('/html//div[@class="grid-list-item"][3]//img/@src').extract()

XPath positional indexes start at 1, unlike the 0-based indexing used in most programming languages.
