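I recently had to write a crawler with Scrapy for a project and ended up using a lot of XPath and CSS selector syntax,
so I am collecting it here. I will skip the basics and only record the more complex expressions I ran into, for future reference.
XPath syntax: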
<div class="tabs-panel is-active" id="size_en">
<div class="item-connection text-center ">
<a href="javascript:void(0);" class="value size-value"
data-url="https://uae.souq.com/ae-en/leggett-platt-home-textiles-cool-shield-mattress-protector-full-xl-red-qd0455-26651987/i/"
data-enabled="false">Full XL</a></div>
<div class="item-connection text-center active">
<a href="javascript:void(0);" class="value size-value"
data-url="https://uae.souq.com/ae-en/leggett-platt-home-textiles-cool-shield-mattress-protector-king-brown-qd0457-31656950/i/"
data-enabled="true">King</a></div>
<div class="item-connection text-center ">
<a href="javascript:void(0);" class="value size-value"
data-url="https://uae.souq.com/ae-en/leggett-platt-home-textiles-cool-shield-mattress-protector-twin-white-qd0452-32710651/i/"
data-enabled="false">Twin</a></div>
<div class="item-connection text-center ">
<a href="javascript:void(0);" class="value size-value"
data-url="https://uae.souq.com/ae-en/leggett-platt-home-textiles-cool-shield-mattress-protector-twin-xl-black-qd0453-31237117/i/"
data-enabled="false">Twin XL</a></div>
</div>
1. Select nodes whose attribute does not contain a given string
# Select the div nodes whose class does not contain "active"
//*[@id='size_en']/div[not(contains(@class,"active"))]
2. Select div nodes whose id contains REVIEWS and that either have aria-hidden="false" or have no aria-hidden attribute at all
//div[contains(@id,"REVIEWS") and (@aria-hidden="false" or not(@aria-hidden))]
CSS selector syntax:
- Get the content of an attribute
# Get the content of the style attribute of the i tag
li>header>div>span>i>i::attr(style)