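The XPath string() function: extracting an element's full text
text() is convenient, but it has a drawback: when an element's text is split across several nodes (for example by nested tags), the pieces come back as separate strings. string() solves this by returning the concatenated text as a single string.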
from scrapy.selector import Selector
text='<a href="#">Click here to go to the <strong>next page</strong></a>'
sel=Selector(text=text)
## The following two statements both print ['next page']:
print(sel.xpath('//a/strong/text()').extract())
print(sel.xpath('string(//a/strong)').extract())
>>>['next page']
## Match all the text inside the a tag:
print(sel.xpath('//a//text()').extract())
>>>['Click here to go to the ', 'next page']
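## The text comes back in two pieces, which is not what we want. If you insist on text(), you can join the pieces yourself:
#content=sel.xpath('//a//text()').extract()
#want=''.join(content)
#print(want)
#>>>Click here to go to the next page
## string() solves the problem directly: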
print(sel.xpath('string(//a)').extract())
>>>['Click here to go to the next page']
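To make it clearer what string() computes, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree instead of Scrapy (an assumption made so the snippet runs without extra packages). An element's string-value is the concatenation of all its descendant text in document order, which is what joining itertext() gives.

```python
import xml.etree.ElementTree as ET

html = '<a href="#">Click here to go to the <strong>next page</strong></a>'
a = ET.fromstring(html)

# itertext() walks the element and its descendants in document order,
# yielding each piece of text, much like //a//text() does.
pieces = list(a.itertext())
print(pieces)

# Joining the pieces gives the element's string-value,
# which is what string(//a) returns as a single string.
string_value = ''.join(pieces)
print(string_value)
```

One thing to keep in mind: when string() is given several matching nodes, XPath 1.0 only converts the first one in document order, so string(//a) on a page with many links returns the text of the first link only.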
The XPath contains(str1, str2) function: tests whether str1 contains str2 and returns a boolean:
from scrapy.selector import Selector
text="""
<div>
<p class="small info">hello world</p>
<p class="normal info">hello scrapy</p>
</div>
"""
sel=Selector(text=text)
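# Each call returns a SelectorList of the matching <p> elements:
print(sel.xpath('//p[contains(@class,"small")]'))
print(sel.xpath('//p[contains(@class,"info")]'))
# An empty SelectorList is falsy, so the result can be used directly as a condition:
if sel.xpath('//p[contains(@class,"small")]'):
    print(True)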
>>>True
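Note that contains(@class, "small") is a plain substring test, so it would also match a class such as "smaller". The sketch below mimics both behaviours in plain Python with the standard library (not Scrapy). The third paragraph with class "smaller info" and the helper names substring_match and token_match are hypothetical and were added only for illustration. In XPath itself, the usual whole-token idiom is //p[contains(concat(" ", normalize-space(@class), " "), " small ")].

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "smaller info" paragraph is not in the original example.
doc = ET.fromstring(
    '<div>'
    '<p class="small info">hello world</p>'
    '<p class="normal info">hello scrapy</p>'
    '<p class="smaller info">hello xpath</p>'
    '</div>'
)

def substring_match(p, name):
    # Mirrors contains(@class, name): a plain substring test.
    return name in p.get('class', '')

def token_match(p, name):
    # Mirrors the concat/normalize-space idiom: name must appear
    # as a whole whitespace-separated token of the class attribute.
    return name in p.get('class', '').split()

paragraphs = doc.findall('p')
by_substring = [p.text for p in paragraphs if substring_match(p, 'small')]
by_token = [p.text for p in paragraphs if token_match(p, 'small')]
print(by_substring)
print(by_token)
```

The substring version picks up both "hello world" and "hello xpath", while the token version only keeps "hello world".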
- A slightly modified version:
from scrapy.selector import Selector
text="""
<div>
<p class="small info">hello world</p>
<p class="normal info">hello scrapy</p>
</div>
"""
selector=Selector(text=text)
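# Match on the attribute instead:
#content=selector.xpath('//p[contains(@class,"small info")]').extract()
# Match on the text:
content=selector.xpath('//p[contains(text(),"hello")]').extract_first()
# extract_first() returns None when nothing matches, so check for that first
if content and 'hello' in content:
    print("content matched correctly")
>>>content matched correctly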
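One caveat with contains(text(), "hello"): in XPath 1.0, when a node-set is passed to a string function, only its first node is used, so only the first text node of each p is checked. If the word sits inside a child tag, contains(., "hello") is the safer form, because "." stands for the element's whole string-value. The sketch below imitates the difference with the standard library rather than Scrapy, and the markup is a hypothetical example that is not from the snippets above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "hello" sits inside a child element,
# not in the first text node of <p>.
p = ET.fromstring('<p>Say <b>hello</b> world</p>')

# In this markup p.text is the first text node of <p>,
# which is all that contains(text(), "hello") would look at.
first_text_node = p.text or ''

# The full string-value, which is what contains(., "hello") looks at.
full_text = ''.join(p.itertext())

found_in_first = 'hello' in first_text_node
found_in_full = 'hello' in full_text
print(found_in_first)
print(found_in_full)
```

Here the first check fails and the second succeeds, which is why matching against "." tends to be more robust on real pages with nested markup.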