response.follow() resolves relative URLs for you, so there is no need to join the domain onto the URL yourself:
yield response.follow(url, callback=self.parse_mate)
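To see what response.follow() saves you, here is a minimal sketch of the same kind of resolution done by hand with the standard library. It assumes the link is resolved against the URL of the page it was found on; the page URL and hrefs below are made up for illustration, and Scrapy itself also honors a <base> tag, which this sketch ignores.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page being parsed (not from the original notes).
page_url = "https://example.com/list/page1.html"

# Hypothetical href values as they might appear in the page.
hrefs = ["/detail/42", "detail/43", "https://other.example.org/a"]

# Resolve each href against the page URL, as you would have to do
# manually before building a plain Request.
resolved = [urljoin(page_url, href) for href in hrefs]

for href, full in zip(hrefs, resolved):
    print(href, "->", full)
```

With response.follow() you pass the raw href and skip this step.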
Example of selecting all the text of a node and its descendants with XPath, using .xpath('string(.)'):
node_list = response.xpath('//h3[@class="c-title"]/a').xpath('string(.)').extract_first()
XPath for getting a child tag's content with its HTML tags included:
''.join(node.xpath('./h3[@class="c-title"]/a/node()').extract())
Getting only the text of a child tag (extract_first() returns None when nothing matches):
node.xpath('./h3[@class="c-title"]/a').xpath('string(.)').extract_first()
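Getting the HTML content with lxml, much as you would with BeautifulSoup:
from lxml import etree
response = etree.HTML(content)
etree.tostring(response)  # tostring is a function of etree, not a method of the element; it returns bytes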
A site for downloading library packages for offline installation:
https://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml