Scrapy Details

Author: comboo | Published: 2016-05-31 23:32

1. In a Request, write callback=self.parse. Note that it is neither self.parse() (which calls the method immediately) nor callback='parse' (a string).

2. In an XPath expression, use text() to extract the text of a node.

3. Create the item object inside the for loop, so that each iteration gets its own item.

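4. Changing the IP, the cookies, and the UA all work the same way: enable a middleware first, then implement the corresponding method in that middleware.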

Note

headers (dict) – the headers of this request. The dict values can be strings (for single valued headers) or lists (for multi-valued headers). If None is passed as value, the HTTP header will not be sent at all.

cookies (dict or list) – the request cookies. These can be sent in two forms.

Using a dict:

request_with_cookies = Request(url="http://www.example.com", cookies={'currency': 'USD', 'country': 'UY'})

Using a list of dicts:

request_with_cookies = Request(url="http://www.example.com", cookies=[{'name': 'currency', 'value': 'USD', 'domain': 'example.com', 'path': '/currency'}])

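In plain terms: cookies and headers are parameters of Request (headers is a dict; cookies is a dict or a list of dicts), and the UA is a key inside headers.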

As for the proxy, it can be stored in meta.

meta (dict) – the initial values for the Request.meta attribute. If given, the dict passed in this parameter will be shallow copied.
