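1. --no-redirect
Without this option, scrapy shell follows HTTP redirects automatically (the default). With it, redirects are not followed and the original redirect response is returned.
Run in the terminal: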
scrapy shell https://www.shiyanlou.com/user/310176
The result:
[s] request <GET https://www.shiyanlou.com/user/310176>
[s] response <200 https://www.shiyanlou.com/teacher/310176>
Run in the terminal:
scrapy shell --no-redirect https://www.shiyanlou.com/user/310176
The result:
[s] request <GET https://www.shiyanlou.com/user/310176>
[s] response <301 https://www.shiyanlou.com/user/310176>
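Inside a spider, a similar effect can be sketched per request through the `dont_redirect` and `handle_httpstatus_list` keys of `Request.meta`. The helper name `no_redirect_meta` is hypothetical, and the default status codes are an assumption chosen for illustration.

```python
# Hypothetical helper (not part of Scrapy): builds a `meta` dict that asks
# Scrapy not to follow redirects for one request and to pass the listed
# redirect status codes through to the callback.
def no_redirect_meta(status_codes=(301, 302)):
    return {
        "dont_redirect": True,                         # skip redirect handling for this request
        "handle_httpstatus_list": list(status_codes),  # let these responses reach the callback
    }


meta = no_redirect_meta()

# Usage inside a spider (requires Scrapy to be installed):
#   yield scrapy.Request(url, meta=no_redirect_meta(), callback=self.parse)
```

Within an already running scrapy shell session, `fetch(url, redirect=False)` should give the same behavior for a single fetch.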
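2. -s
Sets (overrides) a Scrapy setting for this run, in the form NAME=VALUE. The most commonly used setting is USER_AGENT; try it when the command returns a 403 response.
Run in the terminal: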
scrapy shell https://movie.douban.com/subject/3011091/
The result:
[s] request <GET https://movie.douban.com/subject/3011091/>
[s] response <403 https://movie.douban.com/subject/3011091/>
Run in the terminal:
scrapy shell -s USER_AGENT='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36' https://movie.douban.com/subject/3011091/
The result:
[s] request <GET https://movie.douban.com/subject/3011091/>
[s] response <200 https://movie.douban.com/subject/3011091/>
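The two options can be combined, and the user-agent string has to reach scrapy as a single argument. A minimal sketch of assembling such a command line from Python follows; the function name `scrapy_shell_command` and its parameters are hypothetical and exist only for this illustration.

```python
import shlex


# Hypothetical helper: builds the argument list for a `scrapy shell` call.
def scrapy_shell_command(url, no_redirect=False, settings=None):
    argv = ["scrapy", "shell"]
    if no_redirect:
        argv.append("--no-redirect")
    # Each setting becomes one `-s NAME=VALUE` pair.
    for name, value in (settings or {}).items():
        argv += ["-s", f"{name}={value}"]
    argv.append(url)
    return argv


UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/71.0.3578.98 Safari/537.36"
)
argv = scrapy_shell_command(
    "https://movie.douban.com/subject/3011091/",
    settings={"USER_AGENT": UA},
)
# shlex.join quotes the user-agent value so it can be pasted into a POSIX shell.
print(shlex.join(argv))
```

The resulting list could also be passed to `subprocess.run(argv)`, which avoids shell quoting altogether.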
If you have created a Scrapy project, you can instead set USER_AGENT in the project's settings.py file.
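As a sketch, the corresponding settings.py fragment might look like the following. It reuses the user-agent string from the command above; any other browser string would work the same way.

```python
# settings.py (fragment) of a Scrapy project.
# The value is the same browser user agent used on the command line above;
# it is split across lines only for readability.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/71.0.3578.98 Safari/537.36"
)
```

With this in place, commands run from inside the project directory pick up the setting, so the -s option is no longer needed there.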