(16) Avoiding getting banned

Author: iamlightsmile | Published 2019-05-04 23:16

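Some websites implement specific mechanisms, based on rules of varying sophistication, to keep crawlers from scraping them. Getting around these rules is not easy: it takes some skill and sometimes requires special infrastructure.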

Here are some tips for dealing with such sites:

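Rotate your user agent: keep a pool of common browser user agents and pick one of them for each request (a quick Google search turns up plenty).
Disable cookies (see COOKIES_ENABLED), because some sites use cookies to track crawler behavior.
Use a download delay of 2 seconds or more. See the DOWNLOAD_DELAY setting.
If possible, fetch pages from the Google cache instead of hitting the site directly.
Use a pool of rotating IP addresses, for example the free Tor project or a paid service such as ProxyMesh.
Use a highly distributed downloader that gets around bans for you, so you only have to focus on parsing the pages. One example is Crawlera.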
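
Several of these tips (user agent rotation, disabling cookies, a download delay and an IP pool) can be combined in Scrapy roughly as in the minimal sketch below. It assumes two hypothetical downloader middlewares, `RandomUserAgentMiddleware` and `RandomProxyMiddleware`, living in a hypothetical `myproject.middlewares` module. The user agent strings and proxy addresses are placeholders only. The settings `COOKIES_ENABLED`, `DOWNLOAD_DELAY` and `DOWNLOADER_MIDDLEWARES`, and the `request.meta["proxy"]` key read by Scrapy's built-in `HttpProxyMiddleware`, are standard Scrapy; everything else is illustrative.

```python
import random

# Example desktop browser User-Agent strings (illustrative placeholders;
# collect current ones before real use).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

# Hypothetical HTTP proxy endpoints, e.g. a local HTTP proxy in front of Tor
# or addresses from a paid proxy service. Replace with your own pool.
PROXIES = [
    "http://127.0.0.1:8118",
    "http://proxy.example.com:8000",
]


class RandomUserAgentMiddleware:
    """Downloader middleware (hypothetical): random User-Agent per request."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # None tells Scrapy to keep processing the request


class RandomProxyMiddleware:
    """Downloader middleware (hypothetical): random proxy per request.

    Scrapy's built-in HttpProxyMiddleware honours request.meta["proxy"].
    """

    def __init__(self, proxies=PROXIES):
        self.proxies = list(proxies)

    def process_request(self, request, spider):
        request.meta["proxy"] = random.choice(self.proxies)
        return None


# What would go in settings.py (or a spider's custom_settings).
# "myproject.middlewares" is a hypothetical module path.
SETTINGS = {
    "COOKIES_ENABLED": False,  # do not send or track cookies
    "DOWNLOAD_DELAY": 2,       # seconds between requests to the same site
    "DOWNLOADER_MIDDLEWARES": {
        # Disable the stock middleware so only ours sets User-Agent.
        "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
        "myproject.middlewares.RandomUserAgentMiddleware": 400,
        "myproject.middlewares.RandomProxyMiddleware": 410,
    },
}
```

Scrapy calls `process_request` on downloader middlewares in ascending order of their numbers, so both custom middlewares run before the built-in `HttpProxyMiddleware` (order 750) reads the proxy. Also note that by default (`RANDOMIZE_DOWNLOAD_DELAY`) Scrapy waits a random time between 0.5 and 1.5 times `DOWNLOAD_DELAY`, which makes the request pattern look less mechanical.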
