美文网首页
爬虫协议

爬虫协议

作者: 部落大圣 | 来源:发表于2018-07-02 00:25 被阅读16次

今天才知道爬虫协议,我也是服了自己。之前就是在scrapy库下设置里,接触到爬虫协议。就是简单的是否遵守。今天在听网络课堂才知道很多网站都有,里面允许你做的操作,不允许的操作。如爬取多个页面,给你建议的网络延迟时间等。如果不遵循你就要小心了,很可能封你的IP
[爬虫协议][https://baike.baidu.com/item/robots%E5%8D%8F%E8%AE%AE/2483797?fromtitle=%E7%88%AC%E8%99%AB%E5%8D%8F%E8%AE%AE&fromid=15275977&fr=aladdin]

怎么查看爬虫协议呢?

在域名后面加"/robots.txt"回车,有就能看到
百度一下的爬虫协议:

User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: MSNBot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: Baiduspider-image
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: YoudaoBot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: Sogou web spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: Sogou inst spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: Sogou spider2
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: Sogou blog
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: Sogou News Spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: Sogou Orion spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: ChinasoSpider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: Sosospider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/


User-agent: yisouspider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: EasouSpider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: *
Disallow: /

相关文章

  • Python 爬虫协议及建议

    爬虫协议 什么是爬虫协议:爬虫协议,也被叫做robots协议,是为了告诉网络蜘蛛哪些页面可以抓取,哪些页面不能抓取...

  • Robots协议

    好的网络爬虫,首先需要遵守Robots协议。Robots协议(也称为爬虫协议、机器人协议等)的全称是“网络爬虫排除...

  • 人生不得已——Python爬虫 robots协议

    关于robots协议 Robots协议(也称为爬虫协议、机器人协议等)的全称是“网络爬虫排除标准”(Robots ...

  • 亚马逊robots协议解析

    1.robots协议 Robots协议(也称为爬虫协议、机器人协议等)的全称是“网络爬虫排除标准”(Robots ...

  • 认识robots协议

    robots协议的作用: Robots协议(也称为爬虫协议、机器人协议等)的全称是“网络爬虫排除标准”(Robot...

  • 网络爬虫排除标准——robots协议

    Robots协议 “网络爬虫排除标准”(Robots Exclusion Protocol)也称为爬虫协议、机器人...

  • 关于购物网站及网页小游戏的robots协议

    Robots协议(也称为爬虫协议、机器人协议等)的全称是“网络爬虫排除标准”(Robots Exclusion P...

  • robots

    Robots协议(也称为爬虫协议、机器人协议等)的全称是“网络爬虫排除标准”(Robots Exclusion P...

  • 谷歌:爬虫协议与标准规范

    Robots协议(也称为爬虫协议、机器人协议等)的全称是“网络爬虫排除标准”(Robots Exclusion P...

  • Robots.txt详解

    Robots协议(也称为爬虫协议、机器人协议等)的全称是“网络爬虫排除标准”(Robots Exclusion P...

网友评论

      本文标题:爬虫协议

      本文链接:https://www.haomeiwen.com/subject/rzzhuftx.html