Analysis of Amazon's robots.txt


Author: 弹弹弹弹走于思琦 | Published: 2018-05-14 16:32

    1. The robots protocol

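    The robots protocol (also called the crawler protocol or robot protocol) is formally known as the Robots Exclusion Protocol. Through it, a website tells search engines which pages may be crawled and which may not. The rules live in robots.txt, a plain text file at the root of the site that can be created and edited with any ordinary text editor. robots.txt is a protocol, not a command: crawlers honour it voluntarily and nothing enforces it. It is normally the first file a search engine crawler requests when it visits a site, and it tells the crawler which files on the server it may access.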

    (Source: the Baidu Baike entry on the robots protocol)
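As a minimal sketch of what honouring these rules looks like on the crawler side, Python's standard `urllib.robotparser` module can parse a robots.txt body and answer whether a given user agent may fetch a URL. The rule set, the `MyCrawler` user-agent string and the `www.example.com` URLs are made-up examples; a real crawler would first download `/robots.txt` from the site (for instance with `set_url()` followed by `read()`). Because the check is something the crawler chooses to run, skipping it is technically trivial, which is why robots.txt is described as a protocol rather than a command.

```python
from urllib.robotparser import RobotFileParser

# A tiny, made-up robots.txt body (not Amazon's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
"""


def build_parser(text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("https://www.example.com/cart",
                "https://www.example.com/dp/B000000000"):
        # can_fetch() returns False when a matching Disallow rule applies
        print(url, rp.can_fetch("MyCrawler", url))
```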

    2. Amazon's robots.txt file

    Amazon's robots.txt (the # comments are my annotations):

    User-agent: *                                               # applies to all crawlers


    Disallow: /buycar                         

    Disallow: /cart

    Disallow: /checkout

    Disallow: /class

    Disallow: /com

    Disallow: /common

    Disallow: /css

    Disallow: /dll

    Disallow: /doc

    # Block crawling of paths beginning with buycar, cart, checkout, class, com, common, css, dll and doc


    Disallow: /dp/e-mail-friend/

    Disallow: /dp/manual-submit/

    Disallow: /dp/product-availability/

    Disallow: /dp/rate-this-item/

    Disallow: /dp/shipping/

    Disallow: /dp/twister-update/

    # Block the e-mail-friend, manual-submit, product-availability, rate-this-item, shipping and twister-update paths under /dp/ (probably pages for rating items, submitting information and similar)


    Disallow: /gp/aws/ssop

    Disallow: /gp/cart

    Disallow: /gp/css/homepage.html

    Disallow: /gp/customer-reviews/common/du

    Disallow: /gp/flex

    Disallow: /gp/gfix

    Disallow: /gp/history

    Disallow: /gp/item-dispatch

    Disallow: /gp/music/clipserve

    Disallow: /gp/music/wma-pop-up

    Disallow: /gp/offer-listing

    Disallow: /gp/product/e-mail-friend

    Disallow: /gp/product/product-availability

    Disallow: /gp/product/rate-this-item

    Disallow: /gp/recsradio

    Disallow: /gp/slredirect

    Disallow: /gp/twitter/

    Disallow: /gp/vote

    Disallow: /gp/voting/

    Disallow: /gp/yourstore

    # Block specific paths under /gp/ (customer reviews, browsing history, rating and e-mailing products, sharing to Twitter, etc.)


    Disallow: /inc

    Disallow: /js

    Disallow: /lib

    # Block the inc, js and lib paths


    Disallow: /mn/bookLookInsideApp

    Disallow: /mn/checkInitApp

    Disallow: /mn/checkoutAlertMsgApp

    Disallow: /mn/checkoutredirectApp

    Disallow: /mn/giftCardApp

    Disallow: /mn/loginApplication

    Disallow: /mn/loyaltyApp

    Disallow: /mn/orderAddrApp

    Disallow: /mn/orderCfmApp

    Disallow: /mn/orderDetailApp

    Disallow: /mn/orderFailApp

    Disallow: /mn/orderHistoryApp

    Disallow: /mn/orderModifyApp

    Disallow: /mn/orderSummaryApp

    Disallow: /mn/paymentRedriveApp

    Disallow: /mn/recommendReviewApp

    Disallow: /mn/releaseReviewApp

    Disallow: /mn/reviewVoteApplication

    Disallow: /mn/selectPaymentMethodApp

    Disallow: /mn/selectShippingOpptionApplication

    Disallow: /mn/shipmentTraceApp

    Disallow: /mn/shoppingCartApplication

    Disallow: /mn/tellFriend

    Disallow: /mn/thankYouApplication

    Disallow: /mn/virtualAccountApp

    Disallow: /mn/yourAccountApp

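    # Block specific apps under /mn/ (account login and logout, choosing a payment method, order details, failed orders, order history, all orders, choosing a shipping option, shipment tracking, etc.)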


    Disallow: /paper

    Disallow: /xml

    Disallow: /youraccount

    Disallow: /ap/signin

    Disallow: /gp/registry/wishlist/

    Disallow: /wishlist/

    # Block the user account, sign-in and wish list paths


    Allow: /wishlist/universal*

    Allow: /wishlist/vendor-button*

    Allow: /wishlist/get-button*

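    # Exceptions: allow these specific paths under /wishlist/ (a longer, more specific Allow rule takes precedence over the shorter Disallow: /wishlist/)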


    Disallow: /gp/wishlist/

    Allow: /gp/wishlist/universal*

    Allow: /gp/wishlist/vendor-button*

    Allow: /gp/wishlist/ipad-install*

    # Block everything under /gp/wishlist/ except the three paths allowed here


    Disallow: /registry/wishlist/

    Disallow:/gp/help/contact-us/general-questions.html*?type&email&skip=true

    Disallow:/gp/help/customer/accessibility?ie=UTF8&initialIssue=forgotpw&skip=true

    Disallow: /gp/registry/search.html

    Disallow: /gp/orc/rml/

    Disallow: /gp/digital/fiona/manage

    Disallow: /gp/entity-alert/external

    Disallow: /gp/customer-reviews/dynamic/sims-box

    Disallow: /review/dynamic/sims-box

    Disallow: /gp/redirect.html

    Disallow: /gp/customer-media/upload/

    Disallow: /gp/customer-media/actions/delete/

    Disallow: /gp/customer-media/actions/edit-caption/

    Disallow: /gp/dmusic/

    Disallow: /registry

    Disallow: /*/wishlist

    Disallow: /gp/registry

    Disallow: /gp/aag

    Disallow: /gp/socialmedia/giveaways

    Disallow: /gp/aw/so.html

    Disallow: /gp/pdp/profile/

    # Block all of the paths listed above


    Disallow: /gp/help/customer/display.html*nodeId=200843370

    Disallow: /gp/help/customer/display.html*nodeId=200877580

    Disallow: /gp/help/customer/display.html*nodeId=200877590

    Disallow: /gp/help/customer/display.html*nodeId=200879080

    Disallow: /gp/help/customer/display.html*nodeId=200879100

    Disallow: /gp/help/customer/display.html*nodeId=200879120

    Disallow: /gp/help/customer/display.html*nodeId=200879160

    Disallow: /gp/help/customer/display.html*nodeId=200879140

    Disallow: /gp/help/customer/display.html*nodeId=200877610

    Disallow: /gp/help/customer/display.html*nodeId=200878960

    Disallow: /gp/help/customer/display.html*nodeId=200878980

    Disallow: /gp/help/customer/display.html*nodeId=200879000

    Disallow: /gp/help/customer/display.html*nodeId=200879040

    Disallow: /gp/help/customer/display.html*nodeId=200879020

    Disallow: /gp/help/customer/display.html*nodeId=200877630

    Disallow: /gp/help/customer/display.html*nodeId=200879200

    Disallow: /gp/help/customer/display.html*nodeId=200879220

    Disallow: /gp/help/customer/display.html*nodeId=200879240

    Disallow: /gp/help/customer/display.html*nodeId=200879280

    Disallow: /gp/help/customer/display.html*nodeId=200879260

    Disallow: /gp/help/customer/display.html*nodeId=200877650

    Disallow: /gp/help/customer/display.html*nodeId=200879320

    Disallow: /gp/help/customer/display.html*nodeId=200879340

    Disallow: /gp/help/customer/display.html*nodeId=200879360

    Disallow: /gp/help/customer/display.html*nodeId=200879400

    Disallow: /gp/help/customer/display.html*nodeId=200879380

    Disallow: /gp/help/customer/display.html*nodeId=200877560

    Disallow: /gp/help/customer/display.html*nodeId=200843460

    Disallow: /gp/help/customer/display.html*nodeId=200843440

    Disallow: /gp/help/customer/display.html*nodeId=200899270

    Disallow: /gp/help/customer/display.html*nodeId=200879440

    Disallow: /gp/help/customer/display.html*nodeId=200899330

    Disallow: /gp/help/customer/display.html*nodeId=200899350

    Disallow: /gp/help/customer/display.html*nodeId=200899390

    Disallow: /gp/help/customer/display.html*nodeId=200899410

    Disallow: /gp/help/customer/display.html*nodeId=200899430

    Disallow: /gp/help/customer/display.html*nodeId=200899220

    Disallow: /gp/help/customer/display.html*nodeId=200899450

    Disallow: /gp/help/customer/display.html*nodeId=200899670

    Disallow: /gp/help/customer/display.html*nodeId=200899530

    Disallow: /gp/help/customer/display.html*nodeId=200899470

    Disallow: /gp/help/customer/display.html*nodeId=200899550

    Disallow: /gp/help/customer/display.html*nodeId=200899570

    Disallow: /gp/help/customer/display.html*nodeId=200899510

    Disallow: /gp/help/customer/display.html*nodeId=200899610

    Disallow: /gp/help/customer/display.html*nodeId=200899630

    Disallow: /gp/help/customer/display.html*nodeId=200899650

    Disallow: /gp/help/customer/display.html*nodeId=200879180

    Disallow: /gp/help/customer/display.html*nodeId=200879060

    Disallow: /gp/help/customer/display.html*nodeId=200879300

    Disallow: /gp/help/customer/display.html*nodeId=200879420

    Disallow: /gp/help/customer/display.html*nodeId=200899290

    Disallow: /gp/help/customer/display.html*nodeId=200899310

    Disallow: /gp/help/customer/display.html*nodeId=200843380

    Disallow: /gp/help/customer/display.html*nodeId=200843420

    Disallow: /gp/help/customer/display.html*nodeId=200899230

    Disallow: /gp/help/customer/display.html*nodeId=200899250

    Disallow: /gp/help//display.html*nodeId=200899370

    # Block specific pages under /gp/help/ (they look like automatic replies to particular questions sent to Amazon customer service)


    Disallow: /reviews/iframe

    Disallow:/gp/help/reports/infringement/jquery/handle-notice-submit.html

    Disallow: /gp/help/customer/handler/handle-email-submit.html

    Disallow: /ss/customer-reviews/lighthouse/

    Disallow: /gp/aw/ol/

    # Block the paths listed above
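The listing relies on a few matching rules that are easy to misread. A minimal sketch in Python, using a hand-picked subset of the rules above, makes them concrete. It assumes the matching semantics documented by Google and standardised in RFC 9309: each value is matched against the beginning of the URL path (Google also includes the query string), `*` matches any run of characters, the longest matching rule wins, and Allow wins a tie. The functions `rule_to_regex` and `is_allowed` and the sample paths are made up for illustration; this is not Amazon's or any real crawler's code. Note that Python's standard `urllib.robotparser` instead applies the first matching rule in file order and does no wildcard matching inside paths, so it gives different answers for the wishlist exceptions.

```python
import re

# A hand-picked subset of the rules quoted above, as (directive, pattern) pairs.
RULES = [
    ("disallow", "/com"),
    ("disallow", "/common"),  # redundant: /com already covers it
    ("disallow", "/dp/rate-this-item/"),
    ("disallow", "/wishlist/"),
    ("allow", "/wishlist/universal*"),
    ("allow", "/wishlist/vendor-button*"),
    ("allow", "/wishlist/get-button*"),
    ("disallow", "/gp/wishlist/"),
    ("allow", "/gp/wishlist/universal*"),
    ("allow", "/gp/wishlist/vendor-button*"),
    ("allow", "/gp/wishlist/ipad-install*"),
    ("disallow", "/*/wishlist"),
    ("disallow", "/gp/help/customer/display.html*nodeId=200843370"),
    ("disallow", "/gp/help//display.html*nodeId=200899370"),
]


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal. re.match() anchors the start, so a pattern
    without '$' behaves as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(url_path, rules=RULES):
    """Decide whether url_path (path plus query string) may be crawled.

    The longest matching pattern wins; on a tie the Allow rule wins.
    A path that no rule matches is allowed by default.
    """
    best = None  # (pattern length, is_allow) of the best match so far
    for directive, pattern in rules:
        if pattern and rule_to_regex(pattern).match(url_path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in [
        "/community/help",          # prefix match on /com
        "/dp/B000000000",           # no rule matches
        "/wishlist/universal",      # longer Allow beats Disallow: /wishlist/
        "/gp/wishlist/get-button",  # get-button is only allowed under /wishlist/
        "/gp/help/customer/display.html?ie=UTF8&nodeId=200843370",
        "/gp/help/customer/display.html?nodeId=200899370",  # '//' rule misses
    ]:
        print(path, is_allowed(path))
```

Running it shows three things about the file: `Disallow: /common` is redundant because `/com` already covers it (and also blocks paths such as `/community/help`); the wishlist Allow exceptions work only because they are longer than `Disallow: /wishlist/`; and the rule `/gp/help//display.html*nodeId=200899370`, with its double slash, does not match `/gp/help/customer/display.html?nodeId=200899370`.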


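    Amazon's robots.txt is very detailed and blocks crawlers from a large number of customer- and product-related pages. The only explicit Allow rules are for a few specific wishlist paths; anything not matched by a Disallow rule, such as the product detail pages under /dp/, is allowed by default. My personal guess is that these crawlable wishlist files serve browser-side features that push related product information to users and draw them to the site.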
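To check an observation like this against a fresh copy of the file, a short script can group the rules by directive. This is a minimal sketch, assuming the robots.txt text has already been downloaded; the helper name `rules_by_directive` is hypothetical and the sample text is a few lines copied from the listing above. It merges all user-agent groups, which is fine here because the listing has a single `User-agent: *` group.

```python
SAMPLE = """\
User-agent: *
Disallow: /wishlist/
Allow: /wishlist/universal*
Disallow:/gp/help/reports/infringement/jquery/handle-notice-submit.html
Disallow: /gp/aw/ol/   # trailing comment
"""


def rules_by_directive(robots_txt):
    """Group Allow/Disallow values from robots.txt text.

    Comments after '#' are dropped, field names are matched
    case-insensitively, and the space after ':' is optional.
    All user-agent groups are merged together.
    """
    result = {"allow": [], "disallow": []}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if not sep:
            continue  # blank or comment-only line
        field = field.strip().lower()
        if field in result:
            result[field].append(value.strip())
    return result


if __name__ == "__main__":
    rules = rules_by_directive(SAMPLE)
    print(len(rules["disallow"]), "Disallow rules")
    print("Allow rules:", rules["allow"])
```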


    Article link: https://www.haomeiwen.com/subject/ohsldftx.html