Scrapy with rules


Author: 方方块 | Published 2017-07-15 06:52

    Use case - extracting links from a page
    from scrapy.spiders import CrawlSpider, Rule

    Rule

    LinkExtractor() - once at a page, grab all URLs on it
    from scrapy.linkextractors import LinkExtractor
    rules = (Rule(LinkExtractor()),)
    callback - which method to call for each page the rule matches
    rules = (Rule(LinkExtractor(), callback='parse_page'),)

    Do not name the callback parse - that name is reserved, because CrawlSpider uses parse internally to drive the rules

    follow - keep following links found on the pages this rule matches
    rules = (Rule(LinkExtractor(), callback='parse_page', follow=True),)

    Since Scrapy automatically filters out duplicate requests, links that appear on every page (such as category links) can be followed without fear of re-crawling the same pages.

    deny_domains - skip links pointing to the listed domains
    Beware of links to google.com pages; crawling them may get you banned.

    allow - only extract URLs that match the given regular expression(s)


          Original link: https://www.haomeiwen.com/subject/lmlahxtx.html