Scrapy duplicates filter

Author: WangLane | Published 2019-04-16 10:52

Duplicates filter

A filter that looks for duplicate items, and drops those items that were already processed. Let's say that our items have a unique id, but our spider returns multiple items with the same id:

    from scrapy.exceptions import DropItem


    class DuplicatesPipeline(object):

        def __init__(self):
            # Track the ids of items that have already been processed.
            self.ids_seen = set()

        def process_item(self, item, spider):
            if item['id'] in self.ids_seen:
                # Discard any item whose id has been seen before.
                raise DropItem("Duplicate item found: %s" % item)
            else:
                self.ids_seen.add(item['id'])
                return item


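For the pipeline to take effect, it has to be activated through the ITEM_PIPELINES setting of the project. A minimal sketch, assuming the class above lives in a pipelines.py module of a project named myproject (the module path is a placeholder, and the priority value is illustrative):

    # settings.py
    # Enable the pipeline. The integer (0-1000) orders pipelines:
    # lower values run earlier in the chain.
    ITEM_PIPELINES = {
        'myproject.pipelines.DuplicatesPipeline': 300,
    }

Note that ids_seen is held in memory, so this filter only deduplicates items within a single crawl run; restarting the spider starts from an empty set.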