Saving Scrapy data to MongoDB


Author: 楚糖的糖 | Published 2018-11-08 22:44

In the pipeline:

    import re

    from pymongo import MongoClient

    from yangguang.settings import MONGO_HOST


    class YangguangPipeline(object):
        def open_spider(self, spider):
            # Connect once when the spider opens; MONGO_HOST comes from the
            # project settings (spider.settings.get("MONGO_HOST") also works).
            client = MongoClient(MONGO_HOST)
            self.collection = client["test"]["test"]

        def process_item(self, item, spider):
            item["content"] = self.process_content(item["content"])
            print(item)
            # insert_one replaces the deprecated Collection.insert
            self.collection.insert_one(dict(item))
            return item

        def process_content(self, content):
            # strip non-breaking spaces (\xa0) and all whitespace from each fragment
            content = [re.sub(r"\xa0|\s", "", i) for i in content]
            # drop the empty strings left over after cleaning
            content = [i for i in content if len(i) > 0]
            return content
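The cleaning step in process_content can be exercised on its own, outside Scrapy. A minimal sketch (the sample fragments below are hypothetical, standing in for text scraped from a page):

```python
import re


def process_content(content):
    # strip non-breaking spaces (\xa0) and all whitespace from each fragment
    content = [re.sub(r"\xa0|\s", "", i) for i in content]
    # drop the empty strings left over after cleaning
    return [i for i in content if len(i) > 0]


# hypothetical fragments, as they might come out of a response.xpath() call
raw = ["\xa0 投诉内容 \xa0", "   ", "处理结果"]
print(process_content(raw))  # → ['投诉内容', '处理结果']
```

Note that for the pipeline itself to run, it must also be registered in the project's settings.py under ITEM_PIPELINES, e.g. `{"yangguang.pipelines.YangguangPipeline": 300}` (module path assumed from the import above).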
    


Original link: https://www.haomeiwen.com/subject/zeigxqtx.html