scrapy-4.pipeline

Author: ddm2014 | Published: 2018-06-18 17:15

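A pipeline cleans the scraped data and stores it in a database.
How you clean the data depends on your own needs, but storing it in a database follows a standard pattern.
The pipeline class defines three methods: open_spider, close_spider, and process_item. Writing to sqlite3 almost always follows this pattern. Note that the database column names must match the field names defined in items.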


# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import sqlite3

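class ZdmPipeline:
    def open_spider(self, spider):
        # Runs once when the spider starts: open the database and create the table.
        self.conn = sqlite3.connect('test.sqlite')
        self.cur = self.conn.cursor()
        self.cur.execute('CREATE TABLE IF NOT EXISTS sm(name varchar(100),price varchar(50))')

    def close_spider(self, spider):
        # Runs once when the spider finishes: commit the pending inserts, then close.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Column names are taken from the item's keys, so they must match the table's columns.
        sql = 'insert into sm({}) VALUES ({})'
        col = ','.join(item.keys())
        # One '?' placeholder per field, e.g. '?,?' for two fields.
        holder = ','.join('?' * len(item))
        self.cur.execute(sql.format(col, holder), list(item.values()))
        return item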
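
To make the column-name requirement concrete, here is a minimal sketch of what the statement built in process_item expands to for a single item. It assumes a plain dict stands in for the Scrapy item and uses an in-memory SQLite database; the sample values and the bad_item example are made up for illustration.

```python
import sqlite3

# A plain dict stands in for a Scrapy item here (assumption for illustration).
item = {'name': 'keyboard', 'price': '199'}

conn = sqlite3.connect(':memory:')  # in-memory database, nothing is written to disk
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS sm(name varchar(100),price varchar(50))')

# Same construction as in process_item: column list plus one '?' per field.
col = ','.join(item.keys())
holder = ','.join('?' * len(item))
statement = 'insert into sm({}) VALUES ({})'.format(col, holder)
print(statement)  # insert into sm(name,price) VALUES (?,?)

cur.execute(statement, list(item.values()))
conn.commit()
rows = cur.execute('SELECT name, price FROM sm').fetchall()
print(rows)  # [('keyboard', '199')]

# A key that is not a column of sm makes the insert fail.
bad_item = {'name': 'mouse', 'cost': '59'}
try:
    cur.execute(
        'insert into sm({}) VALUES (?,?)'.format(','.join(bad_item.keys())),
        list(bad_item.values()),
    )
except sqlite3.OperationalError as exc:
    print('insert failed:', exc)

conn.close()
```

Because the column names are formatted straight into the SQL text, this approach is only reasonable when the item fields are defined by you and not taken from scraped input.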

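Finally, as the comment at the top of the file says, go to settings and enable the pipeline; this just means uncommenting the ITEM_PIPELINES entry.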
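
In a project generated by scrapy startproject, the entry in question is the ITEM_PIPELINES dict in settings.py. A sketch of the uncommented setting follows, assuming the project package is called zdm (a hypothetical name; use your own package path):

```python
# settings.py
# 'zdm' is an assumed project package name; replace it with your own.
ITEM_PIPELINES = {
    'zdm.pipelines.ZdmPipeline': 300,
}
```

The key is the import path of the pipeline class and the value is an integer priority, by convention in the 0 to 1000 range.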

[Screenshot: settings]

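If you have several requirements, for example cleaning the data first and then storing it in the database, write several pipeline classes and register each of them in settings. The pipeline with the smaller number runs first.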
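
As a rough sketch of the two-step setup described above: ZdmCleanPipeline is a hypothetical cleaning step (not from the original project) that trims whitespace from string fields, and zdm is again an assumed package name. The last lines call the cleaning step by hand with a plain dict standing in for a Scrapy item.

```python
# pipelines.py (sketch)
class ZdmCleanPipeline:
    """Hypothetical cleaning step: trims whitespace from string fields."""

    def process_item(self, item, spider):
        for key in list(item.keys()):
            value = item[key]
            if isinstance(value, str):
                item[key] = value.strip()
        # Returning the item hands it to the next pipeline in priority order.
        return item


# settings.py (sketch): lower numbers run first,
# so cleaning (200) happens before storage (300).
ITEM_PIPELINES = {
    'zdm.pipelines.ZdmCleanPipeline': 200,
    'zdm.pipelines.ZdmPipeline': 300,
}

# Manual check of the cleaning step; the spider argument is unused here.
cleaned = ZdmCleanPipeline().process_item({'name': ' keyboard ', 'price': '199\n'}, spider=None)
print(cleaned)  # {'name': 'keyboard', 'price': '199'}
```

Keeping cleaning and storage in separate classes means either step can be switched off in settings without touching the other.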

Article link: https://www.haomeiwen.com/subject/nyqgeftx.html
