Python Learning, Day 86: Saving Item Data Models to a Database

Author: 暖A暖 | Source: published 2019-05-16 09:44, read 20 times

1. How to save item data models to a database

First create the MySQL database locally, then create the data table in that database.

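# Create the database
create database item_database;
# Relax the password policy so that a short test password is accepted (validate_password plugin, MySQL 5.7)
set global validate_password_length = 1;
set global validate_password_policy = 0;
# Create the user xkd and grant it privileges (GRANT ... IDENTIFIED BY works on MySQL 5.7; MySQL 8.0 needs a separate CREATE USER statement)
grant all on item_database.* to 'xkd'@'%' identified by '123456';
flush privileges;
# Select the database, then create the table from the item's fields
use item_database;
create table item (title varchar(255) not null, image_url varchar(255) not null, date date not null, image_path varchar(255) not null, url varchar(255) not null, url_id char(50) not null primary key);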
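
The table uses url_id as a fixed-width primary key, but the post does not show how that value is produced. A minimal sketch of one common approach, assuming the key is an MD5 hex digest of the page URL (the helper name make_url_id and the sample URL are hypothetical, not from the original project):

```python
import hashlib


def make_url_id(url: str) -> str:
    """Return a fixed-length hex digest of the URL (assumed scheme for url_id)."""
    # An MD5 hex digest is always 32 characters, so it fits in char(50)
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# Hypothetical URL, used only for illustration
example_id = make_url_id("https://dribbble.com/shots/example")
print(len(example_id))  # 32
```

Because the same URL always yields the same digest, inserting a page twice would hit the primary key constraint instead of creating a duplicate row.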

2. Install the Python MySQL driver

pip install mysqlclient

3. Modify the pipelines in the settings file

The spider then crawls and parses the pages, and returns each item to the pipelines defined in settings.py for processing.

ITEM_PIPELINES = {
   # 'XKD_Dribbble_Spider.pipelines.XkdDribbbleSpiderPipeline': 300,
   # Lower values run first: once the spider yields an item, ImagePipeline downloads image_url before MysqlPipeline stores the item
   'XKD_Dribbble_Spider.pipelines.ImagePipeline': 1,
   'XKD_Dribbble_Spider.pipelines.MysqlPipeline': 2,
}
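
The numbers in ITEM_PIPELINES decide the order in which an item passes through the pipelines, from the lowest value to the highest. The sketch below only mimics that ordering with a plain sort so the effect is visible; it is not Scrapy's internal code, and the helper name pipeline_order is hypothetical:

```python
# Same mapping as in settings.py: pipeline path -> priority
ITEM_PIPELINES = {
    "XKD_Dribbble_Spider.pipelines.ImagePipeline": 1,
    "XKD_Dribbble_Spider.pipelines.MysqlPipeline": 2,
}


def pipeline_order(pipelines: dict) -> list:
    """Return pipeline paths sorted by priority, lowest value first."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


order = pipeline_order(ITEM_PIPELINES)
print(order)
```

With these values the image pipeline handles each item first, so image_path is already filled in by the time the MySQL pipeline writes the row.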

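4. Create a new pipeline that writes items to MySQL

Next, create a new pipeline class in pipelines.py, for example MysqlPipeline. In this class, initialize the database connection and override process_item() to read the item's fields and commit them to the table. After the project has run successfully, you can use a command-line tool to check whether the data was inserted.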

import MySQLdb


class MysqlPipeline:
    def __init__(self):
        # Open one connection and cursor for the lifetime of the pipeline
        self.conn = MySQLdb.connect(host='localhost', user='xkd', password='123456', database='item_database', use_unicode=True, charset='utf8')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # Parameterized insert: the driver escapes the values
        sql = 'insert into item(title, image_url, date, image_path, url, url_id) ' \
              'values (%s, %s, %s, %s, %s, %s)'
        date = item['date']
        # %Y gives a four-digit year, matching MySQL's DATE format
        self.cursor.execute(sql, args=(item['title'], item['image_url'], date.strftime('%Y-%m-%d'), item['image_path'], item['url'], item['url_id']))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Scrapy calls close_spider() on a pipeline when the spider finishes
        self.cursor.close()
        self.conn.close()
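
To try the insert logic without a running MySQL server, the same flow can be sketched against an in-memory SQLite database from the standard library. This is a stand-in, not the author's pipeline: sqlite3 uses ? placeholders where MySQLdb uses %s, the class name SqlitePipelineSketch and the sample item are hypothetical, and skipping duplicate url_id values is an added assumption.

```python
import sqlite3
from datetime import date


class SqlitePipelineSketch:
    """Stand-in for MysqlPipeline that writes to an in-memory SQLite table."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "create table item (title text not null, image_url text not null, "
            "date text not null, image_path text not null, url text not null, "
            "url_id text not null primary key)"
        )

    def process_item(self, item, spider=None):
        sql = ("insert into item(title, image_url, date, image_path, url, url_id) "
               "values (?, ?, ?, ?, ?, ?)")
        values = (item["title"], item["image_url"],
                  item["date"].strftime("%Y-%m-%d"),
                  item["image_path"], item["url"], item["url_id"])
        try:
            self.conn.execute(sql, values)
            self.conn.commit()
        except sqlite3.IntegrityError:
            # Duplicate url_id: skip the row instead of failing the crawl
            self.conn.rollback()
        return item

    def close_spider(self, spider=None):
        self.conn.close()


# Hypothetical sample item, for illustration only
sample = {
    "title": "Sample shot",
    "image_url": "https://example.com/a.png",
    "date": date(2019, 5, 16),
    "image_path": "images/a.png",
    "url": "https://example.com/shots/a",
    "url_id": "a" * 32,
}

pipeline = SqlitePipelineSketch()
pipeline.process_item(sample)
pipeline.process_item(sample)  # same url_id, so the second insert is skipped
rows = pipeline.conn.execute("select title, date from item").fetchall()
print(rows)
pipeline.close_spider()
```

Against the real MySQL table the equivalent check from the command line would be a plain select on the item table after the crawl finishes.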