技术分享 | MySQL 和 TiDB 互相快速导入全量数据

作者: 爱可生开源社区 | 来源:发表于2021-07-28 16:40 被阅读0次

技术分享 | MySQL 和 TiDB 互相快速导入全量数据
TiDB Ecosystem Tools 原理解读系列（二）Ti
Sqoop基础（二）导入，导出，创建job
MySQL数据库导入导出大数据量注意事项
Solr全量与增量导入
MySQL数据快速导入
学习规划
一次关于Kettle学习以及实践
了解TiDB基础入门
MySQL 全量增量备份方案 -3- 全备脚本与每日定时备份——

作者：杨涛涛

资深数据库专家，专研 MySQL 十余年。擅长 MySQL、PostgreSQL、MongoDB 等开源数据库相关的备份恢复、SQL 调优、监控运维、高可用架构设计等。目前任职于爱可生，为各大运营商及银行金融企业提供 MySQL 相关技术支持、MySQL 相关课程培训等工作。

本文来源：原创投稿

*爱可生开源社区出品，原创内容未经授权不得随意使用，转载请联系小编并注明来源。

MySQL 和 TIDB 有80%的语法兼容，大部分场景下可以混用，MySQL 可以做 TIDB 的上游，TIDB 也可以做 MySQL 的上游，今天来分享下两种数据库之间数据如何逻辑全量导入导出。

一般来讲，逻辑导入导出格式有两种，一种是 CSV 、TSV 等格式，另外一种就是 SQL 语句的格式。

这里我用两张表作为导入导出示例，一张是表 t1 ，另外一张是表 t1_csv ，记录都有 200W ; TIDB 版本为3.1.2，MySQL 版本为 5.7.31。

第一部分，TIDB 为上游导出数据，MySQL 作为下游导入数据。

TIDB 数据库本身不支持表记录直接导出为 CSV 文件，不过 TIDB 有额外的工具来导出(3.0 版本 mydumper 来导出 SQL 文件、4.0 版本有 dumpling 来导出 SQL 和 CSV )。

分别用 dumper 导出表 t1 ，结果为 SQL 文件；dumpling 导出表 t1_csv 结果为 CSV 文件；这两个导出程序和 mysqldump 一样，需要连接在线数据库。

dumper 导出 sql 文件的命令行：(每个文件大概256M)

[root@ytt-pc data_sql]# mydumper  -u root -h 127.0.0.1 -P 4001 -B ytt -T t1  -F 256 -o /tmp/data_sql/

导出来的文件列表：（类似我之前分享的 MySQL SHELL UTIL 组件导出的文件列表）

[root@ytt-pc data_sql]# ls -sihl
总用量 1.1G
201327040 4.0K -rw-r--r-- 1 root root  146 5月  12 18:21 metadata
201327041 4.0K -rw-r--r-- 1 root root   65 5月  12 18:21 ytt-schema-create.sql
201327077 246M -rw-r--r-- 1 root root 246M 5月  12 18:21 ytt.t1.000000002.sql
201327078 246M -rw-r--r-- 1 root root 246M 5月  12 18:21 ytt.t1.000000003.sql
201327079 245M -rw-r--r-- 1 root root 245M 5月  12 18:21 ytt.t1.000000004.sql
201327080 122M -rw-r--r-- 1 root root 122M 5月  12 18:21 ytt.t1.000000005.sql
201327075 245M -rw-r--r-- 1 root root 245M 5月  12 18:21 ytt.t1.00001.sql
201327076 4.0K -rw-r--r-- 1 root root  327 5月  12 18:21 ytt.t1-schema.sql

dumpling 导出 csv 文件的命令行：（同样，每个 CSV 文件也是256M）

[root@ytt-pc data_csv]# dumpling  -B ytt -T ytt.t1_csv -uroot  -P4001 -h 127.0.0.1 --filetype csv --filesize 256M -o /tmp/data_csv/
Release version: v4.0.8
Git commit hash: b84f64ff362cedcb795aa23fa1188ba7b7c9a7d7
Git branch:      heads/refs/tags/v4.0.8
Build timestamp: 2020-10-30 08:14:27Z
Go version:      go version go1.13 linux/amd64

[2021/05/12 18:22:05.686 +08:00] [INFO] [config.go:180] ["detect server type"] [type=TiDB]
[2021/05/12 18:22:05.686 +08:00] [INFO] [config.go:198] ["detect server version"] [version=3.1.2]
...

导出来的文件列表：

[root@ytt-pc data_csv]# ls -sihl
总用量 1.1G
555999 4.0K -rw-r--r-- 1 root root  146 5月  12 18:22 metadata
127975 4.0K -rw-r--r-- 1 root root   94 5月  12 18:22 ytt-schema-create.sql
132203 257M -rw-r--r-- 1 root root 257M 5月  12 18:22 ytt.t1_csv.0.csv
555974 257M -rw-r--r-- 1 root root 257M 5月  12 18:22 ytt.t1_csv.1.csv
555996 257M -rw-r--r-- 1 root root 257M 5月  12 18:22 ytt.t1_csv.2.csv
555997 257M -rw-r--r-- 1 root root 257M 5月  12 18:22 ytt.t1_csv.3.csv
555998  71M -rw-r--r-- 1 root root  71M 5月  12 18:22 ytt.t1_csv.4.csv
127980 4.0K -rw-r--r-- 1 root root  324 5月  12 18:22 ytt.t1_csv-schema.sql

导出来后，我写了个简单的脚本来导入这两张表到 MySQL

#!/bin/sh
usage()
{
        echo ""
        echo "Usage:./source_tidb_to_mysql csv or sql"
        echo ""
}
file_format=$1
file_path_csv=/tmp/data_csv/
file_path_sql=/tmp/data_sql/
if [ "$file_format" = "csv" ];then
    for i in `ls "$file_path_csv"ytt*.csv`
    do 
    {
        load_options="load data infile '$i' into table t1_csv fields terminated by ',' enclosed by '\"' ignore 1 lines"
        mysql -udumper -S /tmp/mysql_sandbox5731.sock -D ytt -e "$load_options"
    }
    done

elif [ "$file_format" = "sql" ];then
    for i in `ls "$file_path_sql"ytt.t1.*.sql`
    do
    {
        mysql -udumper -S /tmp/mysql_sandbox5731.sock -D ytt<$i
    }
    done
else
  usage;
fi

分别调用脚本导入表 t1 和 t1_csv 到 MySQL

导入表t1

[root@ytt-pc scripts]# ./source_tidb_to_mysql sql

导入表t1_csv

[root@ytt-pc scripts]# ./source_tidb_to_mysql csv

简单校验下表记录数是否为200W：

mysql [localhost:mysql_sandbox5731.sock] {root} (ytt) > select (select count(*) from t1) 't1_count', (select count(*) from t1_csv) 't1_csv_count';
+----------+--------------+
| t1_count | t1_csv_count |
+----------+--------------+
|  2000000 |      2000000 |
+----------+--------------+
1 row in set (1.86 sec)

第二部分，MySQL 为上游导出数据，TIDB 作为下游导入数据。

为了避免 MySQL 自由工具导出后可能需要做额外的文本修正，这里直接用 TIDB 提供的导出工具 dumpling 来导出 MySQL 表 t1 和 t1_csv ，dumpling 自动检测到数据源是 MySQL 。

[root@ytt-pc data_csv]# dumpling  -B ytt -T ytt.t1_csv -udumper -P5731  -h 127.0.0.1 --filetype csv --filesize 256M -o /tmp/data_csv/
Release version: v4.0.8
Git commit hash: b84f64ff362cedcb795aa23fa1188ba7b7c9a7d7
Git branch:      heads/refs/tags/v4.0.8
Build timestamp: 2020-10-30 08:14:27Z
Go version:      go version go1.13 linux/amd64

[2021/05/12 17:57:24.035 +08:00] [INFO] [config.go:180] ["detect server type"] [type=MySQL]
[2021/05/12 17:57:24.035 +08:00] [INFO] [config.go:198] ["detect server version"] [version=5.7.31]
...

同样，用 dumpling 工具导出格式为 SQL 的文件

[root@ytt-pc data_sql]# dumpling  -B ytt -T ytt.t1 -udumper -P5731  -h 127.0.0.1 --filetype sql --filesize 256M -o /tmp/data_sql/
Release version: v4.0.8
Git commit hash: b84f64ff362cedcb795aa23fa1188ba7b7c9a7d7
Git branch:      heads/refs/tags/v4.0.8
Build timestamp: 2020-10-30 08:14:27Z
Go version:      go version go1.13 linux/amd64

[2021/05/12 18:01:56.984 +08:00] [INFO] [config.go:180] ["detect server type"] [type=MySQL]
[2021/05/12 18:01:56.984 +08:00] [INFO] [config.go:198] ["detect server version"] [version=5.7.31]
...

MySQL 源数据导出来后，用 TIDB 提供的全量数据导入工具 tidb-lightning 进行快速导入，这个工具支持 CSV 数据源或者 mydumper/dumpling 导出的 SQL 数据源。

tidb-lightning 工具得先运行后端程序 tikv-importer ，用来把 tidb-lightning 转换的键值对应用到数据库；

再启动 tidb-lightning 程序来接受数据源，并且转换为键值对交给后台 tikv-importer 进行导入。

启动 tikv-importer 常驻进程，端口默认8287
[root@ytt-pc data_sql]# tikv-importer -A 127.0.0.1:8287
接下来启动 tikv-lightning 任务开始导入：（默认端口8289）

分别导入表 t1 和表 t1_csv

   导入表t1
   [root@ytt-pc data_sql]# tidb-lightning   --importer 127.0.0.1:8287  --status-addr 127.0.0.1:8289 --tidb-host 127.0.0.1 --tidb-port 4001 --tidb-status 10081 --tidb-user root -d /tmp/data_sql 
   
   导入表t1_csv
   [root@ytt-pc data_sql]# tidb-lightning   --importer 127.0.0.1:8287  --status-addr 127.0.0.1:8289 --tidb-host 127.0.0.1 --tidb-port 4001 --tidb-status 10081 --tidb-user root -d /tmp/data_csv

同样进行下简单的校验：

mysql [127.0.0.1:4001] {root} (ytt) > select (select count(*) from t1) t1_count, (select count(*) from t1_csv) t1_csv_count;
+----------+--------------+
| t1_count | t1_csv_count |
+----------+--------------+
|  2000000 |      2000000 |
+----------+--------------+
1 row in set (1.04 sec)

如果表数据量很小，可以考虑直接 MySQL 端执行 select ... into outfile 的方式导出 CSV ，完后 TIDB 端直接导入

比如直接导入1W行的小表 t1_small ，MySQL 端导出 CSV ：

mysql [localhost:mysql_sandbox5731.sock] {root} (ytt) > select * from t1_small into outfile '/tmp/data_csv/t1_small.csv' fields terminated by ',' enclosed by '"';
Query OK, 10000 rows affected (0.03 sec)

TIDB 端直接 SQL 命令导入：

mysql [127.0.0.1:4001] {root} (ytt) > load data local infile '/tmp/data_csv/t1_small.csv' into table t1_small fields terminated by ',' enclosed by '"';
Query OK, 10000 rows affected (1.55 sec)
Records: 10000  Deleted: 0  Skipped: 0  Warnings: 0

技术分享 | MySQL 和 TiDB 互相快速导入全量数据
作者：杨涛涛资深数据库专家，专研 MySQL 十余年。擅长 MySQL、PostgreSQL、MongoDB 等开...
TiDB Ecosystem Tools 原理解读系列（二）Ti
作者： Kenny Chan 简介 TiDB-Lightning Toolset 是一套快速全量导入 SQL du...
Sqoop基础（二）导入，导出，创建job
全量将mysql数据导入到HDFS中。其中--target-dir 可以用来指定导出数据存放至HDFS的目录。 ...
MySQL数据库导入导出大数据量注意事项
MySQL数据库导入导出大数据量注意事项 MySQL导出导入小数据量时可以使用Navicate等客户端工具，但是当...
Solr全量与增量导入
Solr Full Import全量导入所谓全量索引一般指的是每次从数据库中读取需要导入的全部数据，然后提交到S...
MySQL数据快速导入
赋予权限 grant：创建用户及赋权操作select,insert,create,drop：赋予查询、写入、建表及...
学习规划
大数据 Flink 数据库 - MySQL， Redis, TIDB 语言 - Java Python Go C ...
一次关于Kettle学习以及实践
背景最近接了一个需求，大致内容是对Oracle数据库的两张表读取合并，然后导入到Mysql数据库指定表中(全量更...
了解TiDB基础入门
由于目前的项目把mysql换成了TiDb，所以特意来了解下tidb。其实也不能说换，由于tidb和mysql几乎完...
MySQL 全量增量备份方案 -3- 全备脚本与每日定时备份——
MySQL 全备与增备方案请参考 MySQL 全量增量备份方案 -1-全量备份实现 MySQL 全量增量备份方案 ...