Importing a large file into a MySQL table with load data infile is usually very time-consuming, because the load runs in a single thread and a single transaction, so on current hardware the statement finishes nowhere near the machine's limits. MySQL 8.0.17 introduced a multi-threaded parallel import feature. It is implemented in the MySQL Shell tool, so MySQL Shell must be installed first. Below is a comparison of how large the performance gap is between the traditional single-threaded load data infile and the multi-threaded import.
Test environment:
- 4 core CPU
- 8 GB RAM
- SSD Disk
- CentOS 7
The file being imported, load_data.csv, is roughly 8 GB.
Single-threaded test:
mysql > load data infile '/usr/local/mysql/mysql-files/load_data.csv'
into table single.single_load fields terminated by ',' optionally
enclosed by '"' lines terminated by '\n'
(user_id,visible,name,size,latitude,longitude,timestamp,public,description,inserted);
Query OK, 14244516 rows affected, 0 warnings (39 min 35.5036 sec)
Records: 14244516 Deleted: 0 Skipped: 0 Warnings: 0
Total time: 39 min 35.5036 sec.
Multi-threaded test:
mysql shell > util.importTable("/usr/local/mysql/mysql-files/load_data.csv",
{schema: "parallel", table: "parallel_load", columns:
["user_id","visible","name","size","latitude","longitude","timestamp","public","description"
,"inserted"], dialect: "csv-unix", skipRows: 0, showProgress: true,
fieldsOptionallyEnclosed: true, fieldsTerminatedBy: ",", linesTerminatedBy:
"\n",fieldsEnclosedBy: '"',threads: 8, bytesPerChunk: "1G", maxRate: "2G"})
Importing from file '/usr/local/mysql/mysql-files/load_data.csv' to table
`parallel`.`parallel_load` in MySQL Server at /tmp%2Fmysql.sock using 8 threads
[Worker01] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker02] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker03] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker04] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker05] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker06] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker07] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker08] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
100% (8.06 GB / 8.06 GB), 535.17 KB/s
File '/usr/local/mysql/mysql-files/load_data.csv' (8.06 GB) was imported in 6 min
30.0411 sec at 18.81 MB/s
Total rows affected in parallel.parallel_load: Records: 14244516 Deleted: 0 Skipped: 0
Warnings: 0
Total time: 6 min 30.0411 sec.
Result comparison:
From the test results, the single-threaded import took 39 min 35.5036 sec while the multi-threaded import took 6 min 30.0411 sec, so the multi-threaded import is roughly 6x as fast as the single-threaded one.
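The "roughly 6x" figure follows directly from the two raw timings; a quick check:

```python
# Convert both wall-clock timings to seconds and compare them.
single_threaded = 39 * 60 + 35.5036   # 2375.5036 s
multi_threaded = 6 * 60 + 30.0411     # 390.0411 s

speedup = single_threaded / multi_threaded
print(f"{speedup:.2f}x")              # prints "6.09x"
```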
The key parameters of the multi-threaded import:
- threads: the number of threads to use
- bytesPerChunk: the size of each chunk
- maxRate: the maximum amount of data each thread may process per second
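bytesPerChunk and maxRate accept human-readable size strings such as "1G" or "2G", as in the test above. A minimal sketch of how such strings map to byte counts (a hypothetical helper for illustration, not MySQL Shell's actual parser; it assumes 1024-based k/M/G suffixes):

```python
def parse_size(s: str) -> int:
    """Parse a size string like '1G', '200M', or '512k' into a byte count."""
    units = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if s and s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)  # no suffix: treat as a plain byte count

print(parse_size("1G"))  # prints 1073741824
print(parse_size("2G"))  # prints 2147483648
```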
How the multi-threaded import works internally:
- Analyze the data file to be imported
- Split the large data file into chunks
- Import the chunks into MySQL using multiple threads
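The steps above can be sketched in Python (a simplified illustration of the idea, not MySQL Shell's actual implementation): split the file into roughly bytesPerChunk-sized ranges aligned to line boundaries, then hand each range to a worker thread.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def chunk_offsets(path, bytes_per_chunk):
    """Yield (start, end) byte ranges that end on newline boundaries,
    so no CSV row is ever split across two chunks."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        start = 0
        while start < size:
            end = min(start + bytes_per_chunk, size)
            if end < size:
                f.seek(end)
                f.readline()          # advance to the end of the current row
                end = f.tell()
            yield (start, end)
            start = end

def import_chunk(path, start, end):
    """Placeholder worker: a real importer would stream these bytes to the
    server (e.g. via a LOAD DATA statement per chunk)."""
    return end - start                # report bytes "imported"

def parallel_import(path, bytes_per_chunk, threads):
    """Fan the chunks out over a thread pool; return total bytes processed."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(import_chunk, path, s, e)
                   for s, e in chunk_offsets(path, bytes_per_chunk)]
        return sum(f.result() for f in futures)
```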
Note:
Translated from: https://mydbops.wordpress.com/2019/10/17/mysql-load-data-infile-made-faster/