1 Requirements
- Large volume of old data: about 2.68 million rows
- The old data contains a lot of duplicates, which need to be removed
- The old and new table structures differ; the values the new table needs have to be derived by inspecting the contents of fields in the old table
2 Initial approach
- Idea: query the total row count to drive the outer loop and to serve as the LIMIT parameter; fetch 10,000 rows per page, iterate over them, and use indexOf on the field contents to work out the values the new table needs. After a row is processed, check whether an identical record already exists in the new table and skip the insert if it does. Because of that existence check, rows are inserted one at a time (a sketch of that per-row check follows the code below).
- Result (efficiency): fast at first, but it later slowed to roughly 50,000 rows per hour, which is extremely slow; after nearly 2 days the job still had not finished.
- Code
@RequestMapping("/dataCopyOrderLog")
public void dataCopyOrderLog() throws Exception {
    logger.info("<------------------订单日志数据拷贝开始------------------>");
    // Read the total row count from the old data source
    DataSourceContextHolder.setDataSourceType(DataSourceType.dataSource_old);
    Map<String, Object> counts = systemService.findOneForJdbc("select count(1) as count from t_logistics_info");
    int count = Integer.parseInt(counts.get("count").toString());
    int i = 0;
    while (i < count) {
        // Page through the old table, 10,000 rows per query
        DataSourceContextHolder.setDataSourceType(DataSourceType.dataSource_old);
        List<Map<String, Object>> packs = systemService.findForJdbc("select * from t_logistics_info order by CREATETIME desc limit " + i + ", 10000");
        for (int j = 0; j < packs.size(); j++) {
            Map<String, Object> map = packs.get(j);
            String orderId = null, type = null;
            if (StringUtil.isNotEmpty(map.get("ORDERID"))) {
                orderId = map.get("ORDERID").toString();
            }
            if (StringUtil.isNotEmpty(map.get("STATUS"))) {
                String status = map.get("STATUS").toString();
                // ... content omitted: map the STATUS text to the new table's log type ...
            }
            DataSourceContextHolder.setDataSourceType(DataSourceType.dataSource_new);
            // ... content omitted: check whether the record already exists in the new table ...
            orderLogService.save(orderlog);
            logger.info(new Date() + "orderLog----->count------>" + i);
            i++;
        }
    }
    logger.info("<----------------拷贝订单日志数据完成------------->");
}
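The duplicate check elided above (the second "content omitted" block) queries the new table once per source row before every insert. Its exact SQL is not shown in the post; assuming it uses the same two fields that later define pk (the order ID and the derived log type), it would look roughly like the sketch below, and running it millions of times against a steadily growing table is presumably a large part of why this version slows down over time.
-- hedged sketch only; the real existence check is omitted in the original code
select count(1) from t_jdy_order_log1 where order_id = ? and LOG_TYPE = ?
-- the row is inserted only when this count is 0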
3 First optimization
- Use multi-row batch INSERT with ON DUPLICATE KEY UPDATE. This requires a column with a UNIQUE constraint, but in this project duplicates are determined by two fields together, so the values of those two fields are concatenated into a single pk column (a sample of the generated statement appears right after this list).

- Add a UNIQUE constraint to the pk column to guarantee uniqueness
alter table XXX add UNIQUE(pk)
- The number of rows per batch INSERT is limited by max_allowed_packet
-- check the current size
show VARIABLES like '%max_allowed_packet%';
-- temporary change (the value is in bytes)
set global max_allowed_packet = <size>
-- permanent change: edit my.ini and add max_allowed_packet=<size> under the [mysqld] section
- Result: about 2.68 million rows migrated in roughly 1 hour
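For reference, a hedged sketch of one generated batch statement (the order IDs, log types, and timestamps here are made-up illustrative values, not data from the project): each values row carries the concatenated orderId + type as pk, and the ON DUPLICATE KEY UPDATE clause makes a duplicate pk update the existing row instead of failing.
insert into t_jdy_order_log1(id,order_id,LOG_TYPE,LOG_CONTENT,LOG_TIME,pk) values
 (REPLACE(UUID(),'-',''),'1001','1','包裹已入库',STR_TO_DATE('2019-12-01 10:00:00','%Y-%m-%d %H:%i:%s'),'10011'),
 (REPLACE(UUID(),'-',''),'1002','2','包裹已全部清关',STR_TO_DATE('2019-12-02 11:30:00','%Y-%m-%d %H:%i:%s'),'10022')
 on duplicate key update id=values(id),order_id=values(order_id),log_type=values(log_type),log_time=values(log_time),pk=values(pk)
The rendered text of all the values rows in one statement has to stay under max_allowed_packet, which is why the batch size depends on that variable.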
while (i < count) {
    DataSourceContextHolder.setDataSourceType(DataSourceType.dataSource_old);
    List<Map<String, Object>> packs = systemService.findForJdbc("select * from t_logistics_info order by TIME desc limit " + i + " , 10000");
    List<OrderLogEntity> logList = new ArrayList<>();
    // Build one multi-row INSERT per 10,000-row page
    String sql = "insert into t_jdy_order_log1(id,order_id,LOG_TYPE,LOG_CONTENT,LOG_TIME,pk) values ";
    for (int j = 0; j < packs.size(); j++) {
        // ... content omitted: extract orderId, type, status and date from the old row ...
        if (j == packs.size() - 1) {
            // last row of the page: no trailing comma
            sql += " (REPLACE(UUID(),'-',''),'" + orderId + "','" + type + "','" + status + "',STR_TO_DATE('" + date + "','%Y-%m-%d %H:%i:%s'),'" + (orderId + type) + "')";
        } else {
            sql += " ( REPLACE(UUID(),'-',''),'" + orderId + "','" + type + "','" + status + "',STR_TO_DATE('" + date + "','%Y-%m-%d %H:%i:%s'),'" + (orderId + type) + "'),";
        }
        i++;
    }
    DataSourceContextHolder.setDataSourceType(DataSourceType.dataSource_new);
    // duplicate pk values update the existing row instead of raising an error
    sql += " on duplicate key update id=values(id),order_id=values(order_id),log_type=values(log_type),log_time=values(log_time),pk=values(pk) ";
    jdbcDao.executeSql(sql);
    logger.info("i====" + i);
}
4 Second optimization
- Deduplicate while reading the old data: each `STATUS` like '...' expression evaluates to 0 or 1, so grouping by those four flags together with ORDERID keeps one row per order per status category
SELECT * FROM `t_logistics_info` group by `STATUS` like '%已揽收%',`STATUS` like '%发往总仓%',`STATUS` like '%包裹已入库%', `STATUS` like '%包裹已全部清关%',ORDERID
- After deduplication there are just over 150,000 rows, and the query runs in just under 29 seconds

- Execute batch inserts, 10,000 rows per statement
- The migration finally completed in 219 seconds
public void dataCopyOrderLog1() throws Exception {
    Date date1 = new Date();
    logger.info("<------------------订单日志数据拷贝开始1------------------>" + date1);
    // Read the old data in a single deduplicated query
    DataSourceContextHolder.setDataSourceType(DataSourceType.dataSource_old);
    List<Map<String, Object>> packs = systemService.findForJdbc("select * from t_logistics_info group by `STATUS` like '%已揽收%',`STATUS` like '%发往总仓%',`STATUS` like '%包裹已入库%', `STATUS` like '%包裹已全部清关%',ORDERID ");
    String sql = "insert into t_jdy_order_log1(id,order_id,LOG_TYPE,LOG_CONTENT,LOG_TIME,pk) values ";
    for (int j = 0; j < packs.size(); j++) {
        Map<String, Object> map = packs.get(j);
        String orderId = null, type = null;
        if (StringUtil.isNotEmpty(map.get("ORDERID"))) {
            orderId = map.get("ORDERID").toString();
        }
        if (StringUtil.isNotEmpty(map.get("STATUS"))) {
            String status = map.get("STATUS").toString();
            // ... content omitted: derive type from the STATUS text ...
            String date = map.get("TIME").toString();
            sql += " ( REPLACE(UUID(),'-',''),'" + orderId + "','" + type + "','" + status + "',STR_TO_DATE('" + date + "','%Y-%m-%d %H:%i:%s'),'" + (orderId + type) + "'),";
        }
        // Flush a batch of up to 10,000 rows per INSERT statement
        if (j % 10000 == 0 || j == packs.size() - 1) {
            StringBuilder s = new StringBuilder(sql);
            s.delete(sql.length() - 1, sql.length()); // drop the trailing comma
            DataSourceContextHolder.setDataSourceType(DataSourceType.dataSource_new);
            jdbcDao.executeSql(s.toString());
            sql = "insert into t_jdy_order_log1(id,order_id,LOG_TYPE,LOG_CONTENT,LOG_TIME,pk) values ";
            logger.info("j====" + j);
        }
    }
    Date date2 = new Date();
    Long i1 = (date2.getTime() - date1.getTime()) / 1000;
    logger.info("<----------------拷贝订单日志数据完成------------->" + i1);
}