如下需求
A 表202006_ims_innercg
: 有contacts_no, goods_sn, size, outin_num
B 表202006_india_qingguan_data
: 有contacts_no, goods_sn, size, y_num
两张表的contacts_no, goods_sn,size
三个字段并不能唯一确定一条数据,现在想根据这 3个维度汇总数量,出个差异表union_middle
差异表union_middle
有字段contacts_no, goods_sn, size, numA ,numB, diff(numA-numB)



解决步骤:
1.创建差异表union_middle
2.将A表、B表contacts_no, goods_sn,size
字段的数据全部插入到union_middle
中,根据此步的sql,执行之后union_middle
表中的数据是不重复的
3.对union_middle
的numA
字段进行更新
此步陷阱:因为union_middle
表中的数据已经是去重后的,所以更新次字段的时候,数据来源要正确
4.对union_middle
的numB
字段进行更新,同步骤3
5.最后计算union_middle
表的diff
字段,并将numA
和numB
字段为null的值置为0
6.数据验证,分别统计出A表outin_num
字段的和、B表y_num
字段的和、差异表numA
字段和numB
字段的和,看结果是否相等。相等则正确,不相等则不正确。
-
创建差异表union_middle
CREATE TABLE `union_middle` (
`contacts_no` varchar(255) DEFAULT NULL,
`goods_sn` varchar(255) DEFAULT NULL,
`size` varchar(255) DEFAULT NULL,
`numA` int(255) DEFAULT NULL,
`numB` int(255) DEFAULT NULL,
`diff` int(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
-
将两张表的contacts_no, goods_sn,size数据插入到union_middle中
INSERT INTO union_middle (contacts_no, goods_sn, size) SELECT
a.contacts_no,
a.goods_sn,
a.size
FROM
202006_ims_innercg a
UNION
SELECT
b.contacts_no,
b.goods_sn,
b.size
FROM
202006_india_qingguan_data b;
-
更新union_middle表的numA字段
-- 添加202006_ims_innercg表索引
DROP INDEX idx_contacts_goods_size ON 202006_ims_innercg;
ALTER TABLE 202006_ims_innercg ADD INDEX idx_contacts_goods_size (contacts_no, goods_sn, size);
-- 添加union_middle表索引
DROP INDEX idx_contacts_goods_size ON union_middle;
ALTER TABLE union_middle ADD INDEX idx_contacts_goods_size (contacts_no, goods_sn, size);
UPDATE union_middle a,
(
SELECT
contacts_no,
goods_sn,
size,
COUNT(0) AS count,
SUM(outin_num) AS numA
FROM
202006_ims_innercg
GROUP BY
contacts_no,
goods_sn,
size
) c
SET a.numA = c.numA
WHERE
a.contacts_no = c.contacts_no
AND a.goods_sn = c.goods_sn
AND a.size = c.size;
-
更新union_middle表的numB字段
-- 添加202006_india_qingguan_data表索引
DROP INDEX idx_contacts_goods_size ON 202006_india_qingguan_data;
ALTER TABLE 202006_india_qingguan_data ADD INDEX idx_contacts_goods_size (contacts_no, goods_sn, size);
UPDATE union_middle a,
(
SELECT
contacts_no,
goods_sn,
size,
COUNT(0) AS count,
SUM(y_num) AS numB
FROM
202006_india_qingguan_data
GROUP BY
contacts_no,
goods_sn,
size
) c
SET a.numB = c.numB
WHERE
a.contacts_no = c.contacts_no
AND a.goods_sn = c.goods_sn
AND a.size = c.size;
-
最后计算union_middle表的diff字段,并将numA和numB字段为null的值置为0
UPDATE union_middle c
SET c.diff = IFNULL(c.numA, 0) - IFNULL(c.numB, 0),
c.numA = IFNULL(c.numA, 0),
c.numB = IFNULL(c.numB, 0)
-
数据验证
-- A表中outin_num字段的和
SELECT sum(outin_num) from 202006_ims_innercg;
-- B表中y_num字段的和
SELECT sum(y_num) from 202006_india_qingguan_data;
-- 差异表中numA和numB的和
SELECT sum(numA),SUM(numB)from union_middle;
网友评论