解读执行计划
在exists类型的子查询的执行计划中,select_type一栏分别是PRIMARY和DEPENDENT SUBQUERY
DEPENDENT SUBQUERY的意思是:子查询中的首个查询,但依赖于外层的查询
上面这句话解释得通俗一点就是外连接先执行查询,然后把查询的结果集放入子查询内进行匹配;外查询每执行一次查询,就要来子查询匹配一次
join的执行计划中,select_type一栏都是simple
join的第一行的就是外表
从上面的对比可以看出,无论是 exists类型的子查询 还是join,都基本可以看作遵循了第一行就是驱动表的规则(注意不是所有子查询都遵循这个规则,本篇只针对exists类型的dependent subquery)
分析
图一是exists子查询的执行计划,这个查询耗时117s;
图二是join的执行计划,这个查询耗时0.054s;
图一PRIMARY对应的表就是图二中的a表,DEPENDENT SUBQUERY表就是图二中的b表;a表有1500W行数据,b表有2W行数据
所以图一的SQL执行效率如此低下的原因就是大表驱动小表
优化
原子查询语句如下
SELECT * FROM a
WHERE `type` = 'appointment'
AND `event` = 14
AND EXISTS (
SELECT * FROM b
WHERE a.`sheet_id` = b.`id`
AND `company_id` = 8
AND b.`deleted_at` IS NULL
)
ORDER BY a.id DESC
LIMIT 6;
改写如下
SELECT a.*
FROM a join b on a.`sheet_id` = b.`id`
WHERE a.`type` = 'appointment'
AND a.`event` = 14
AND b.`company_id` = 8
AND b.`deleted_at` IS NULL
ORDER BY a.`id` DESC
LIMIT 6;
由于a表作为内表,因此在a.`sheet_id`,a.`type`,a.`event`上创建联合索引;语句中出现了b表的本地谓词,所以b表的b.`company_id`,b.`deleted_at`上也要创建联合索引
优化案例
今天优化的这批语句中,大多数是exists子查询的问题,可以看出这个研发小哥非常的喜欢用exists这种写法;前面的那个exists语句是泛用型,后面的exists语句加了些新花样
eg.
SELECT SUM(`xxxx`) AS ag
FROM a
WHERE EXISTS (
SELECT * FROM b
WHERE a.`delivery_sheet_id` = b.`id`
AND (`status` = 4
OR `is_rejected` = '1')
AND `company_id` = 8
AND b.`deleted_at` IS NULL
)
AND `status` IN (0, 4)
AND `collection_type` IN (2, 3)
AND a.`deleted_at` IS NULL;
or的优化通常改写union,但这里是求sum不能这么改,需要改写成2个语句然后求和;对应的列要建好索引
select c.ag+d.ag as ag from
(SELECT SUM(a.`xxxx`) AS ag
FROM a join b
on a.`delivery_sheet_id` = b.`id`
where
b.`status` = 4
AND b.`company_id` = 8
AND b.`deleted_at` IS NULL
AND a.`status` IN (0, 4)
AND a.`collection_type` IN (2, 3)
AND a.`deleted_at` IS NULL) c,
(
SELECT SUM(a.`xxxx`) AS ag
FROM a join b
on a.`delivery_sheet_id` = b.`id`
where
b.`is_rejected` = '1'
AND b.`company_id` = 8
AND b.`deleted_at` IS NULL
AND a.`status` IN (0, 4)
AND a.`collection_type` IN (2, 3)
AND a.`deleted_at` IS NULL) d;
执行时间:18s→0.2s
eg.
SELECT COUNT(a.`id`) AS aggregate
FROM a
WHERE EXISTS (
SELECT *
FROM c
INNER JOIN d ON c.`id` = d.`collection_sheet_id`
WHERE d.`clearing_sheet_item_id` = a.`id`
AND c.`id` IN (2068496, 2068497, 2068694, 2068695)
AND c.`deleted_at` IS NULL
添加相关索引,注意count的时候取的是distinct值
SELECT count(distinct a.`id`) AS aggregate
FROM a
join
(
SELECT d.clearing_sheet_item_id
FROM c
INNER JOIN d ON c.`id` = d.`collection_sheet_id`
WHERE
c.`id` IN (2068496, 2068497, 2068694, 2068695)
AND c.`deleted_at` IS NULL
) b on a.id=b.clearing_sheet_item_id;
网友评论