- where条件优化
select a.dt,a.key1,a.col1,b.col1,c.col1
from a
left join b
on a.dt=b.dt
and a.key1=b.key1
left join c
on a.dt=c.dt
and a.key1=c.key1
where a.dt='2023-01-01';
此时的执行过程是

此时可以在每次链接表之前先筛选出所需分区数据。
select a.dt,a.key1,a.col1,b.col1,c.col1
from (select a.coll,a.dt,a.key1 from a where a.dt='2023-01-01') a
left join (select b.coll,b.dt,b.key1 from b where b.dt='2023-01-01') b
on a.dt=b.dt
and a.key1=b.key1
left join (select c.coll,c.dt,c.key1 from c where c.dt='2023-01-01') c
on a.dt=c.dt
and a.key1=c.key1;
执行过程如下

- COUNT DISTINCT 优化
sql语句计算经常会用到 COUNT(DISTINCT),但在数据比较倾斜的时候 COUNT(DISTINCT) 会比较慢这时可以尝试用 GROUP BY 改写代码计算 。
select pt,count(distinct device_id) as dau
from olap.olap_log_bigc_dau_sbase_di
where pt='${-1d_yyyyMMdd}'
group by pt
优化后
select pt,count(*) as dau
from (select pt,device_id from olap.olap_log_bigc_dau_sbase_di where pt = '${-1d_yyyyMMdd}'
group by pt,device_id) t
group by pt
distinct 会将所有数据放到同一个内存中执行,而group by是先排序,然后再计数,会有一定程度的优化。
网友评论