ClickHouse 的 SQL 优化规则是基于 RBO(Rule Based Optimization),下面是一些优化规则
上传测试用例
将官方提供的测试集 visits_v1.tar 和 hits_v1.tar下载并上传到虚拟机,解压到 clickhouse 数据路径下。数据下载地址 ,当然官方也可以在线的方式,不需要下载就测试sql语句查询。在线地址
# 解压到 clickhouse 数据路径
sudo tar -xvf hits_v1.tar -C /var/lib/clickhouse
sudo tar -xvf visits_v1.tar -C /var/lib/clickhouse
# 修改所属用户
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/data/datasets
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/metadata/datasets
重启 clickhouse-server,执行查询
clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
注意:官方的 tar 包,包含了建库、建表语句、数据内容,这种方式不需要手动建库、建表,最方便。hits_v1 表有 130 多个字段,880 多万条数据,visits_v1 表有 180 多个字段,160 多万条数据。
COUNT 优化
在调用 count 函数时,如果使用的是 count() 或者 count(*),且没有 where 条件,则会直接使用 system.tables 的 total_rows。以下语句可以使用 explain plan select.. 来看变化
# 默认会查询 count 文件直接拿到数据
select count() from hits_v1
# 会转变成 count()
select count(*) from hits_v1
# 会转变成 count()
select count(1) from hits_v1
# 只要不是写具体的字段不会触发冲洗计算
select count(UserID) from hits_v1
消除子查询重复字段或无用字段
# 实际查询语句
EXPLAIN SYNTAX SELECT
a.UserID,
b.VisitID,
a.URL,
b.UserID
FROM
hits_v1 AS a
LEFT JOIN (
SELECT
UserID,
UserID as HaHa,
VisitID
FROM visits_v1) AS b
USING (UserID)
limit 3;
# 返回优化语句
SELECT
UserID,
VisitID,
URL,
b.UserID
FROM hits_v1 AS a
ALL LEFT JOIN
(
SELECT
UserID,
VisitID
FROM visits_v1
) AS b USING (UserID)
LIMIT 3
谓词下推
当 group by 有 having 子句,但是没有 with cube、with rollup 或者 with totals 修饰的时候,having 过滤会下推到 where 提前过滤。例如下面的查询,having name 变成了 where name,在 group by 之前过滤。
# 查询语句
EXPLAIN SYNTAX SELECT UserID FROM hits_v1 GROUP BY UserID HAVING UserID ='8585742290196126178';
# 返回优化语句
SELECT UserID FROM hits_v1 WHERE UserID = '8585742290196126178' GROUP BY UserID
# 查询语句
EXPLAIN SYNTAX SELECT * FROM ( SELECT UserID FROM visits_v1 ) WHERE UserID = '8585742290196126178'
# 返回优化后的语句
SELECT UserID FROM ( SELECT UserID FROM visits_v1 WHERE UserID = '8585742290196126178' ) WHERE UserID = '8585742290196126178'
# 查询语句
EXPLAIN SYNTAX SELECT * FROM (
SELECT * FROM ( SELECT UserID FROM visits_v1)
UNION ALL
SELECT * FROM ( SELECT UserID FROM visits_v1)
) WHERE UserID = '8585742290196126178'
# 优化后的
SELECT UserID FROM (
SELECT UserID FROM (
SELECT UserID FROM visits_v1 WHERE UserID = '8585742290196126178'
) WHERE UserID = '8585742290196126178'
UNION ALL
SELECT UserID FROM (
SELECT UserID FROM visits_v1 WHERE UserID = '8585742290196126178'
) WHERE UserID = '8585742290196126178'
) WHERE UserID = '8585742290196126178'
聚合计算外推
聚合函数内的计算,会外推
# 查询语句
EXPLAIN SYNTAX SELECT sum(UserID * 2) FROM visits_v1
# 优化后
SELECT sum(UserID) * 2 FROM visits_v1
聚合函数消除
如果对聚合键,也就是 group by key 使用 min、max、any 聚合函数,则将函数消除
# 查询语句
EXPLAIN SYNTAX SELECT sum(UserID * 2), max(VisitID), max(UserID) FROM visits_v1 GROUP BY UserID
# 优化后
SELECT sum(UserID) * 2, max(VisitID), UserID FROM visits_v1 GROUP BY UserID
删除重复的 order by key
# 查询语句
EXPLAIN SYNTAX SELECT * FROM visits_v1 ORDER BY UserID ASC, UserID ASC, VisitID ASC,VisitID ASC
# 优化后
select …… FROM visits_v1 ORDER BY UserID ASC,VisitID ASC
删除重复的 limit by key
# 查询语句
EXPLAIN SYNTAX SELECT * FROM visits_v1 LIMIT 3 BY VisitID, VisitID LIMIT 10
# 返回优化后的语句:
select …… FROM visits_v1 LIMIT 3 BY VisitID LIMIT 10
删除重复的 USING Key
# 查询语句
EXPLAIN SYNTAX SELECT a.UserID, a.UserID, b.VisitID, a.URL, b.UserID FROM hits_v1 AS a
LEFT JOIN visits_v1 AS b USING (UserID, UserID)
# 返回优化后的语句:
SELECT UserID, UserID, VisitID, URL, b.UserID FROM hits_v1 AS a ALL LEFT JOIN visits_v1 AS b USING (UserID)
标量替换
如果子查询只返回一行数据,在被引用的时候用标量替换,例如下面语句中的 total_disk_usage 字段
# 查询语句
EXPLAIN SYNTAX WITH ( SELECT sum(bytes) FROM system.parts WHERE active ) AS total_disk_usage SELECT
(sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
table FROM system.parts GROUP BY table ORDER BY table_disk_usage DESC LIMIT 10;
# 返回优化后的语句
WITH CAST(0, 'UInt64') AS total_disk_usage SELECT (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
table FROM system.parts GROUP BY table ORDER BY table_disk_usage DESC LIMIT 10
三元运算优化
如果开启了 optimize_if_chain_to_multiif 参数,三元运算符会被替换成 multiIf 函数
# 查询语句
EXPLAIN SYNTAX SELECT number = 1 ? 'hello' : (number = 2 ? 'world' : 'atguigu') FROM numbers(10)
settings optimize_if_chain_to_multiif = 1;
# 返回优化后的语句:
SELECT multiIf(number = 1, 'hello', number = 2, 'world', 'atguigu') FROM numbers(10)
网友评论