记一次使用clickhouse遇到的问题
联查两张表,graphite_d 为分布式表, user 为普通表
select count(*) from default.graphite_d where member in( select member from user where date='2019-04-26')
报错:
message: Received from XXX.XXX.XXX.XXX:9000. DB::Exception: Table default.user doesn't exist.. 并且会有调用栈打印出来
查询文档发现,当 graphite_d 为分布式表时,将会把如下查询发送到所有远程服务器
select count(*) from default.graphite where member in ( select member from user where date='2019-04-26')
但是 因为user 表是普通表,只存在于一台机器上,所以其他的远程服务器在本地是无法查询到此表的
因此 user 需要成为分布式表
在每台服务器上创建 user的分布式表 user_d
CREATE TABLEuser_d (member Int64)
ENGINE = Distributed(clusterName,defaultDB, user, rand())
运行sql 如下:
select count(*) from default.graphite_d where member in( select member from user_d where date='2019-04-26')
报错:
Received exception from server:
Code: 288. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Double-distributed IN/JOIN subqueries is denied (distributed_product_mode = 'deny'). You may rewrite query to use local tables in subqueries, or use GLOBAL keyword, or set distributed_product_mode to suitable value..
根据提示以及查询clickhouse的官方文档,解决方案如下
sql 语句改为:
select count(*) from default.graphite_d where member global in( select member from user_d where date='2019-04-26')
或者select count(*) from default.graphite_d where member in( select member from default.user where date='2019-04-26')
参考文档:https://clickhouse.yandex/docs/zh/query_language/select/#select-distributed-subqueries(中文)
https://clickhouse.yandex/docs/en/query_language/select/#select-distributed-subqueries(英文)
网友评论