报错信息:AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT user_id)
deviating function: count(DISTINCT CASE WHEN status = 1 THEN user_id
ELSE NULL END) Consider using NDV() instead of COUNT(DISTINCT)
if estimated counts are acceptable. Enable the APPX_COUNT_DISTINCT
query option to perform this rewrite automatically.
报错代码如下
select
grad_id,
count(distinct user_id) as total_users,
count(distinct case when status = 1 then user_id else null end) as distinct_total_users
from
student_center_info
group by
grad_id;
在impala中运行代码会报如下错误
运行报错信息如下:
AnalysisException: all DISTINCT aggregate functions need to have the
same set of parameters as count(DISTINCT user_id)
deviating function: count(DISTINCT CASE WHEN status = 1 THEN user_id
ELSE NULL END) Consider using NDV() instead of COUNT(DISTINCT)
if estimated counts are acceptable. Enable the APPX_COUNT_DISTINCT
query option to perform this rewrite automatically.
出现这个个报错的原因是因为count()按某个列统计时,只能使用一次,如上面的代码,使用两次count()对user_id统计时,就会报错,解决办法如下:
select
a.grad_id,
a.total_user,
b.total_users
from
(select
grad_id,
count(distinct user_id) as total_users
from
student_center_info
group by
grad_id) as a
left join
(select
grad_id,
count(distinct case when status = 1 then user_id else null end) as total_users
from
student_center_info
group by
grad_id) as b
on
a.grad_id = b.grad_id
如果用impala的话,可以用ndv()函数来代替,但是ndv()这个去重不是准确的,如果对数据要求精度比较高的,不建议使用ndv()。
在hive中可以正常运行下面代码,并且输出正确结果
select
grad_id,
count(distinct user_id) as total_users,
count(distinct case when status = 1 then user_id else null end) as distinct_total_users
from
student_center_info
group by
grad_id;
网友评论