美文网首页大数据开发
Impala与Hive语法差异之一count()函数使用

Impala与Hive语法差异之一count()函数使用

作者: 晨冉1688 | 来源:发表于2020-04-12 19:14 被阅读0次

报错信息:AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT user_id)
deviating function: count(DISTINCT CASE WHEN status = 1 THEN user_id
ELSE NULL END) Consider using NDV() instead of COUNT(DISTINCT)
if estimated counts are acceptable. Enable the APPX_COUNT_DISTINCT
query option to perform this rewrite automatically.

报错代码如下

select 
    grad_id,
    count(distinct user_id) as total_users,
    count(distinct case when status = 1 then user_id else null end) as distinct_total_users
from 
    student_center_info
group by 
    grad_id;

在impala中运行代码会报如下错误
运行报错信息如下:
AnalysisException: all DISTINCT aggregate functions need to have the
same set of parameters as count(DISTINCT user_id)
deviating function: count(DISTINCT CASE WHEN status = 1 THEN user_id
ELSE NULL END) Consider using NDV() instead of COUNT(DISTINCT)
if estimated counts are acceptable. Enable the APPX_COUNT_DISTINCT
query option to perform this rewrite automatically.

出现这个个报错的原因是因为count()按某个列统计时,只能使用一次,如上面的代码,使用两次count()对user_id统计时,就会报错,解决办法如下:

select 
    a.grad_id,
    a.total_user,
    b.total_users
from
    (select 
        grad_id,
        count(distinct user_id) as total_users
    from 
        student_center_info
    group by 
        grad_id) as a
 left join
     (select 
            grad_id,
            count(distinct case when status = 1 then user_id else null end) as total_users
        from 
            student_center_info
    group by 
        grad_id) as b
 on 
     a.grad_id = b.grad_id 

如果用impala的话,可以用ndv()函数来代替,但是ndv()这个去重不是准确的,如果对数据要求精度比较高的,不建议使用ndv()。

在hive中可以正常运行下面代码,并且输出正确结果

select 
    grad_id,
    count(distinct user_id) as total_users,
    count(distinct case when status = 1 then user_id else null end) as distinct_total_users
from 
    student_center_info
group by 
    grad_id;

相关文章

网友评论

    本文标题:Impala与Hive语法差异之一count()函数使用

    本文链接:https://www.haomeiwen.com/subject/whvbmhtx.html