Hive分析函数的使用
测试数据集:
SQL语句## COUNT、SUM、MIN、MAX、AVG
## COUNT、SUM、MIN、MAX、AVG
select user_id
,user_type
,sales
,sum(sales) OVER(partition by user_type order by sales asc) as sales_1 --默认为从起点到当前行
,sum(sales) OVER(partition by user_type order by sales asc ROWS between UNBOUNDED PRECEDING and current row) as sales_2 --从起点到当前行,结果与sales_1不同。
,sum(sales) OVER(partition by user_type order by sales asc ROWS between 3 PRECEDING and current row) as sales_3 --当前行+往前3行
,sum(sales) OVER(partition by user_type order by sales asc ROWS between 3 PRECEDING and 1 FOLLOWING) as sales_4 --当前行+往前3行+往后1行
,sum(sales) OVER(partition by user_type order by sales asc ROWS between current row and UNBOUNDED FOLLOWING) as sales_5 --当前行+往后所有行
,sum(sales) OVER(partition by user_type) as sales_6 --分组内所有行
from order_detail
order by user_type
,sales
,user_id;
SQL代码运行结果
注意:
结果和ORDER BY相关,默认为升序
如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:无界限(起点或终点)
UNBOUNDED PRECEDING:表示从前面的起点
UNBOUNDED FOLLOWING:表示到后面的终点
其他COUNT、AVG,MIN,MAX,和SUM用法一样。
网友评论