1、聚合函数
聚合函数max/min/count/sum/avg 特点多进一出
hive> select count(1) from ruozedata_emp where deptno = 10; 聚合函数会走MapReduce
OK
3
Time taken: 23.402 seconds, Fetched: 1 row(s)
hive> select max(salary),min(salary),avg(salary),sum(salary) from ruozedata_emp;
OK
10300.0 800.0 2621.6666666666665 39325.0
Time taken: 23.232 seconds, Fetched: 1 row(s)
2、分组函数 group by
注意:select 中出现的字段,如果没有出现在组函数/聚合函数中,必须出现在group by里面,否则会报错
hive> select deptno,avg(salary) from ruozedata_emp group by deptno;
OK
NULL 10300.0
10 2916.6666666666665
20 2175.0
30 1566.6666666666667
Time taken: 22.19 seconds, Fetched: 4 row(s)
1. 求每个部门(deptno)、工作岗位(job)的最高工资(salary)
hive> select max(salary) from ruozedata_emp group by deptno order by job;
2.求每个部门的平均工资大于2000的部门
hive> select deptno,avg(salary) from ruozedata_emp group by deptno having avg(salary)>2000;
where 需要在group by 之前
3、cash when then
hive> select ename,salary,
> case
> when salary>1 and salary<=1000 then 'lower'
> when salary>1000 and salary<=2000 then 'middle'
> when salary>2000 and salary<=4000 then 'high'
> else 'highest'
> end
> from ruozedata_emp;
OK
SMITH 800.0 lower
ALLEN 1600.0 middle
WARD 1250.0 middle
JONES 2975.0 high
MARTIN 1250.0 middle
BLAKE 2850.0 high
CLARK 2450.0 high
SCOTT 3000.0 high
KING 5000.0 highest
TURNER 1500.0 middle
ADAMS 1100.0 middle
JAMES 950.0 lower
FORD 3000.0 high
MILLER 1300.0 middle
HIVE 10300.0 highest
Time taken: 0.117 seconds, Fetched: 15 row(s)
4、join
内连接:inner join = join
结果为完全匹配的内容
hive> select * from a join b on a.id=b.id;
1 ruoze 1 30
2 j 2 29
Time taken: 174.328 seconds, Fetched: 2 row(s)
外链接:left right all
left:左表全
hive> select * from a left join b on a.id= b.id;
1 ruoze 1 30
2 j 2 29
3 k NULL NULL
Time taken: 31.09 seconds, Fetched: 3 row(s)
right:右表全
hive> select * from a right join b on a.id= b.id;
1 ruoze 1 30
2 j 2 29
NULL NULL 4 21
Time taken: 23.674 seconds, Fetched: 3 row(s)
full:两个表数据都在,没有的字段用null 补充
hive> select * from a full join b on a.id= b.id;
1 ruoze 1 30
2 j 2 29
3 k NULL NULL
NULL NULL 4 21
Time taken: 33.676 seconds, Fetched: 4 row(s)
笛卡尔积(不会报错)
hive> select * from a join b ;(未加条件)
![](https://img.haomeiwen.com/i11864772/2b83512e6e555090.png)
5、静态分区
创建:hive> create table order_partition( ordernumber string, eventtime string ) partitioned by (event_month string) row format delimited fields terminated by '\t';
查看表结构:hive> desc formatted order_partition
# Partition Information
# col_name data_type comment
event_month string
网友评论