hive很大部分和sql语法一致。这里就介绍hive特有的(会有的是sql里的内容)
在hive中有数据表是分区表,在分区表中,查询一定要用where条件语句对分区字段进行限制
执行顺序
from → where → group by → having → select → order by
优先级 | 命令 |
---|---|
1 | FROM <left table> |
2 | ON <join condition> |
3 | JOIN |
4 | WHERE |
5 | GROUP BY |
6 | WITH |
7 | HAVING<having condition> |
8 | SELECT |
9 | DISTINCT |
10 | ORDER BY |
11 | LIMIT |
常用函数
如何把时间戳转换为日期?
from_unixtime(timestamp, format_str)
eg:from_unixtime(1983686556, 'yyyy-MM-dd hh:mm:ss')
将日期转换为时间戳
unix_timestamp(date_str)
计算时间间隔
datediff(string enddate, string startdate)
结束日期减去开始日期的天数
日期增加函数
date_add(string startdate, int days)
日期减少函数
date_sub(string startdate, int days)
条件函数
case when
-- 案例:统计用户每个年龄段人数
select
case when age<20 then '<20'
when age>=20 and age<30 then '20-30'
when age>=30 and age<40 then '30-40'
else '40<' end as age_type,
count(distinct user_id) user_num
from user_info
group by
case when age<20 then '<20'
when age>=20 and age<30 then '20-30'
when age>=30 and age<40 then '30-40'
else '40<' end;
在上面这个案例中一定要注意select中的case when语句一定要和group by下的一致。
if
select sex,
if(level>5, '高级用户', '普通用户') as level_type,
count(distinct user_id) user_num
from user_info
group by sex,
if(level>5, '高级用户', '普通用户');
字符串函数
substr(str raw_string, int start, int end)
如果不指定截取长度,则从开始一直截取到最后!
-- 案例需求:每个月激活的用户数
select substr(firstactivetime, 1, 7) as month, count(distinct user_id) user_num
from user_info
group by substr(firstactivetime, 1, 7);
如果我们是一个json字符串怎么办?
get_json_object(str raw_string, '$.key')
-- 不同手机品牌的用户数
select get_json_object(extra1, '$.phonebrand') as phone_brand,
count(distinct user_id) user_num
from user_info
group by get_json_object(extra1, '$.phonebrand');
如何取出map类型的数据呢
案例如上题一致
select extra2['phonebrand'] as phone_brand, count(distinct user_id) user_num
from user_info
group by extra2['phonebrand'];
取map就像取python里的字典一样
网友评论