explode
create table hive_wc(sentence string);
load data local inpath '/home/hadoop/data/hive-wc.txt' into table hive_wc;
hello,world,welcome
hello,welcome
求每个单词出现的个数
1) 获取每个单词 split(sentence,",")
["hello","world","welcome"]
["hello","welcome"]
"hello"
"world"
"welcome"
"hello"
"welcome"
select word, count(1) as c
from (select explode(split(sentence,",")) as word from hive_wc) t
group by word ;
json
create table rating_json(json string);
load data local inpath '/home/hadoop/data/rating.json' into table rating_json;
select
json_tuple(json,"movie","rate","time","userid") as (movie,rate,time,userid)
from rating_json limit 10;
create table hive_rownumber(id int,age int, name string, sex string)
row format delimited fields terminated by ',';
load data local inpath '/home/hadoop/data/hive_row_number.txt' into table hive_rownumber;
查询出每种性别中年龄最大的2条数据 ==> topn
select * from hive_rownumber
group by sex order by age limit 2;
order by是全局的排序,是做不到分组内的排序的!!!
分析函数
select id,age,name,sex
from
(select id,age,name,sex,
row_number() over(partition by sex order by age desc) as rank
from hive_rownumber) t
where rank<=2;
User-Defined Functions : UDF
UDF: 一进一出 upper lower substring
UDAF:Aggregation 多进一出 count max min sum ...
UDTF: Table-Generation 一进多出
IDEA + Maven
功能:输入xxx,输出Hello xxx
自定义UDF函数的步骤
1) 定义一个类 extends UDF
add jar /home/hadoop/lib/hive-1.0.jar;
CREATE TEMPORARY FUNCTION sayHello AS 'com.ruozedata.bigdata.HelloUDF';
TEMPORARY:仅对当前session(黑窗口)有效
CREATE FUNCTION sayRuozeHello AS 'com.ruozedata.bigdata.HelloUDF'
USING JAR 'hdfs://hadoop000:8020/lib/hive-1.0.jar';
源码面前,了无秘密
网友评论