一实现wordcount

hello world hive
hadoop hello hive
hello
world

创建表
因为先有数据再有表
所以创建外部表
不知一行有几行单词,把整行当成一个列来存
create external table wc(
line string
);
location '/usr/'

创建结果表
create table wc_result(
word string,
ct int
);

查看是数组结构

再切割
from (select explode(split(line,' ')) word(这里给别名) from wc) t1
insert into wc_result
select t1.word,count(t1.word) group by t1.word;


基站掉话率
基站掉话率,找出掉线率最高的前10名

record_time,imei,cell,ph_num,call_num,drop_num,duration,drop_rate,net_type,erl
2011-07-13 00:00:00+08,356966,29448-37062,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,352024,29448-51331,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,353736,29448-51331,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,353736,29448-51333,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,351545,29448-51333,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,353736,29448-51343,1,0,0,8,0,G,0
2011-07-13 00:00:00+08,359681,29448-51462,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,354707,29448-51462,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,356137,29448-51470,0,0,0,0,0,G,0
排名需求
record time:通话时间
imei:基站编号
cell:手机编号
drop_num:掉话的描述
duration:通话持续总秒数
建表
create table call_monitor (
record_time string,
imei string,
cell string,
ph_num string,
call_num string,
drop_num int,
duration int,
drop_rate double,
net_type string,
erl string
)
row format delimited fields terminated by ',';



create table call_result(
imei string,
drop_num int,
duration int,
drop_rate double
)
查询

出错原因在于sdura是别名,却在此写sdrop/sdura。
from call_monitor cm
insert into call_result
select cm.imei,sum(cm.drop_num) sdrop,
sum(cm.duration) sdura,
sum(cm.drop_num)/sum(cm.duration) s_rate group by cm.imei order by s_rate desc;


网友评论