美文网首页
22.基于hive的应用实例(wordcount和基站掉话率)

22.基于hive的应用实例(wordcount和基站掉话率)

作者: 文茶君 | 来源:发表于2019-12-18 14:26 被阅读0次

    一实现wordcount

    1.装入数据.png
    hello world hive
    hadoop hello hive
    hello
    world
    
    2.把数据装入hdfs.png

    创建表
    因为先有数据再有表
    所以创建外部表
    不知一行有几行单词,把整行当成一个列来存

    create external table wc(
    line string
    );
    location '/usr/'
    
    3.创建外部表.png

    创建结果表

    create table wc_result(
    word string,
    ct int
    );
    
    4查看数据结构1.png

    查看是数组结构


    5查看数据结构2.png

    再切割

    from (select explode(split(line,' ')) word(这里给别名)  from wc) t1
     insert into wc_result
    select t1.word,count(t1.word) group by t1.word;
    
    6执行MR
    7.结果.png

    基站掉话率

    基站掉话率,找出掉线率最高的前10名
    1.数据格式
    record_time,imei,cell,ph_num,call_num,drop_num,duration,drop_rate,net_type,erl
    2011-07-13 00:00:00+08,356966,29448-37062,0,0,0,0,0,G,0
    2011-07-13 00:00:00+08,352024,29448-51331,0,0,0,0,0,G,0
    2011-07-13 00:00:00+08,353736,29448-51331,0,0,0,0,0,G,0
    2011-07-13 00:00:00+08,353736,29448-51333,0,0,0,0,0,G,0
    2011-07-13 00:00:00+08,351545,29448-51333,0,0,0,0,0,G,0
    2011-07-13 00:00:00+08,353736,29448-51343,1,0,0,8,0,G,0
    2011-07-13 00:00:00+08,359681,29448-51462,0,0,0,0,0,G,0
    2011-07-13 00:00:00+08,354707,29448-51462,0,0,0,0,0,G,0
    2011-07-13 00:00:00+08,356137,29448-51470,0,0,0,0,0,G,0
    

    排名需求
    record time:通话时间
    imei:基站编号
    cell:手机编号
    drop_num:掉话的描述
    duration:通话持续总秒数

    建表

    create table call_monitor (
    record_time string,
    imei string,
    cell string,
    ph_num string,
    call_num string,
    drop_num int,
    duration int,
    drop_rate double,
    net_type string,
    erl string
    )
    row format delimited fields terminated by ',';
    
    2.装上数据文件 3.创建表并加载数据 4查看前几行数据,结果正常(出现null是因为结果类型不匹配,设置的是int)
    create table call_result(
    imei string,
    drop_num int,
    duration int,
    drop_rate double
    )
    

    查询

    5执行却报错.png

    出错原因在于sdura是别名,却在此写sdrop/sdura。

    from call_monitor cm
    insert into call_result 
    select cm.imei,sum(cm.drop_num) sdrop,
    sum(cm.duration) sdura,
    sum(cm.drop_num)/sum(cm.duration) s_rate group by  cm.imei order by s_rate desc;
    
    
    6运行.png 7,结果.png

    相关文章

      网友评论

          本文标题:22.基于hive的应用实例(wordcount和基站掉话率)

          本文链接:https://www.haomeiwen.com/subject/dxosnctx.html