美文网首页
hive处理json数据-get_json_object(),k

hive处理json数据-get_json_object(),k

作者: 阿童89 | 来源:发表于2020-05-30 19:32 被阅读0次

    数据分析小伙伴应该经常会遇到各种奇葩的源数据,尤其是业务复杂的公司,底层表的备注信息里会有大量的业务信息(大多是json格式),因为产品经理和开发人员的习惯和喜好不同,备注字段的格式和内容也是千奇百怪。。。本文列经常遇到的备注信息和hive常用的json解析函数,以后如果碰到新的奇葩内容,我也会持续进行更新。。。

    假设现有源数据:analyst.test_biao

    area goods_id info1 info2 info3
    a,b 1,2,3 [{"source":"7fresh","monthSales":4900}] score=9.9,arg={"source":"7fresh","monthSales":4900}] source=7fresh,monthSales=4900

    1、get_json_object()函数

    1.1 提取info列的source值

    碰到info1这种其实很幸运了,此时应该先用regexp_repalce()函数将'['和']'替换为空('['符号需要转义,所以这里需要写成'['),然后再用get_json_object()函数取出source的值

    select 
            get_json_object(regexp_replace(regexp_replace(info1,'\\[',''),']',''),$source)  as info1_source 
    from  analyst.test_biao 
    

    运行结果:

    info1_source
    7fresh

    1.2 提取info2列source的值

    info2不是一个标准的json字符串,此时需要通过正则表达式先取出arg={...}这部分内容,然后再用get_json_object()函数取出source的值

    select 
            get_json_object(regexp_extract(info2,'arg=(\{.*\})'),$source)  as info1_source 
    from  analyst.test_biao 
    

    运行结果:

    info1_source
    7fresh

    2、keyvalue()函数

    2.1 提取info3列source的值

    select 
            keyvalue(info3,'=',source)  as info1_source 
    from  analyst.test_biao 
    

    运行结果:

    info1_source
    7fresh

    3、explode()和 LATERAL VIEW()函数

    explode()函数常常和LATERAL VIEW()函数一起使用,比如想要goods_id列按逗号拆分成3行,可以运行下面的代码:

    select 
            goods_id2
           ,info1
    from  analyst.test_biao 
    LATERAL VIEW(explode(split(goods_id,','))) goods as goods_id2 
    

    运行结果:

    goods_id2 info1
    1 {"source":"7fresh","monthSales":4900}]
    2 {"source":"7fresh","monthSales":4900}]
    3 {"source":"7fresh","monthSales":4900}]

    也可以按照area和goods_id两列进行拆分,拆分后的area和goods_id两列做笛卡尔积

    select 
            goods_id2
           ,info1
    from  analyst.test_biao 
    LATERAL VIEW(explode(split(area,','))) area1  as area1 
    LATERAL VIEW(explode(split(goods_id,','))) goods_id2  as goods_id2 
    

    运行结果:

    area1 goods_id2 info1
    a 1 {"source":"7fresh","monthSales":4900}]
    a 2 {"source":"7fresh","monthSales":4900}]
    a 3 {"source":"7fresh","monthSales":4900}]
    b 1 {"source":"7fresh","monthSales":4900}]
    b 2 {"source":"7fresh","monthSales":4900}]
    b 3 {"source":"7fresh","monthSales":4900}]

    相关文章

      网友评论

          本文标题:hive处理json数据-get_json_object(),k

          本文链接:https://www.haomeiwen.com/subject/aqchzhtx.html