美文网首页数据仓库
hive存储json格式文件

hive存储json格式文件

作者: 皮皮猿的博客 | 来源:发表于2017-06-09 19:17 被阅读1429次

    hive从0.12版本以后就开始自身支持json文件的格式了

    1.文件格式

    下面是测试用的文件预览格式,我将其存储为jsonTest.json文件,放在了/root目录下

    {"_location":"(32.121,  45.262)","_ip":"2.12.1.211","_action":"click button","_uid":"694666ee-8918-48f8-bb92-0a756a3f1f31","_timestamp":"1496999163163"}
    {"_location":"(32.121,  45.262)","_ip":"7.12.1.211","_action":"click button","_uid":"868be529-f59e-4f16-8ea5-08c4612ede9a","_timestamp":"1496999164165"}
    {"_location":"(32.121,  45.262)","_ip":"2.12.1.211","_action":"click button","_uid":"49b63380-2a5a-453d-aa5b-57e74cd1a8d7","_timestamp":"1496999165170"}
    {"_location":"(32.121,  45.262)","_ip":"3.12.1.211","_action":"click button","_uid":"60fd093a-9f6a-4716-81c6-72a5f07d3b4f","_timestamp":"1496999166171"}
    {"_location":"(32.121,  45.262)","_ip":"3.12.1.211","_action":"click button","_uid":"056cf558-56b1-4312-aba1-0b0eb71ba78e","_timestamp":"1496999167172"}
    {"_location":"(32.121,  45.262)","_ip":"8.12.1.211","_action":"click button","_uid":"a2ecd34c-5458-43a2-b851-89eb6b3985ae","_timestamp":"1496999168174"}
    

    注意: 一行之内必须是完整的一条json文件,否则会解析出错误

    2.创建hive表

    CREATE TABLE IF NOT EXISTS jsonTest
    (
      `_location`  STRING,
      `_ip`        STRING,
      `_action`    STRING,
      `_uid`       STRING,
      `_timestamp` STRING)
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS TEXTFILE;
    

    注意: ``符号是因为列明里面是下划线开头;并且列明需要与json文件中的key对应,否则也会解析不到数据.

    3.加载数据

    LOAD DATA LOCAL INPATH "/root/jsonTest.json" OVERWRITE INTO TABLE jsonTest;
    
    

    4.查询数据进行校验

    hive> select * from jsonTest_test;
    OK
    (32.121,  45.262)       2.12.1.211      click button    54490a2f-376a-43e3-9658-e9a3d6583ffd    1496999162153
    (32.121,  45.262)       2.12.1.211      click button    694666ee-8918-48f8-bb92-0a756a3f1f31    1496999163163
    (32.121,  45.262)       7.12.1.211      click button    868be529-f59e-4f16-8ea5-08c4612ede9a    1496999164165
    (32.121,  45.262)       2.12.1.211      click button    49b63380-2a5a-453d-aa5b-57e74cd1a8d7    1496999165170
    (32.121,  45.262)       3.12.1.211      click button    60fd093a-9f6a-4716-81c6-72a5f07d3b4f    1496999166171
    (32.121,  45.262)       3.12.1.211      click button    056cf558-56b1-4312-aba1-0b0eb71ba78e    1496999167172
    (32.121,  45.262)       8.12.1.211      click button    a2ecd34c-5458-43a2-b851-89eb6b3985ae    1496999168174
    Time taken: 0.439 seconds, Fetched: 7 row(s)
    

    可以看到数据已经成功解析出来

    问题

    1.出现下面错误

    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
    

    需要将支持序列化的hive-hcatalog-core.jar添加进来,执行如下命令即可解决该问题.(找到自己安装的目录即可,我的是在这个目录里)

    hive> add jar /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
    

    参考资料
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-JSON

    相关文章

      网友评论

        本文标题:hive存储json格式文件

        本文链接:https://www.haomeiwen.com/subject/pimfqxtx.html