美文网首页
数据仓库-Hive基础(九) UDF作业实录

数据仓库-Hive基础(九) UDF作业实录

作者: 做个合格的大厂程序员 | 来源:发表于2020-06-23 20:23 被阅读0次

    分享一个今天做的题:

    题目:

    有原始json数据如下:

    {"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}
    {"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}
    {"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}
    {"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}
    {"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}
    {"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}
    {"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}
    

    需要将数据导入到hive数据仓库中,我不管你中间用几个表,最终我要得到一个结果表:

    1193    5       978300760       1
    661     3       978302109       1
    914     3       978301968       1
    3408    4       978300275       1
    2355    5       978824291       1
    1197    3       978302268       1
    1287    5       978302039       1
    

    step 1

    第一步:先创建一个初始的表,表中只含有一个字段,这个字段包含了原始表中的一行数据。

    create table if not exists basicJson(jsonname string)row format delimited fields terminated by '\t';
    

    导入原始数据

    load data local inpath "/export/softwares/movieJson.csv";
    

    查询之后获得

    {"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}
    {"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}
    {"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}
    {"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}
    {"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}
    {"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}
    {"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}
    

    step 2

    打开java工程,新建项目,

    public Text evaluate(final Text jsonObject) throws Exception {
            String jsonString = jsonObject.toString();
            if (null != jsonString && !jsonString.toString().equals("")){
                return new Text(change2String(jsonString));
            }else{
                return new Text("");
            }
        }
    
        public static String change2String(String jsonString) throws Exception{
            JSONObject json = new JSONObject(jsonString);
            String movie = json.getString("movie");
            String rate = json.getString("rate");
            String timeStamp = json.getString("timeStamp");
            String uid = json.getString("uid");
            String finalString = movie+"\t"+rate+"\t"+timeStamp+"\t"+uid+"\t";
            return finalString;
        }
    

    这里我们用到了JSONObject类,需要在Maven中导入依赖

    <dependency>
        <groupId>org.json</groupId>
        <artifactId>json</artifactId>
        <version>20180813</version>
    </dependency>
    

    然后打包获得jar包导入到hive中

    add jar /export/servers/hive-1.1.0-cdh5.14.0/lib/json2String.jar;
    

    Hive中关联相关函数

    create temporary function json2String as 'cn.leon.json.json2String';
    

    step 3

    在hive中使用相关函数获得生成的查询内容并且导入到本地文件中

    insert overwrite local directory '/export/softwares/json' select json2String(jsonname) from basicjson; 
    

    然后hive创建一个新表包含四个字段

    create table if not exists finalMovies(movie string,rate string,timesstamp string,uid string)row format delimited fields terminated by '\t';
    

    最后导入本地数据

    load data local inpath '/export/softwares/json' into table finalMovies;
    

    查询出来的结果为正确

    1193    5       978300760       1
    661     3       978302109       1
    914     3       978301968       1
    3408    4       978300275       1
    2355    5       978824291       1
    1197    3       978302268       1
    1287    5       978302039       1
    

    相关文章

      网友评论

          本文标题:数据仓库-Hive基础(九) UDF作业实录

          本文链接:https://www.haomeiwen.com/subject/alhkfktx.html