Hive Learning Series, Part 4 (User-Defined Functions)

Author: 南山黑画说 | Published 2018-06-12 15:30

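    If the function takes simple argument types, extend UDF directly and implement one or more evaluate methods. Hive calls the evaluate overload whose parameter types match the arguments in the SQL call.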

    The steps are as follows:

    1. Write a UDF that converts uppercase characters to lowercase.

    package com.example.hive.udf;
    
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;
    
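    // Simple UDF: Hive locates evaluate() by reflection and calls it once per row
    public class Lower extends UDF {
        public Text evaluate(final Text s) {
            // SQL NULL arrives as a Java null; return NULL in that case
            if (s == null) {
                return null;
            }
            return new Text(s.toString().toLowerCase());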
        }
    }
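    The Lower class above defines a single evaluate, but a simple UDF may declare several overloads, and Hive picks the one whose parameter types match the call. The sketch below illustrates this with a hypothetical LowerOrDefault function (not part of the original post) that accepts either one or two string arguments. It uses String rather than Text (Hive's simple-UDF mechanism also accepts plain Java types such as String), switches to toLowerCase(Locale.ROOT) to avoid locale-dependent results, and omits "extends UDF" only so that it compiles and runs with nothing but the JDK; a real Hive UDF would keep that superclass.

```java
import java.util.Locale;

// Hypothetical UDF-shaped class, not from the original post. A real Hive UDF
// would also declare "extends org.apache.hadoop.hive.ql.exec.UDF"; the
// superclass is omitted so this compiles and runs with only the JDK.
class LowerOrDefault {

    // One-argument overload: would match lower_or_default(col)
    public String evaluate(String s) {
        // Locale.ROOT avoids locale-dependent case mapping (e.g. Turkish dotless i)
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    // Two-argument overload: would match lower_or_default(col, 'n/a')
    public String evaluate(String s, String fallback) {
        String lowered = evaluate(s);
        return lowered == null ? fallback : lowered;
    }

    public static void main(String[] args) {
        LowerOrDefault udf = new LowerOrDefault();
        System.out.println(udf.evaluate("AbcDEfg"));     // abcdefg
        System.out.println(udf.evaluate(null, "n/a"));   // n/a
    }
}
```

    Registered with create temporary function, a class shaped like this would serve both the one-argument and the two-argument form of the call from a single jar.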
    

    2. Package it as a jar.

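    Create a Maven project (it needs the hive-exec dependency, which provides UDF and GenericUDF, and hadoop-common for Text) and build it with Maven.
    The jar built here is hiveudf-1.0.0.jar.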

    3. Upload the jar to HDFS.

    [root@master /opt]# hadoop fs -mkdir -p /user/hive/udf
    18/06/07 09:41:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [root@master /opt]# hadoop fs -put hiveudf-1.0.0.jar  /user/hive/udf
    18/06/07 09:41:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [root@master /opt]# hadoop fs -ls /user/hive/udf 
    18/06/07 09:41:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 1 items
    -rw-r--r--   3 root supergroup       8020 2018-06-07 09:41 /user/hive/udf/hiveudf-1.0.0.jar
    [root@master /opt]#
    

    4. Create the function in the Hive CLI.

    add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar;
    create temporary function lower as 'com.example.hive.udf.Lower';
    
    hive> delete jar  hiveudf-1.0.0.jar;
    hive> list jars
        > ;
    hive> add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar
        > ;
    Added [/tmp/416cfcca-9ea0-4eaf-9e54-8154b440f3a9_resources/hiveudf-1.0.0.jar] to class path
    Added resources: [hdfs:///user/hive/udf/hiveudf-1.0.0.jar]
    hive> list jars;
    /tmp/416cfcca-9ea0-4eaf-9e54-8154b440f3a9_resources/hiveudf-1.0.0.jar
    hive> create temporary function lower as 'com.example.hive.udf.Lower';
    OK
    Time taken: 0.594 seconds
    hive> 
    

    5. The registered function can now be used.

    hive> select lower('AbcDEfg')
        > ;
    OK
    abcdefg
    Time taken: 1.718 seconds, Fetched: 1 row(s)
    hive> 
    
    

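    For complex argument types such as arrays, extend GenericUDF instead. A GenericUDF receives an ObjectInspector describing each argument's type, which is what lets it handle complex types; the example below uses it with two double arguments.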

    1. As before, first write a class, this time extending GenericUDF.

    This UDF takes a point's latitude and longitude and converts the point into a geohash string.

    package com.zbra.udf;
    
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.DoubleObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    
    /**
     * GenericUDF example: geohash(lat, lng) returns the geohash string of a point.
     * GeoHash is a helper class that is not shown in this post.
     */
    public class GeoUdf extends GenericUDF {
    
        private DoubleObjectInspector latInspector;
        private DoubleObjectInspector lngInspector;
    
        @Override
        public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
            if (objectInspectors.length != 2) {
                throw new UDFArgumentLengthException("geohash takes exactly 2 arguments: double, double");
            }
            // 1. Check that both arguments are doubles
            ObjectInspector a = objectInspectors[0];
            ObjectInspector b = objectInspectors[1];
            if (!(a instanceof DoubleObjectInspector) || !(b instanceof DoubleObjectInspector)) {
                throw new UDFArgumentException("first argument must be a double, second argument must be a double");
            }
    
            this.latInspector = (DoubleObjectInspector) a;
            this.lngInspector = (DoubleObjectInspector) b;
    
            // 2. The function returns a Java String
            return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
        }
    
        @Override
        public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
            Object latObj = deferredObjects[0].get();
            Object lngObj = deferredObjects[1].get();
    
            // DoubleObjectInspector.get() returns a primitive double, so SQL NULL
            // must be checked on the raw objects before unwrapping them
            if (latObj == null || lngObj == null) {
                return "";
            }
    
            double lat = this.latInspector.get(latObj);
            double lng = this.lngInspector.get(lngObj);
            return new GeoHash(lat, lng).getGeoHashBase32();
        }
    
        @Override
        public String getDisplayString(String[] strings) {
            if (strings.length == 2) {
                return "geo_hash(" + strings[0] + ", " + strings[1] + ")";
            } else {
                return "geo_hash: wrong number of arguments...";
            }
    }
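    The GeoHash class that evaluate relies on is not shown in the post. As an assumption about what getGeoHashBase32() computes, here is a minimal JDK-only sketch of the standard geohash encoding: it repeatedly halves the longitude and latitude ranges, alternating between them and starting with longitude, writes a 1 bit when the point lies in the upper half, and maps every 5 bits to one base32 character. The class name GeoHashSketch is hypothetical, and the 8-character precision is chosen only to match the length of the outputs shown in step 5.

```java
// Minimal sketch of standard geohash encoding (hypothetical helper class).
// It is an assumption about what GeoHash.getGeoHashBase32() computes,
// not that class's actual source.
final class GeoHashSketch {

    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lng, int precision) {
        double latLo = -90.0, latHi = 90.0;
        double lngLo = -180.0, lngHi = 180.0;
        StringBuilder hash = new StringBuilder(precision);
        boolean lngTurn = true; // bits alternate, starting with longitude
        int bitCount = 0;
        int value = 0;

        while (hash.length() < precision) {
            if (lngTurn) {
                double mid = (lngLo + lngHi) / 2;
                if (lng >= mid) { value = (value << 1) | 1; lngLo = mid; }
                else            { value = value << 1;       lngHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; }
                else            { value = value << 1;       latHi = mid; }
            }
            lngTurn = !lngTurn;
            // every 5 bits form one base32 character
            if (++bitCount == 5) {
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(12.0, 123.0, 8)); // wdpkqbtc
    }
}
```

    For the point (12.0, 123.0) this sketch yields the same 8-character string that Hive prints in step 5, which suggests the helper uses the standard encoding at that precision.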
    

    2. Package it as a jar.

    In this article it is packaged as hiveudf-1.0.0.jar.

    3. Upload it to the same HDFS path, using the same commands as in the first example.


    4. Create the custom function.

    hive> list jars;
    /tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar
    hive> delete jar /tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar
        > ;
    Deleted [/tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar] from class path
    hive> add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar;
    Added [/tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar] to class path
    Added resources: [hdfs:///user/hive/udf/hiveudf-1.0.0.jar]
    hive> create temporary function geohash as 'com.zbra.udf.GeoUdf';
    OK
    Time taken: 0.145 seconds
    

    5. Use it as follows:

    hive> select geohash(12.0d, 123.0d);
    OK
    wdpkqbtc
    Time taken: 0.8 seconds, Fetched: 1 row(s)
    hive> select geohash(cast('12' as Double), cast('123' as Double));
    OK
    wdpkqbtc
    Time taken: 0.733 seconds, Fetched: 1 row(s)
    hive> 
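    Both queries return the same hash because cast('12' as Double) and 12.0d are the same value. A geohash names a rectangular cell rather than an exact point, so decoding the string back to that cell's bounds is a quick way to sanity-check the output. The JDK-only sketch below (the class name GeoHashDecodeSketch is hypothetical) reverses the standard encoding: each base32 character yields 5 bits that alternately narrow the longitude and latitude ranges, starting with longitude.

```java
// Hypothetical helper: decodes a standard geohash into its cell bounds.
final class GeoHashDecodeSketch {

    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Returns {latMin, latMax, lngMin, lngMax} for the cell named by hash.
    static double[] decodeBounds(String hash) {
        double latLo = -90.0, latHi = 90.0;
        double lngLo = -180.0, lngHi = 180.0;
        boolean lngTurn = true; // bits alternate, starting with longitude
        for (char c : hash.toCharArray()) {
            int value = BASE32.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("not a geohash character: " + c);
            }
            // read the 5 bits of this character, most significant first
            for (int bit = 4; bit >= 0; bit--) {
                boolean one = ((value >> bit) & 1) == 1;
                if (lngTurn) {
                    double mid = (lngLo + lngHi) / 2;
                    if (one) lngLo = mid; else lngHi = mid;
                } else {
                    double mid = (latLo + latHi) / 2;
                    if (one) latLo = mid; else latHi = mid;
                }
                lngTurn = !lngTurn;
            }
        }
        return new double[] {latLo, latHi, lngLo, lngHi};
    }

    public static void main(String[] args) {
        double[] b = decodeBounds("wdpkqbtc");
        System.out.printf("lat [%f, %f], lng [%f, %f]%n", b[0], b[1], b[2], b[3]);
    }
}
```

    For wdpkqbtc (40 bits, 20 per axis) the decoded cell contains the input point (12, 123) and spans 180/2^20 degrees of latitude by 360/2^20 degrees of longitude, roughly 19 m by 37 m at that latitude.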
    

        Original article: https://www.haomeiwen.com/subject/vtxreftx.html