How to Create UDFs and UDAFs in Hive

Author: cyangssrs | Published: 2019-04-25 11:20

    Introduction to UDFs and UDAFs

    UDF

    A UDF is a Hive function that takes one or more columns from a single row and returns a single value.
    For example:

    SELECT lower(COLUMN_NAME) FROM TABLE_NAME
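    The following minimal sketch makes the "one row in, one value out" contract concrete. It is plain Java and does not use Hive. The names RowWiseDemo, applyPerRow and lower are hypothetical. It applies a lower()-style function to every row of a column, so the output always has exactly one value per input row. A NULL input stays NULL, as it does with Hive's lower().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Plain-Java illustration of the UDF contract: one input row -> one output value.
// RowWiseDemo, applyPerRow and lower are hypothetical names, not part of Hive.
class RowWiseDemo {
    // Apply a scalar function to every row; the result has one entry per input row.
    static <I, O> List<O> applyPerRow(List<I> rows, Function<I, O> udf) {
        List<O> out = new ArrayList<>(rows.size());
        for (I row : rows) {
            // Like Hive's lower(), map a NULL input to a NULL output.
            out.add(row == null ? null : udf.apply(row));
        }
        return out;
    }

    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("Alice", "BOB", null);
        System.out.println(applyPerRow(column, RowWiseDemo::lower)); // [alice, bob, null]
    }
}
```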
    

    UDAF

    A UDAF (user-defined aggregate function) aggregates the values of many rows into a single result.
    For example:

    select sum(column_name) from table_name
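    A UDAF cannot just look at one row at a time, because Hive splits the input across tasks and combines partial results later. The sketch below simulates that flow for an average in plain Java. For illustration only, it assumes two map-side splits and one reducer. The method names iterate, terminatePartial, merge and terminate mirror those of Hive's GenericUDAFEvaluator. AvgBuffer and AvgLifecycleDemo are hypothetical, and nothing here calls Hive.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the UDAF lifecycle for AVG. Method names mirror Hive's
// GenericUDAFEvaluator, but AvgBuffer and AvgLifecycleDemo are hypothetical.
class AvgBuffer {
    long sum;
    long count;

    // Map side: fold one raw row value into the buffer (NULLs are skipped, as in SQL AVG).
    void iterate(Integer value) {
        if (value != null) {
            sum += value;
            count++;
        }
    }

    // Map side: emit the partial state that is shipped to the reducer.
    long[] terminatePartial() {
        return new long[] {sum, count};
    }

    // Reduce side: combine a partial state produced by another task.
    void merge(long[] partial) {
        sum += partial[0];
        count += partial[1];
    }

    // Reduce side: produce the final value; AVG over zero non-NULL rows is NULL.
    Double terminate() {
        return count == 0 ? null : (double) sum / count;
    }
}

class AvgLifecycleDemo {
    static Double avgAcrossSplits(List<List<Integer>> splits) {
        AvgBuffer reducer = new AvgBuffer();
        for (List<Integer> split : splits) {
            AvgBuffer mapper = new AvgBuffer();   // one buffer per simulated map task
            for (Integer v : split) {
                mapper.iterate(v);
            }
            reducer.merge(mapper.terminatePartial());
        }
        return reducer.terminate();
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = Arrays.asList(
                Arrays.asList(1, 2, null),
                Arrays.asList(3, 4, 5));
        System.out.println(avgAcrossSplits(splits)); // 3.0
    }
}
```

    The partial state is (sum, count) rather than an average on purpose. Averaging the per-split averages 1.5 and 4.0 would give 2.75, while the true average of 1, 2, 3, 4, 5 is 3.0.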
    

    How to create a custom UDF or UDAF

    • There are two ways to create a UDF or UDAF: "simple" and "generic".

    Simple

    The simple way is to extend the UDF class directly:

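        package com.mycompany.hive.udf;

        import org.apache.hadoop.hive.ql.exec.UDF;

        /** A simple UDF that converts Fahrenheit to Celsius. */
        public class ConvertToCelcius extends UDF {
            // Boxed Double (rather than double) lets evaluate receive and return NULL.
            public Double evaluate(Double value) {
                if (value == null) return null;
                return (value - 32) / 1.8;
            }
        }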
    

    Once the class is compiled and packaged into a jar, you can call it like this:

    hive> add jar my-udf.jar;
    hive> create temporary function fahrenheit_to_celcius as 'com.mycompany.hive.udf.ConvertToCelcius';
    hive> SELECT fahrenheit_to_celcius(temp_fahrenheit) FROM temperature_data;
    

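    In short, creating a simple UDF takes just two steps:

    1. Extend the org.apache.hadoop.hive.ql.exec.UDF class.
    2. Implement an evaluate method (it may be overloaded to accept different argument types).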

    A simple UDF can work with many data types: not only Java primitive types, but also Hadoop IO (Writable) types. Hive types map to Java types as follows:

    Hive type           Java types a simple UDF can use
    string              java.lang.String, org.apache.hadoop.io.Text
    int                 int, java.lang.Integer, org.apache.hadoop.io.IntWritable
    boolean             boolean, java.lang.Boolean, org.apache.hadoop.io.BooleanWritable
    array<type>         java.util.List<Java type>
    map<ktype, vtype>   java.util.Map<Java type for K, Java type for V>
    struct              not supported by simple UDFs; use GenericUDF

    Simple vs Generic

    • Performance. Simple: reduced, because every call to evaluate is reflective, and all arguments are evaluated and parsed up front. Generic: optimal, with no reflective calls, and arguments are parsed lazily.
    • Complex types. Simple: limited; arrays are handled but suffer from type-erasure limitations. Generic: all complex parameters are supported, even nested ones such as array<array<...>>.
    • Variable number of arguments. Simple: not supported. Generic: supported.
    • Ease of writing. Simple: very easy. Generic: not very difficult, but not well documented.
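    The first two rows of this comparison can be illustrated without Hive. This plain-Java sketch mimics the idea but is not Hive's actual method resolver. The names MyLengthUdf, ReflectiveDispatchDemo, resolve and invokePerRow are hypothetical. The caller knows only the method name evaluate, so it looks up an overload by parameter type once and then goes through Method.invoke for every row. The sketch also shows type erasure: the List overload's runtime parameter type is just java.util.List, and the element type survives only as generic-signature metadata.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// A "simple UDF"-style class: overloaded evaluate methods, no common interface.
// MyLengthUdf is hypothetical and does not extend Hive's UDF class.
class MyLengthUdf {
    public Integer evaluate(String s) {
        return s == null ? null : s.length();
    }

    public Integer evaluate(List<String> items) {
        return items == null ? null : items.size();
    }
}

// Mimics the idea of reflective dispatch; this is not Hive's actual resolver.
class ReflectiveDispatchDemo {
    // Done once: pick the overload by method name and (erased) parameter type.
    static Method resolve(Object udf, Class<?> argType) {
        try {
            return udf.getClass().getMethod("evaluate", argType);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no evaluate(" + argType.getName() + ")", e);
        }
    }

    // Done for every row: a reflective call instead of a direct method call.
    static Object invokePerRow(Method m, Object udf, Object arg) {
        try {
            return m.invoke(udf, arg);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        MyLengthUdf udf = new MyLengthUdf();
        Method forString = resolve(udf, String.class);
        Method forList = resolve(udf, List.class);
        System.out.println(invokePerRow(forString, udf, "hive"));                  // 4
        System.out.println(invokePerRow(forList, udf, Arrays.asList("a", "b")));   // 2

        // Type erasure: the runtime parameter type is the raw java.util.List ...
        System.out.println(forList.getParameterTypes()[0].getName());             // java.util.List
        // ... and List<String> survives only as generic signature metadata.
        System.out.println(forList.getGenericParameterTypes()[0].getTypeName());  // java.util.List<java.lang.String>
    }
}
```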

    Generic

    First, extend the GenericUDF class (org.apache.hadoop.hive.ql.udf.generic.GenericUDF).
    Then implement its three abstract methods:

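        // Simplified excerpt: GenericUDF is an abstract class (not an interface);
        // these are the abstract methods a subclass must implement.
        public abstract class GenericUDF {
            public abstract ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException;
            public abstract Object evaluate(DeferredObject[] arguments) throws HiveException;
            public abstract String getDisplayString(String[] children);
        }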
    
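    1. initialize
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException;

    This method receives one ObjectInspector per argument and returns an ObjectInspector for the return value. Hive calls it before any rows are evaluated, so it is the place to check the number and types of the arguments.

    2. evaluate
      Implements the function's logic. It is called once per row, with the arguments wrapped in DeferredObject instances.
    3. getDisplayString
      Returns a human-readable description of the call, used for example in EXPLAIN output.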
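    The following plain-Java mock shows how these three methods divide the work. It does not depend on Hive, and all names are hypothetical. ArgType stands in for ObjectInspector, DeferredArg stands in for GenericUDF.DeferredObject, and FirstNonNull is a COALESCE-like function that takes a variable number of arguments.
    - initialize validates the argument types once and returns the result type.
    - evaluate runs per row and calls get() only on the arguments it needs. This is what "arguments are parsed lazily" means in the Simple vs Generic comparison.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java mock of the GenericUDF lifecycle; no Hive classes are used.
// ArgType stands in for ObjectInspector and DeferredArg for GenericUDF.DeferredObject.
enum ArgType { INT, STRING }

interface DeferredArg {
    Object get();   // the value is produced only when get() is called
}

abstract class MiniGenericUdf {
    abstract ArgType initialize(ArgType[] argTypes);   // once: validate, return result type
    abstract Object evaluate(DeferredArg[] args);      // per row: compute the value
    abstract String getDisplayString(String[] children);
}

// COALESCE-like: any number of same-typed arguments, returns the first non-null one.
class FirstNonNull extends MiniGenericUdf {
    @Override
    ArgType initialize(ArgType[] argTypes) {
        if (argTypes.length == 0) {
            throw new IllegalArgumentException("first_non_null needs at least one argument");
        }
        for (ArgType t : argTypes) {
            if (t != argTypes[0]) {
                throw new IllegalArgumentException("all arguments must have the same type");
            }
        }
        return argTypes[0];   // the result has the same type as the arguments
    }

    @Override
    Object evaluate(DeferredArg[] args) {
        for (DeferredArg a : args) {
            Object v = a.get();   // arguments after the first non-null are never computed
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    @Override
    String getDisplayString(String[] children) {
        return "first_non_null(" + String.join(", ", children) + ")";
    }
}

class LazyArgsDemo {
    // Wraps a value so we can count how many arguments were actually evaluated.
    static DeferredArg counting(Object value, AtomicInteger counter) {
        return () -> {
            counter.incrementAndGet();
            return value;
        };
    }

    public static void main(String[] args) {
        FirstNonNull udf = new FirstNonNull();
        udf.initialize(new ArgType[] {ArgType.STRING, ArgType.STRING, ArgType.STRING});
        AtomicInteger computed = new AtomicInteger();
        Object result = udf.evaluate(new DeferredArg[] {
                counting(null, computed), counting("b", computed), counting("c", computed)});
        System.out.println(result + ", computed " + computed.get() + " of 3"); // b, computed 2 of 3
        System.out.println(udf.getDisplayString(new String[] {"x", "y", "z"})); // first_non_null(x, y, z)
    }
}
```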

    Example: a GenericUDF that multiplies an INT argument by two.

        import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
        import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
        import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
        import org.apache.hadoop.hive.ql.metadata.HiveException;
        import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
        import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
        import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
        import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
        import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
        import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
        import org.apache.hadoop.io.IntWritable;

        /** A GenericUDF that multiplies an INT argument by two. */
        public class UDFMultiplyByTwo extends GenericUDF {
            private PrimitiveObjectInspector inputOI;

            @Override
            public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
                // This UDF accepts exactly one argument. Use explicit checks rather than
                // assert: Java assertions are disabled unless the JVM runs with -ea.
                if (args.length != 1) {
                    throw new UDFArgumentLengthException("UDFMultiplyByTwo takes exactly one argument");
                }
                // The argument must be a primitive INT.
                if (args[0].getCategory() != Category.PRIMITIVE
                        || ((PrimitiveObjectInspector) args[0]).getPrimitiveCategory() != PrimitiveCategory.INT) {
                    throw new UDFArgumentTypeException(0, "UDFMultiplyByTwo only accepts an INT argument");
                }
                inputOI = (PrimitiveObjectInspector) args[0];

                // We return an int, so return the corresponding (writable) object inspector.
                return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
            }

            @Override
            public Object evaluate(DeferredObject[] args) throws HiveException {
                // Hive passes the arguments as "deferred" objects so that values
                // we never read are never computed.
                Object oin = args[0].get();
                if (oin == null) {
                    return null;   // NULL in, NULL out
                }
                int value = (Integer) inputOI.getPrimitiveJavaObject(oin);
                return new IntWritable(value * 2);
            }

            @Override
            public String getDisplayString(String[] children) {
                return "UDFMultiplyByTwo(" + children[0] + ")";
            }
        }
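    One caveat about the example: value * 2 is computed in 32-bit int arithmetic. Any input outside [-1,073,741,824, 1,073,741,823] therefore wraps around silently. The plain-Java sketch below reproduces that arithmetic and shows two common remedies; OverflowDemo and its methods are hypothetical. If you adopted the widening remedy in the UDF, an assumption about how you might adapt it rather than part of the original, initialize would return PrimitiveObjectInspectorFactory.writableLongObjectInspector (a Hive BIGINT) and evaluate would return a LongWritable.

```java
// Plain-Java check of the arithmetic inside UDFMultiplyByTwo.evaluate (no Hive needed).
// OverflowDemo and its methods are hypothetical helpers for illustration.
class OverflowDemo {
    // What the example does: int * int stays int and wraps on overflow.
    static int timesTwoInt(int value) {
        return value * 2;
    }

    // Remedy 1: widen first, so every int input doubles exactly (maps to Hive BIGINT).
    static long timesTwoLong(int value) {
        return (long) value * 2;
    }

    // Remedy 2: fail loudly instead of returning a wrong number.
    static int timesTwoExact(int value) {
        return Math.multiplyExact(value, 2);
    }

    public static void main(String[] args) {
        int big = 1_500_000_000;
        System.out.println(timesTwoInt(big));   // -1294967296 (wrapped)
        System.out.println(timesTwoLong(big));  // 3000000000
        try {
            timesTwoExact(big);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```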
    


    Article link: https://www.haomeiwen.com/subject/uzlagqtx.html