An Introduction to UDFs and UDAFs
UDF
A UDF (user-defined function) is a Hive function that takes one or more columns from a single row and returns a single value.
For example:
SELECT lower(COLUMN_NAME) FROM TABLE_NAME
UDAF
A UDAF (user-defined aggregate function) performs an aggregation over multiple rows.
For example:
select sum(column_name) from table_name
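Conceptually, an aggregate like sum() is computed in two phases: each mapper builds a partial aggregate over its slice of the rows, and the partials are then merged. A minimal plain-Java sketch of that idea (the method names here are illustrative, not Hive's UDAF API):

```java
public class SumAggregateDemo {
    // Phase 1: each mapper sums its own slice of the rows.
    static long partialSum(int[] values) {
        long sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Phase 2: partial results are merged into the final aggregate.
    static long merge(long a, long b) {
        return a + b;
    }

    public static void main(String[] args) {
        long p1 = partialSum(new int[]{1, 2, 3}); // one mapper's slice
        long p2 = partialSum(new int[]{4, 5});    // another mapper's slice
        System.out.println(merge(p1, p2));        // prints 15
    }
}
```

This split is what lets an aggregation scale: the merge step only sees small partial results, never all the raw rows at once.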
How to create a custom UDF or UDAF
- You can create a UDF or UDAF in two different ways: "simple" and "generic".
Simple
The simple approach is to extend the UDF class directly:
/** A simple UDF to convert Fahrenheit to Celsius */
public class ConvertToCelcius extends UDF {
    public double evaluate(double value) {
        return (value - 32) / 1.8;
    }
}
Once it is built, you can call it like this:
hive> add jar my-udf.jar;
hive> create temporary function fahrenheit_to_celcius as 'com.mycompany.hive.udf.ConvertToCelcius';
hive> SELECT fahrenheit_to_celcius(temp_fahrenheit) FROM temperature_data;
In short, to create a simple UDF you only need to do two things:
- extend the org.apache.hadoop.hive.ql.exec.UDF class
- implement an evaluate method
A simple UDF supports a wide range of data types: not only Java primitive types but also Hadoop IO types.

Hive type | Java types |
---|---|
string | java.lang.String, org.apache.hadoop.io.Text |
int | int, java.lang.Integer, org.apache.hadoop.io.IntWritable |
boolean | bool, java.lang.Boolean, org.apache.hadoop.io.BooleanWritable |
array<type> | java.util.List<Java type> |
map<ktype, vtype> | java.util.Map<Java type for K, Java type for V> |
struct | Don't use a Simple UDF, use a GenericUDF |
Simple vs Generic
Simple | Generic |
---|---|
Reduced performance due to use of reflection: each call of the evaluate method is reflective. Furthermore, all arguments are evaluated and parsed. | Optimal performance: no reflective call, and arguments are parsed lazily |
Limited handling of complex types. Arrays are handled but suffer from type erasure limitations | All complex parameters are supported (even nested ones like array<array<string>>) |
Variable number of arguments are not supported | Variable number of arguments are supported |
Very easy to write | Not very difficult, but not well documented |
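The performance difference in the table comes from how the two kinds of UDF are invoked: Hive must locate and call a Simple UDF's evaluate method via reflection, while a GenericUDF is called directly. A self-contained sketch of the two call styles (plain Java, no Hive classes; the method is a stand-in for evaluate):

```java
import java.lang.reflect.Method;

public class ReflectiveCallDemo {
    // Stand-in for a Simple UDF's evaluate method.
    public static int evaluate(int value) {
        return value * 2;
    }

    public static void main(String[] args) throws Exception {
        // Direct call: roughly how a GenericUDF is invoked.
        int direct = evaluate(21);

        // Reflective call: Hive has to look up a Simple UDF's evaluate
        // method by name and signature, then invoke it reflectively,
        // boxing and unboxing the arguments on every call.
        Method m = ReflectiveCallDemo.class.getMethod("evaluate", int.class);
        int reflective = (Integer) m.invoke(null, 21);

        System.out.println(direct + " " + reflective); // prints "42 42"
    }
}
```

Both calls return the same result; the reflective path simply adds lookup, boxing, and invocation overhead on every row, which is why GenericUDF is preferred for hot code paths.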
Generic
First, extend the GenericUDF class.
Then implement its three abstract methods:
public abstract class GenericUDF {
    public abstract Object evaluate(DeferredObject[] args) throws HiveException;
    public abstract String getDisplayString(String[] args);
    public abstract ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException;
}
- initialize
public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException;
This method receives one ObjectInspector for each input argument, and returns an ObjectInspector describing the return value.
- evaluate
This method implements the logic of the function.
- getDisplayString
This method returns a description of the function, used for example in EXPLAIN output.
Example:
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;

public class UDFMultiplyByTwo extends GenericUDF {

    PrimitiveObjectInspector inputOI;
    PrimitiveObjectInspector outputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        // This UDF accepts one argument
        assert (args.length == 1);
        // The first argument is a primitive type
        assert (args[0].getCategory() == Category.PRIMITIVE);
        inputOI = (PrimitiveObjectInspector) args[0];
        // We only support the INT type
        assert (inputOI.getPrimitiveCategory() == PrimitiveCategory.INT);
        // We return an int, so return the corresponding object inspector
        outputOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
        return outputOI;
    }

    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        if (args.length != 1) return null;
        // Access the deferred value. Hive passes the arguments as "deferred"
        // objects to avoid computing values that are never actually needed.
        Object oin = args[0].get();
        if (oin == null) return null;
        int value = (Integer) inputOI.getPrimitiveJavaObject(oin);
        int output = value * 2;
        return new IntWritable(output);
    }

    @Override
    public String getDisplayString(String[] args) {
        return "Here, write a nice description";
    }
}
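The deferred-argument mechanism used in evaluate above can be illustrated outside Hive with java.util.function.Supplier, a stand-in for Hive's DeferredObject (the class and method names here are hypothetical, for illustration only):

```java
import java.util.function.Supplier;

public class DeferredDemo {
    // Like Hive's DeferredObject, the Supplier defers computing the
    // argument until get() is called; if the function short-circuits,
    // the value is never computed at all.
    static Integer multiplyByTwo(Supplier<Integer> deferred) {
        Integer value = deferred.get(); // evaluation happens here, lazily
        return value == null ? null : value * 2;
    }

    public static void main(String[] args) {
        System.out.println(multiplyByTwo(() -> 21));   // prints 42
        System.out.println(multiplyByTwo(() -> null)); // prints null
    }
}
```

This is why the evaluate method above calls args[0].get() rather than receiving a plain value: Hive can skip evaluating arguments that a short-circuiting expression never needs.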