An Introduction to UDFs and UDAFs
UDF
A UDF (user-defined function) is a Hive function that takes one or more columns from a single row and returns a single value.
For example:
SELECT lower(COLUMN_NAME) FROM TABLE_NAME
UDAF
A UDAF (user-defined aggregate function) performs an aggregation over multiple rows.
For example:
select sum(column_name) from table_name
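Conceptually, an aggregate like sum() is computed in two phases: each mapper builds a partial aggregate over its slice of the rows, and the partials are then merged. A minimal plain-Java sketch of that idea (the method names here are illustrative, not Hive's UDAF API):

```java
public class SumAggregateDemo {
    // Phase 1: each mapper sums its own slice of the rows.
    static long partialSum(int[] values) {
        long sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Phase 2: partial results are merged into the final aggregate.
    static long merge(long a, long b) {
        return a + b;
    }

    public static void main(String[] args) {
        long p1 = partialSum(new int[]{1, 2, 3}); // one mapper's slice
        long p2 = partialSum(new int[]{4, 5});    // another mapper's slice
        System.out.println(merge(p1, p2));        // prints 15
    }
}
```

This split is what lets an aggregation scale: the merge step only sees small partial results, never all the raw rows at once.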
How to create a custom UDF or UDAF
- You can create a UDF or UDAF in two different ways: "simple" and "generic".
Simple
The simple approach is to extend the UDF class directly:
/** A simple UDF to convert Fahrenheit to Celsius */
public class ConvertToCelcius extends UDF {
    public double evaluate(double value) {
        return (value - 32) / 1.8;
    }
}
Once it is built, you can call it like this:
hive> add jar my-udf.jar;
hive> create temporary function fahrenheit_to_celcius as 'com.mycompany.hive.udf.ConvertToCelcius';
hive> SELECT fahrenheit_to_celcius(temp_fahrenheit) FROM temperature_data;
In short, to create a simple UDF you only need to do two things:
- extend the org.apache.hadoop.hive.ql.exec.UDF class
- implement an evaluate method
A simple UDF supports a wide range of data types: not only Java primitive types but also Hadoop IO types.

Hive type | Java types |
---|---|
string | java.lang.String, org.apache.hadoop.io.Text |
int | int, java.lang.Integer, org.apache.hadoop.io.IntWritable |
boolean | bool, java.lang.Boolean, org.apache.hadoop.io.BooleanWritable |
array<type> | java.util.List<Java type> |
map<ktype, vtype> | java.util.Map<Java type for K, Java type for V> |
struct | Don't use a Simple UDF, use a GenericUDF |
Simple vs Generic
Simple | Generic |
---|---|
Reduced performance due to use of reflection: each call of the evaluate method is reflective. Furthermore, all arguments are evaluated and parsed. | Optimal performance: no reflective call, and arguments are parsed lazily |
Limited handling of complex types. Arrays are handled but suffer from type erasure limitations | All complex parameters are supported (even nested ones like array<array<string>>) |
Variable number of arguments are not supported | Variable number of arguments are supported |
Very easy to write | Not very difficult, but not well documented |
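The performance difference in the table comes from how the two kinds of UDF are invoked: Hive must locate and call a Simple UDF's evaluate method via reflection, while a GenericUDF is called directly. A self-contained sketch of the two call styles (plain Java, no Hive classes; the method is a stand-in for evaluate):

```java
import java.lang.reflect.Method;

public class ReflectiveCallDemo {
    // Stand-in for a Simple UDF's evaluate method.
    public static int evaluate(int value) {
        return value * 2;
    }

    public static void main(String[] args) throws Exception {
        // Direct call: roughly how a GenericUDF is invoked.
        int direct = evaluate(21);

        // Reflective call: Hive has to look up a Simple UDF's evaluate
        // method by name and signature, then invoke it reflectively,
        // boxing and unboxing the arguments on every call.
        Method m = ReflectiveCallDemo.class.getMethod("evaluate", int.class);
        int reflective = (Integer) m.invoke(null, 21);

        System.out.println(direct + " " + reflective); // prints "42 42"
    }
}
```

Both calls return the same result; the reflective path simply adds lookup, boxing, and invocation overhead on every row, which is why GenericUDF is preferred for hot code paths.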
Generic
First, extend the GenericUDF class.
Then implement its three abstract methods:
public abstract class GenericUDF {
    public abstract Object evaluate(DeferredObject[] args) throws HiveException;
    public abstract String getDisplayString(String[] args);
    public abstract ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException;
}
- initialize
public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException;
This method receives one ObjectInspector for each input argument, and returns an ObjectInspector describing the return value.
- evaluate
This method implements the logic of the function.
- getDisplayString
This method returns a description of the function, used for example in EXPLAIN output.
Example:
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;

public class UDFMultiplyByTwo extends GenericUDF {

    PrimitiveObjectInspector inputOI;
    PrimitiveObjectInspector outputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        // This UDF accepts one argument
        assert (args.length == 1);
        // The first argument is a primitive type
        assert (args[0].getCategory() == Category.PRIMITIVE);
        inputOI = (PrimitiveObjectInspector) args[0];
        // We only support the INT type
        assert (inputOI.getPrimitiveCategory() == PrimitiveCategory.INT);
        // We return an int, so return the corresponding object inspector
        outputOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
        return outputOI;
    }

    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        if (args.length != 1) return null;
        // Access the deferred value. Hive passes the arguments as "deferred"
        // objects to avoid computing values that are never actually needed.
        Object oin = args[0].get();
        if (oin == null) return null;
        int value = (Integer) inputOI.getPrimitiveJavaObject(oin);
        int output = value * 2;
        return new IntWritable(output);
    }

    @Override
    public String getDisplayString(String[] args) {
        return "Here, write a nice description";
    }
}
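The deferred-argument mechanism used in evaluate above can be illustrated outside Hive with java.util.function.Supplier, a stand-in for Hive's DeferredObject (the class and method names here are hypothetical, for illustration only):

```java
import java.util.function.Supplier;

public class DeferredDemo {
    // Like Hive's DeferredObject, the Supplier defers computing the
    // argument until get() is called; if the function short-circuits,
    // the value is never computed at all.
    static Integer multiplyByTwo(Supplier<Integer> deferred) {
        Integer value = deferred.get(); // evaluation happens here, lazily
        return value == null ? null : value * 2;
    }

    public static void main(String[] args) {
        System.out.println(multiplyByTwo(() -> 21));   // prints 42
        System.out.println(multiplyByTwo(() -> null)); // prints null
    }
}
```

This is why the evaluate method above calls args[0].get() rather than receiving a plain value: Hive can skip evaluating arguments that a short-circuiting expression never needs.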