
Hive GenericUDF Function DateDiff: Source Code Analysis

Author: 风筝flying | Published 2018-09-20 10:28

    Preface

    As introduced earlier, there are two ways to implement a Hive UDF, and the GenericUDF approach is the more complex one. To deepen my understanding of it, I read the source code of one of Hive's built-in functions and recorded my notes below. I am still new to this and my understanding is limited, so corrections are welcome.
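    For orientation, the sketch below shows the bare shape of a GenericUDF implementation: the three overridden methods, initialize, evaluate, and getDisplayString, are exactly the ones the DateDiff source walks through. The class name, function name, and trivial logic here are made up for illustration; only the method signatures come from the source analyzed below.

    // A minimal, illustrative GenericUDF skeleton (class and function names are made up).
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.IntWritable;

    public class MySkeletonUDF extends GenericUDF {
        private final IntWritable output = new IntWritable();

        // Called once per query: validate the argument count/types and return the output type.
        @Override
        public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
            if (arguments.length != 1) {
                throw new UDFArgumentLengthException("my_skeleton_udf() requires 1 argument");
            }
            return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
        }

        // Called once per row: compute the result from the deferred arguments.
        @Override
        public Object evaluate(DeferredObject[] arguments) throws HiveException {
            output.set(arguments[0].get() == null ? 0 : 1);
            return output;
        }

        // Text shown for this call in EXPLAIN output.
        @Override
        public String getDisplayString(String[] children) {
            return getStandardDisplayString("my_skeleton_udf", children);
        }
    }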

    Source Code Analysis

    public class GenericUDFDateDiff extends GenericUDF {
        // import java.text.SimpleDateFormat; the date format used to parse string inputs
        private transient SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
        // import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter;
        // Converters for the two arguments, used to handle the input argument types
        private transient Converter inputConverter1;
        private transient Converter inputConverter2;
        // import org.apache.hadoop.io.IntWritable; the return value; IntWritable is Hadoop's writable wrapper for the Java int type
        private IntWritable output = new IntWritable();
        // import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
        // The primitive categories (Hive primitive types) of the two input arguments
        private transient PrimitiveCategory inputType1;
        private transient PrimitiveCategory inputType2;
        private IntWritable result = new IntWritable();

        public GenericUDFDateDiff() {
            // import java.util.TimeZone;
            this.formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        }
    }
    
    

    The code above extends GenericUDF and declares the member variables that will be used below. Next comes the overridden initialize method:

        // import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
        public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
            // import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
            // Check the argument count; throw if there are not exactly two arguments
            if (arguments.length != 2) {
                throw new UDFArgumentLengthException("datediff() requires 2 arguments, got " + arguments.length);
            } else {
                // Determine the converter for each argument (see checkArguments below)
                this.inputConverter1 = this.checkArguments(arguments, 0);
                this.inputConverter2 = this.checkArguments(arguments, 1);
                // import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
                // import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
                // Record the primitive category of each input argument
                this.inputType1 = ((PrimitiveObjectInspector) arguments[0]).getPrimitiveCategory();
                this.inputType2 = ((PrimitiveObjectInspector) arguments[1]).getPrimitiveCategory();
                // import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
                // The return type is a writable int
                ObjectInspector outputOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
                return outputOI;
            }
        }
        
    

    In the overridden initialize method, the number of arguments is checked first, and an exception is thrown when there are not exactly two. The argument-type fields and the type converters declared earlier are then initialized.
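    As a quick illustration of how initialize behaves, the following standalone sketch (not part of the Hive source; the class name InitializeSketch is made up) calls it directly with two string ObjectInspectors, assuming the Hive exec and serde jars are on the classpath:

        import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
        import org.apache.hadoop.hive.ql.udf.generic.GenericUDFDateDiff;
        import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
        import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

        public class InitializeSketch {
            public static void main(String[] args) throws UDFArgumentException {
                GenericUDFDateDiff udf = new GenericUDFDateDiff();
                // Two string-typed inputs, as in datediff('2018-09-20', '2018-09-10')
                ObjectInspector[] arguments = {
                    PrimitiveObjectInspectorFactory.writableStringObjectInspector,
                    PrimitiveObjectInspectorFactory.writableStringObjectInspector
                };
                ObjectInspector outputOI = udf.initialize(arguments);
                // Prints "int": the UDF always reports a writable int as its return type
                System.out.println(outputOI.getTypeName());
            }
        }

    Passing any other number of inspectors would trigger the UDFArgumentLengthException above. The checkArguments helper that initialize calls is shown next: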

        private Converter checkArguments(ObjectInspector[] arguments, int i) throws UDFArgumentException {
            // import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
            // import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
            // The argument must be a primitive type
            if (arguments[i].getCategory() != Category.PRIMITIVE) {
                throw new UDFArgumentTypeException(0, "Only primitive type arguments are accepted but "
                    + arguments[i].getTypeName() + " is passed. as first arguments");
            } else {
                // Get the argument's concrete primitive category
                PrimitiveCategory inputType = ((PrimitiveObjectInspector) arguments[i]).getPrimitiveCategory();
                Object converter;
                // Pick the converter matching the concrete primitive type
                switch (inputType) {
                case STRING:
                case VARCHAR:
                case CHAR:
                    converter = ObjectInspectorConverters.getConverter((PrimitiveObjectInspector) arguments[i],
                        PrimitiveObjectInspectorFactory.writableStringObjectInspector);
                    break;
                case TIMESTAMP:
                    // import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter.TimestampConverter;
                    converter = new TimestampConverter((PrimitiveObjectInspector) arguments[i],
                        PrimitiveObjectInspectorFactory.writableTimestampObjectInspector);
                    break;
                case DATE:
                    converter = ObjectInspectorConverters.getConverter((PrimitiveObjectInspector) arguments[i],
                        PrimitiveObjectInspectorFactory.writableDateObjectInspector);
                    break;
                default:
                    throw new UDFArgumentException("DATEDIFF() only takes STRING/TIMESTAMP/DATEWRITABLE types as "
                        + (i + 1) + "-th argument, got " + inputType);
                }
                return (Converter) converter;
            }
        }
    

    The checkArguments method first checks the category of the argument, which must be a Hive primitive type, otherwise an exception is thrown. It then assigns the matching converter according to the concrete primitive type, and finally throws an exception for any type other than string, timestamp, or date.
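    To make the role of the converters concrete, the following standalone sketch (not from the Hive source; the class name ConverterSketch is made up) builds the same kind of converter used by the string branch above and feeds it a plain Java String:

        import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
        import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter;
        import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

        public class ConverterSketch {
            public static void main(String[] args) {
                // Converter from a Java String inspector to the writable (Text) string inspector,
                // mirroring the STRING/VARCHAR/CHAR branch of checkArguments
                Converter converter = ObjectInspectorConverters.getConverter(
                    PrimitiveObjectInspectorFactory.javaStringObjectInspector,
                    PrimitiveObjectInspectorFactory.writableStringObjectInspector);
                Object converted = converter.convert("2018-09-20");
                // Prints "org.apache.hadoop.io.Text 2018-09-20"
                System.out.println(converted.getClass().getName() + " " + converted);
            }
        }

    The convertToDate method below then takes such a converter and turns the converted value into a java.util.Date: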

        private Date convertToDate(PrimitiveCategory inputType, Converter converter, DeferredObject argument) throws HiveException {
            assert converter != null;
            assert argument != null;

            if (argument.get() == null) {
                return null;
            } else {
                Date date = new Date();
                switch (inputType) {
                case STRING:
                case VARCHAR:
                case CHAR:
                    // Convert to Text, then parse with the 'yyyy-MM-dd' formatter
                    String dateString = converter.convert(argument.get()).toString();
                    try {
                        date = this.formatter.parse(dateString);
                        break;
                    } catch (ParseException e) {
                        // import java.text.ParseException; unparseable strings yield null
                        return null;
                    }
                case TIMESTAMP:
                    // import org.apache.hadoop.hive.serde2.io.TimestampWritable;
                    Timestamp ts = ((TimestampWritable) converter.convert(argument.get())).getTimestamp();
                    date.setTime(ts.getTime());
                    break;
                case DATE:
                    // import org.apache.hadoop.hive.serde2.io.DateWritable;
                    DateWritable dw = (DateWritable) converter.convert(argument.get());
                    date = dw.get();
                    break;
                default:
                    throw new UDFArgumentException("TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got " + inputType);
                }
                return date;
            }
        }
    

    The convertToDate method takes the input argument's type, the corresponding converter, and the argument value, and returns a Date parsed with the 'yyyy-MM-dd' format.
    Next comes the overridden evaluate method, shown below:

        public String getDisplayString(String[] children) {
            return this.getStandardDisplayString("datediff", children);
        }

        // Day difference between two dates: millisecond difference divided by 86,400,000 ms per day
        private IntWritable evaluate(Date date, Date date2) {
            if (date != null && date2 != null) {
                long diffInMilliSeconds = date.getTime() - date2.getTime();
                this.result.set((int) (diffInMilliSeconds / 86400000L));
                return this.result;
            } else {
                return null;
            }
        }

        public IntWritable evaluate(DeferredObject[] arguments) throws HiveException {
            this.output = this.evaluate(
                this.convertToDate(this.inputType1, this.inputConverter1, arguments[0]),
                this.convertToDate(this.inputType2, this.inputConverter2, arguments[1]));
            return this.output;
        }
    

    A private evaluate method is defined first to compute the number of days between the two dates; the public evaluate method is then overridden to convert both arguments and delegate to it.
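    The day count itself is plain millisecond arithmetic: string inputs are parsed in UTC (set in the constructor), and the difference of the two getTime() values is divided by 86400000, the number of milliseconds in a day. The following standalone sketch (illustrative only, not from the Hive source) reproduces the same calculation:

        import java.text.ParseException;
        import java.text.SimpleDateFormat;
        import java.util.Date;
        import java.util.TimeZone;

        public class DateDiffSketch {
            public static void main(String[] args) throws ParseException {
                SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
                formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
                Date date = formatter.parse("2018-09-20");
                Date date2 = formatter.parse("2018-09-10");
                long diffInMilliSeconds = date.getTime() - date2.getTime();
                // Prints 10, the same value SELECT datediff('2018-09-20', '2018-09-10') returns in Hive
                System.out.println((int) (diffInMilliSeconds / 86400000L));
            }
        }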

    Summary

    Reading through this source, the checks around type declaration and conversion are handled very strictly, which is something worth learning from in my own future development work.
