Apache Phoenix (Part 3) New Features: User-Defined Functions (UDFs)

Author: 我知他风雨兼程途径日暮不赏 | Source: published 2020-02-07 14:43


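As of Phoenix 4.4, users can create and deploy their own custom or domain-specific user-defined functions (UDFs) to the cluster.

Overview

Users can create temporary or permanent user-defined or domain-specific scalar functions. UDFs can be used in queries just like built-in functions, for example in SELECT, UPSERT, and DELETE statements, and when creating functional indexes. Temporary functions are specific to a session/connection and cannot be accessed from other sessions/connections. The metadata of permanent functions is stored in the system table SYSTEM.FUNCTION. Tenant-specific functions are also supported: a function created in a tenant-specific connection is not visible to other tenants' connections. Only functions created by the global tenant (no tenant specified) are visible to all connections.
Phoenix uses the HBase dynamic class loader to load UDF jars from HDFS dynamically at the Phoenix client and the region servers, without restarting the services.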

Configuration

You need to add the following properties to hbase-site.xml on the Phoenix client side.

<property>
  <name>phoenix.functions.allowUserDefinedFunctions</name>
  <value>true</value>
</property>
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>${hbase.tmp.dir}/hbase</value>
  <description>The directory shared by region servers and into
    which HBase persists.  The URL should be 'fully-qualified'
    to include the filesystem scheme.  For example, to specify the
    HDFS directory '/hbase' where the HDFS instance's namenode is
    running at namenode.example.org on port 9000, set this value to:
    hdfs://namenode.example.org:9000/hbase.  By default, we write
    to whatever ${hbase.tmp.dir} is set to -- usually /tmp --
    so change this configuration or else all data will be lost on
    machine restart.</description>
</property>
<property>
  <name>hbase.dynamic.jars.dir</name>
  <value>${hbase.rootdir}/lib</value>
  <description>
    The directory from which the custom udf jars can be loaded
    dynamically by the phoenix client/region server without the need to restart. However,
    an already loaded udf class would not be un-loaded. See
    HBASE-1936 for more details.
  </description>
</property>

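Note: the last two properties (hbase.rootdir and hbase.dynamic.jars.dir) must match the HBase server-side configuration.
As with other configuration properties, phoenix.functions.allowUserDefinedFunctions can also be specified as a connection property when opening a JDBC connection: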

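import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
Properties props = new Properties();
props.setProperty("phoenix.functions.allowUserDefinedFunctions", "true"); // enable UDFs for this connection
Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);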

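The following property is optional. It sets the directory on the local filesystem into which the dynamic class loader copies jars from HDFS.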

<property>
  <name>hbase.local.dir</name>
  <value>${hbase.tmp.dir}/local/</value>
  <description>Directory on the local filesystem to be used
    as a local storage.</description>
</property>

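Creating custom UDFs

To implement a custom UDF, follow the steps in "How to write a custom UDF" below.
After packaging your code into a jar, deploy the jar to HDFS. It is best to place it in the HDFS directory configured by hbase.dynamic.jars.dir in HBase.
The final step is to run a CREATE FUNCTION query.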
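The CREATE FUNCTION step can be sketched over plain JDBC. This is a minimal sketch, assuming the Phoenix grammar `CREATE [TEMPORARY] FUNCTION name(argTypes) RETURNS type AS 'className' [USING JAR 'jarPath']`. The helper class `CreateFunctionExample`, the function name `my_reverse`, the class `com.mypackage.MyReverseFunction`, and the HDFS jar path are hypothetical placeholders; check the statement syntax against the grammar reference for your Phoenix version.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration; not part of Phoenix.
final class CreateFunctionExample {

    // Builds: CREATE [TEMPORARY] FUNCTION name(argTypes) RETURNS type AS 'class' [USING JAR 'path']
    static String createFunctionSql(boolean temporary, String name, String argTypes,
                                    String returnType, String className, String jarPath) {
        StringBuilder sql = new StringBuilder("CREATE ");
        if (temporary) {
            sql.append("TEMPORARY "); // visible only to the creating connection
        }
        sql.append("FUNCTION ").append(name)
           .append('(').append(argTypes).append(") RETURNS ").append(returnType)
           .append(" AS '").append(className).append('\'');
        if (jarPath != null) {
            // Assumption: when USING JAR is omitted, the class is looked up in the
            // jars under hbase.dynamic.jars.dir.
            sql.append(" USING JAR '").append(jarPath).append('\'');
        }
        return sql.toString();
    }

    // Executes the DDL on an open Phoenix connection that was created with
    // phoenix.functions.allowUserDefinedFunctions=true.
    static void createFunction(Connection conn, String ddl) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(ddl);
        }
    }

    public static void main(String[] args) {
        System.out.println(createFunctionSql(false, "my_reverse", "VARCHAR", "VARCHAR",
                "com.mypackage.MyReverseFunction",
                "hdfs://namenode.example.org:9000/hbase/lib/myjar.jar"));
    }
}
```

Passing `true` for `temporary` yields a session-scoped function, matching the temporary/permanent distinction described in the overview.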

Dropping UDFs

You can drop a function with a DROP FUNCTION query. Dropping a function deletes its metadata from Phoenix.
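Dropping a function can be sketched the same way. This is a minimal sketch, assuming Phoenix accepts `DROP FUNCTION [IF EXISTS] name`; `DropFunctionExample` and `my_reverse` are hypothetical names. As noted under Limitations, dropping the function removes only its metadata; the jar on HDFS and its local copy must still be deleted by hand.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration; not part of Phoenix.
final class DropFunctionExample {

    // Builds: DROP FUNCTION [IF EXISTS] name
    static String dropFunctionSql(String name, boolean ifExists) {
        return "DROP FUNCTION " + (ifExists ? "IF EXISTS " : "") + name;
    }

    // Removes the function metadata; the jar itself is left untouched.
    static void dropFunction(Connection conn, String name) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(dropFunctionSql(name, true));
        }
    }

    public static void main(String[] args) {
        System.out.println(dropFunctionSql("my_reverse", true));
    }
}
```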

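How to write a custom UDF

You can write your UDF by following a few simple steps (for more details, see the blog post):

Create a new class derived from org.apache.phoenix.expression.function.ScalarFunction.
Implement the getDataType method, which determines the return type of the function.
Implement the evaluate method, which is called to compute the result for each row. The method is passed an org.apache.phoenix.schema.tuple.Tuple that holds the current state of the row, and an org.apache.hadoop.hbase.io.ImmutableBytesWritable that must be filled in to point to the result of the function execution. The method returns false if not enough information is available to compute the result (usually because one of its arguments is unknown), and true otherwise.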
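The evaluate contract in the last step can be illustrated without the Phoenix dependency. The sketch below is a dependency-free model, not Phoenix's API: the hypothetical `BytesPointer` class stands in for ImmutableBytesWritable, and an `argKnown` flag plus a nullable String stand in for evaluating the child expression against the Tuple. It models a hypothetical `MY_REVERSE(VARCHAR)` function; in a real UDF the same logic would sit in `evaluate(Tuple, ImmutableBytesWritable)` of a ScalarFunction subclass. Representing SQL NULL as a zero-length result is an assumption based on how Phoenix's built-in functions treat empty values.

```java
import java.nio.charset.StandardCharsets;

// Dependency-free model of the evaluate() contract; NOT the Phoenix API.
final class EvaluateContractSketch {

    // Hypothetical stand-in for ImmutableBytesWritable:
    // a byte array plus the offset/length of the value it points to.
    static final class BytesPointer {
        byte[] bytes = new byte[0];
        int offset;
        int length;

        void set(byte[] b) {
            bytes = b;
            offset = 0;
            length = b.length;
        }
    }

    // argKnown == false models a child expression that cannot be evaluated yet.
    // arg == null (with argKnown == true) models a SQL NULL argument.
    static boolean evaluate(boolean argKnown, String arg, BytesPointer ptr) {
        if (!argKnown) {
            return false; // not enough information to compute the result
        }
        if (arg == null) {
            ptr.set(new byte[0]); // zero-length result stands for NULL (assumed convention)
            return true;
        }
        String reversed = new StringBuilder(arg).reverse().toString();
        ptr.set(reversed.getBytes(StandardCharsets.UTF_8)); // point ptr at the result
        return true;
    }

    public static void main(String[] args) {
        BytesPointer ptr = new BytesPointer();
        System.out.println(evaluate(false, null, ptr)); // false
        if (evaluate(true, "abc", ptr)) {
            System.out.println(new String(ptr.bytes, ptr.offset, ptr.length, StandardCharsets.UTF_8)); // cba
        }
    }
}
```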
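The following steps are optional optimizations.
To be able to contribute to the start/stop key of a scan, a custom function needs to override the following two methods from ScalarFunction: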

    /**
     * Determines whether or not a function may be used to form
     * the start/stop key of a scan
     * @return the zero-based position of the argument to traverse
     *  into to look for a primary key column reference, or
     *  {@value #NO_TRAVERSAL} if the function cannot be used to
     *  form the scan key.
     */
    public int getKeyFormationTraversalIndex() {
        return NO_TRAVERSAL;
    }

    /**
     * Manufactures a KeyPart used to construct the KeyRange given
     * a constant and a comparison operator.
     * @param childPart the KeyPart formulated for the child expression
     *  at the {@link #getKeyFormationTraversalIndex()} position.
     * @return the KeyPart for constructing the KeyRange for this
     *  function.
     */
    public KeyPart newKeyPart(KeyPart childPart) {
        return null;
    }
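To make the idea of a scan start/stop key concrete: if a prefix-style function is applied to the leading primary-key column, a predicate such as `f(pk) = 'abc'` can be narrowed to a row-key range instead of a full table scan. The sketch below is a conceptual illustration of such a range, not the Phoenix KeyPart/KeyRange API; the class `PrefixScanRange` and its methods are hypothetical. It uses the common HBase prefix-scan technique: the start key is the prefix itself, and the exclusive stop key is the prefix with its last non-0xFF byte incremented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Conceptual illustration only; not Phoenix's KeyPart/KeyRange API.
final class PrefixScanRange {

    // Exclusive stop key for a prefix scan: increment the last byte that is not 0xFF
    // and drop everything after it. An empty array means "no upper bound"
    // (HBase treats an empty stop row as the end of the table).
    static byte[] stopKeyForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++; // cannot wrap, because stop[i] != 0xFF
                return stop;
            }
        }
        return new byte[0];
    }

    // True if rowKey lies in [start, stop), comparing bytes as unsigned values
    // like HBase row keys. An empty stop key is treated as unbounded.
    static boolean inRange(byte[] start, byte[] stop, byte[] rowKey) {
        boolean afterStart = Arrays.compareUnsigned(rowKey, start) >= 0;
        boolean beforeStop = stop.length == 0 || Arrays.compareUnsigned(rowKey, stop) < 0;
        return afterStart && beforeStop;
    }

    public static void main(String[] args) {
        byte[] start = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopKeyForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // abd
        System.out.println(inRange(start, stop, "abc123".getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(inRange(start, stop, "abd".getBytes(StandardCharsets.UTF_8)));    // false
    }
}
```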

Additionally, to enable an ORDER BY to be optimized out or a GROUP BY to be done in place, override the following method:

    /**
     * Determines whether or not the result of the function invocation
     * will be ordered in the same way as the input to the function.
     * Returning YES enables an optimization to occur when a
     * GROUP BY contains function invocations using the leading PK
     * column(s).
     * @return YES if the function invocation will always preserve order for
     * the inputs versus the outputs, YES_IF_LAST if the
     * function preserves order, but any further column reference would not
     * continue to preserve order, and NO if the function does not preserve
     * order.
     */
    public OrderPreserving preservesOrder() {
        return OrderPreserving.NO;
    }
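What "preserving order" means can be checked on sample data: if applying the function to row keys that are already sorted keeps them sorted, rows with equal results stay adjacent, which is what lets a GROUP BY on that expression proceed in place. The hypothetical `OrderPreservationDemo` below compares a prefix function with a reverse function on sorted sample keys. A sample check like this can only disprove order preservation, never prove it; the value returned from preservesOrder() remains a claim the implementer must justify. A prefix-style function is a candidate for a value other than NO, while a reverse function must return NO.

```java
import java.util.List;
import java.util.function.Function;

// Illustration of "order preserving" on sample data; not Phoenix code.
final class OrderPreservationDemo {

    // Returns false as soon as f maps two ascending keys to descending outputs.
    // Returning true only means no counterexample was found in this sample.
    static boolean noOrderViolation(List<String> sortedKeys, Function<String, String> f) {
        for (int i = 1; i < sortedKeys.size(); i++) {
            String prev = f.apply(sortedKeys.get(i - 1));
            String curr = f.apply(sortedKeys.get(i));
            if (prev.compareTo(curr) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Ascending row keys (ASCII only, so String order matches byte order).
        List<String> keys = List.of("ab1", "ab2", "ba1", "bb2");
        Function<String, String> prefix2 = s -> s.substring(0, 2);
        Function<String, String> reverse = s -> new StringBuilder(s).reverse().toString();
        System.out.println(noOrderViolation(keys, prefix2)); // true
        System.out.println(noOrderViolation(keys, reverse)); // false: "2ba" > "1ab"
    }
}
```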

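Limitations

The jar containing the UDFs must be added to and deleted from HDFS manually. Adding SQL statements to add/remove jars is tracked in PHOENIX-1890.
The dynamic class loader copies the UDF jars to {hbase.local.dir}/jars on the Phoenix client/region server when the UDFs are used in queries. Once a function is dropped, its jar must be deleted manually.
Functional indexes must be rebuilt manually if the function implementation changes. PHOENIX-1907
Once loaded, a jar is not unloaded, so you need to put a modified implementation into a different jar to avoid having to restart (bounce) your cluster. PHOENIX-1907
To list the functions, you need to query the SYSTEM.FUNCTION table. PHOENIX-1921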
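Listing functions therefore means querying the system table directly. A minimal JDBC sketch, assuming the table is addressed as SYSTEM."FUNCTION" (FUNCTION is a keyword, so it is double-quoted) and that it has FUNCTION_NAME, CLASS_NAME, and JAR_PATH columns; verify the column names against your Phoenix version. `ListFunctionsExample` is a hypothetical helper, and the connection is assumed to be opened with UDFs enabled as shown in the Configuration section.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; table and column names are assumptions.
final class ListFunctionsExample {

    // DISTINCT because the table may hold more than one row per function
    // (for example, one per argument).
    static final String LIST_FUNCTIONS_SQL =
            "SELECT DISTINCT FUNCTION_NAME, CLASS_NAME, JAR_PATH FROM SYSTEM.\"FUNCTION\"";

    // Returns one "name -> class (jar)" line per function visible to this connection.
    static List<String> listFunctions(Connection conn) throws SQLException {
        List<String> result = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(LIST_FUNCTIONS_SQL)) {
            while (rs.next()) {
                result.add(rs.getString("FUNCTION_NAME") + " -> " + rs.getString("CLASS_NAME")
                        + " (" + rs.getString("JAR_PATH") + ")");
            }
        }
        return result;
    }
}
```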

Article link: https://www.haomeiwen.com/subject/sjduxhtx.html
