美文网首页
大数据开发之Hive篇9-Hive宏(Macro)

大数据开发之Hive篇9-Hive宏(Macro)

作者: 只是甲 | 来源:发表于2020-12-30 09:58 被阅读0次

    备注:
    Hive 版本 2.1.1

    一.Hive宏介绍

    在编写HQL的过程中,很多逻辑需要反复使用。在关系型函数中,可以通过自定义函数来实现,Hive中实现的方法有多种。
    1) Hive 宏
    2) Hive hpl/sql
    3) Hive UDF

    这里,我们介绍实现最简单的一种Hive 宏。

    语法:

    -- 创建宏
    CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;
    -- 删除宏
    DROP TEMPORARY MACRO [IF EXISTS] macro_name;
    

    创建临时宏使用给定的可选列列表作为表达式的输入创建宏。宏在当前会话期间存在。
    宏只能是临时宏,只在本次会话中可见、有效。因此你需要将宏脚本放在SQL脚本的头部。
    如果函数不存在,DROP临时宏将返回错误,除非指定了if EXISTS。

    二.Hive 宏案例

    2.1 数据准备

    degree_type
    1  -- 小学
    2  -- 初中
    3  -- 职业高中
    4  -- 中专
    5  -- 高中
    6  -- 大专
    7  -- 本科
    8  -- 硕士
    8 -- 博士
    
    
    create table user_info(id int,degree_type int);
    insert into user_info values (1,3);
    insert into user_info values (2,1);
    insert into user_info values (3,6);
    insert into user_info values (4,4);
    insert into user_info values (5,5);
    insert into user_info values (6,9);
    insert into user_info values (7,8);
    insert into user_info values (8,2);
    insert into user_info values (9,7);
    
    
    hive> 
        > select * from user_info;
    OK
    user_info.id    user_info.degree_type
    1       3
    2       1
    3       6
    4       4
    5       5
    6       9
    7       8
    8       2
    9       7
    Time taken: 0.088 seconds, Fetched: 9 row(s)
    

    此时需要展示的是学历的中文信息,而非枚举类型的1、2、3这些数字。

    2.2 case when实现

    代码:

    select id,
           case degree_type when 1 then '小学'
                            when 2 then '初中'
                            when 3 then '职业高中'
                            when 4 then '中专'
                            when 5 then '高中'
                            when 6 then '大专'
                            when 7 then '本科'
                            when 8 then '硕士'
                            when 9 then '博士'
                            else null
         end as degree 
      from user_info 
    

    测试记录:

    hive> select id,
        >        case degree_type when 1 then '小学'
        >                         when 2 then '初中'
        >                         when 3 then '职业高中'
        >                         when 4 then '中专'
        >                         when 5 then '高中'
        >                         when 6 then '大专'
        >                         when 7 then '本科'
        >                         when 8 then '硕士'
        >                         when 9 then '博士'
        >                         else null
        >      end as degree 
        >   from user_info ;
    Query ID = root_20201217103340_b5a1b58f-de3e-4ed8-98c4-dbd079412474
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1606698967173_0237, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0237/
    Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0237
    Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
    2020-12-17 10:33:47,506 Stage-1 map = 0%,  reduce = 0%
    2020-12-17 10:33:53,690 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.13 sec
    MapReduce Total cumulative CPU time: 6 seconds 130 msec
    Ended Job = job_1606698967173_0237
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 2   Cumulative CPU: 6.13 sec   HDFS Read: 9479 HDFS Write: 429 HDFS EC Read: 0 SUCCESS
    Total MapReduce CPU Time Spent: 6 seconds 130 msec
    OK
    id      degree
    1       职业高中
    5       高中
    6       博士
    7       硕士
    8       初中
    2       小学
    3       大专
    4       中专
    9       本科
    Time taken: 14.239 seconds, Fetched: 9 row(s)
    hive> 
    

    2.3 Hive宏实现

    代码:

    drop temporary macro if exists get_degree;
    
    create temporary macro get_degree(degree_type string)
    if (degree_type is not null,
       case degree_type when 1 then '小学'
                            when 2 then '初中'
                            when 3 then '职业高中'
                            when 4 then '中专'
                            when 5 then '高中'
                            when 6 then '大专'
                            when 7 then '本科'
                            when 8 then '硕士'
                            when 9 then '博士'
                            else null
         end,null);
    
    select id,get_degree(degree_type) from user_info;
    

    测试记录:

    hive> 
        > create temporary macro get_degree(degree_type string)
        > if (degree_type is not null,
        >    case degree_type when 1 then '小学'
        >                         when 2 then '初中'
        >                         when 3 then '职业高中'
        >                         when 4 then '中专'
        >                         when 5 then '高中'
        >                         when 6 then '大专'
        >                         when 7 then '本科'
        >                         when 8 then '硕士'
        >                         when 9 then '博士'
        >                         else null
        >      end,null);
    OK
    Time taken: 0.032 seconds
    hive> 
        > select id,get_degree(degree_type) from user_info;
    Query ID = root_20201217104333_6ac59bfb-4915-48f8-965f-fae4642957c0
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1606698967173_0238, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0238/
    Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0238
    Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
    2020-12-17 10:43:42,748 Stage-1 map = 0%,  reduce = 0%
    2020-12-17 10:43:48,993 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 3.11 sec
    2020-12-17 10:43:50,025 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.21 sec
    MapReduce Total cumulative CPU time: 6 seconds 210 msec
    Ended Job = job_1606698967173_0238
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 2   Cumulative CPU: 6.21 sec   HDFS Read: 10077 HDFS Write: 429 HDFS EC Read: 0 SUCCESS
    Total MapReduce CPU Time Spent: 6 seconds 210 msec
    OK
    id      _c1
    1       职业高中
    5       高中
    6       博士
    7       硕士
    8       初中
    2       小学
    3       大专
    4       中专
    9       本科
    Time taken: 17.841 seconds, Fetched: 9 row(s)
    hive> exit;
    [root@hp1 ~]# hive
    WARNING: Use "yarn jar" to launch YARN applications.
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/hive-common-2.1.1-cdh6.3.1.jar!/hive-log4j2.properties Async: false
    
    WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
    hive> use test;
    OK
    Time taken: 1.204 seconds
    hive> select id,get_degree(degree_type) from user_info;
    FAILED: SemanticException [Error 10011]: Invalid function get_degree
    hive> 
    

    测试记录可以看出,exit退出hive客户端后重新登陆,临时宏就不可用了,如果使用宏,需要在当前回话的最开始运行创建宏的语句。

    参考:

    1.https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/DropMacro

    相关文章

      网友评论

          本文标题:大数据开发之Hive篇9-Hive宏(Macro)

          本文链接:https://www.haomeiwen.com/subject/ndcdwktx.html