备注:
Hive 版本 2.1.1
一.Hive宏介绍
在编写HQL的过程中,很多逻辑需要反复使用。在关系型函数中,可以通过自定义函数来实现,Hive中实现的方法有多种。
1) Hive 宏
2) Hive hpl/sql
3) Hive UDF
这里,我们介绍实现最简单的一种Hive 宏。
语法:
-- 创建宏
CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;
-- 删除宏
DROP TEMPORARY MACRO [IF EXISTS] macro_name;
创建临时宏使用给定的可选列列表作为表达式的输入创建宏。宏在当前会话期间存在。
宏只能是临时宏,只在本次会话中可见、有效。因此你需要将宏脚本放在SQL脚本的头部。
如果函数不存在,DROP临时宏将返回错误,除非指定了if EXISTS。
二.Hive 宏案例
2.1 数据准备
degree_type
1 -- 小学
2 -- 初中
3 -- 职业高中
4 -- 中专
5 -- 高中
6 -- 大专
7 -- 本科
8 -- 硕士
8 -- 博士
create table user_info(id int,degree_type int);
insert into user_info values (1,3);
insert into user_info values (2,1);
insert into user_info values (3,6);
insert into user_info values (4,4);
insert into user_info values (5,5);
insert into user_info values (6,9);
insert into user_info values (7,8);
insert into user_info values (8,2);
insert into user_info values (9,7);
hive>
> select * from user_info;
OK
user_info.id user_info.degree_type
1 3
2 1
3 6
4 4
5 5
6 9
7 8
8 2
9 7
Time taken: 0.088 seconds, Fetched: 9 row(s)
此时需要展示的是学历的中文信息,而非枚举类型的1、2、3这些数字。
2.2 case when实现
代码:
select id,
case degree_type when 1 then '小学'
when 2 then '初中'
when 3 then '职业高中'
when 4 then '中专'
when 5 then '高中'
when 6 then '大专'
when 7 then '本科'
when 8 then '硕士'
when 9 then '博士'
else null
end as degree
from user_info
测试记录:
hive> select id,
> case degree_type when 1 then '小学'
> when 2 then '初中'
> when 3 then '职业高中'
> when 4 then '中专'
> when 5 then '高中'
> when 6 then '大专'
> when 7 then '本科'
> when 8 then '硕士'
> when 9 then '博士'
> else null
> end as degree
> from user_info ;
Query ID = root_20201217103340_b5a1b58f-de3e-4ed8-98c4-dbd079412474
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0237, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0237/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0237
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-12-17 10:33:47,506 Stage-1 map = 0%, reduce = 0%
2020-12-17 10:33:53,690 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.13 sec
MapReduce Total cumulative CPU time: 6 seconds 130 msec
Ended Job = job_1606698967173_0237
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Cumulative CPU: 6.13 sec HDFS Read: 9479 HDFS Write: 429 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 130 msec
OK
id degree
1 职业高中
5 高中
6 博士
7 硕士
8 初中
2 小学
3 大专
4 中专
9 本科
Time taken: 14.239 seconds, Fetched: 9 row(s)
hive>
2.3 Hive宏实现
代码:
drop temporary macro if exists get_degree;
create temporary macro get_degree(degree_type string)
if (degree_type is not null,
case degree_type when 1 then '小学'
when 2 then '初中'
when 3 then '职业高中'
when 4 then '中专'
when 5 then '高中'
when 6 then '大专'
when 7 then '本科'
when 8 then '硕士'
when 9 then '博士'
else null
end,null);
select id,get_degree(degree_type) from user_info;
测试记录:
hive>
> create temporary macro get_degree(degree_type string)
> if (degree_type is not null,
> case degree_type when 1 then '小学'
> when 2 then '初中'
> when 3 then '职业高中'
> when 4 then '中专'
> when 5 then '高中'
> when 6 then '大专'
> when 7 then '本科'
> when 8 then '硕士'
> when 9 then '博士'
> else null
> end,null);
OK
Time taken: 0.032 seconds
hive>
> select id,get_degree(degree_type) from user_info;
Query ID = root_20201217104333_6ac59bfb-4915-48f8-965f-fae4642957c0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0238, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0238/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0238
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-12-17 10:43:42,748 Stage-1 map = 0%, reduce = 0%
2020-12-17 10:43:48,993 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 3.11 sec
2020-12-17 10:43:50,025 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.21 sec
MapReduce Total cumulative CPU time: 6 seconds 210 msec
Ended Job = job_1606698967173_0238
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Cumulative CPU: 6.21 sec HDFS Read: 10077 HDFS Write: 429 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 210 msec
OK
id _c1
1 职业高中
5 高中
6 博士
7 硕士
8 初中
2 小学
3 大专
4 中专
9 本科
Time taken: 17.841 seconds, Fetched: 9 row(s)
hive> exit;
[root@hp1 ~]# hive
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/hive-common-2.1.1-cdh6.3.1.jar!/hive-log4j2.properties Async: false
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> use test;
OK
Time taken: 1.204 seconds
hive> select id,get_degree(degree_type) from user_info;
FAILED: SemanticException [Error 10011]: Invalid function get_degree
hive>
测试记录可以看出,exit退出hive客户端后重新登陆,临时宏就不可用了,如果使用宏,需要在当前回话的最开始运行创建宏的语句。
网友评论