Hive for Big Data Development, Part 6: Hive Functions in Detail

Author: 只是甲 | Published 2020-12-25 13:40

Note:
Hive version 2.1.1

1. Hive Function Overview

1.1 Hive function categories

Hive functions fall into three categories (a minimal example of each is sketched after this list):

  1. Functions / UDF
    One row of input produces one row of output.
    Examples: upper / lower / length

  2. Aggregate functions / UDAF
    Many rows of input produce one row of output.
    Examples: sum / count / avg

  3. Table-generating functions / UDTF
    One row of input produces many rows of output.
    Example: explode
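
As a quick, minimal illustration (not from the original article), here is one query per category; emp is the sample employee table queried later in section 4.5.

-- UDF: one value in, one value out
select upper('hive');                 -- HIVE

-- UDAF: aggregates many rows into a single row (emp is the sample table used in section 4.5)
select count(*) from emp;

-- UDTF: expands one row into several rows
select explode(array(1, 2, 3));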

1.2 Listing Hive functions

1.2.1 The SHOW FUNCTIONS command

The SHOW FUNCTIONS command lists the functions supported by the current Hive installation. On this build (Hive 2.1.1), 248 functions are returned. The function set is broadly similar to MySQL's, but it also includes functions such as to_date and decode that Oracle users will recognize (although their behavior differs from Oracle's). This is a real convenience for people moving from relational databases to big data development.

hive> SHOW FUNCTIONS;
OK
!
!=
$sum0
%
&
*
+
-
/
<
<=
<=>
<>
=
==
>
>=
^
abs
acos
add_months
and
array
array_contains
ascii
asin
assert_true
atan
avg
base64
between
bin
bround
case
cbrt
ceil
ceiling
chr
coalesce
collect_list
collect_set
compute_stats
concat
concat_ws
context_ngrams
conv
corr
cos
count
covar_pop
covar_samp
crc32
create_union
cume_dist
current_database
current_date
current_timestamp
current_user
date_add
date_format
date_sub
datediff
day
dayofmonth
dayofweek
decode
degrees
dense_rank
div
e
elt
encode
ewah_bitmap
ewah_bitmap_and
ewah_bitmap_empty
ewah_bitmap_or
exp
explode
factorial
field
find_in_set
first_value
floor
floor_day
floor_hour
floor_minute
floor_month
floor_quarter
floor_second
floor_week
floor_year
format_number
from_unixtime
from_utc_timestamp
get_json_object
get_splits
greatest
grouping
hash
hex
histogram_numeric
hour
if
in
in_file
index
initcap
inline
instr
internal_interval
isnotnull
isnull
java_method
json_tuple
lag
last_day
last_value
lcase
lead
least
length
levenshtein
like
ln
locate
log
log10
log2
logged_in_user
lower
lpad
ltrim
map
map_keys
map_values
mask
mask_first_n
mask_hash
mask_last_n
mask_show_first_n
mask_show_last_n
matchpath
max
md5
min
minute
month
months_between
named_struct
negative
next_day
ngrams
noop
noopstreaming
noopwithmap
noopwithmapstreaming
not
ntile
nvl
or
parse_url
parse_url_tuple
percent_rank
percentile
percentile_approx
pi
pmod
posexplode
positive
pow
power
printf
quarter
radians
rand
rank
reflect
reflect2
regexp
regexp_extract
regexp_replace
repeat
replace
reverse
rlike
round
row_number
rpad
rtrim
second
sentences
sha
sha1
sha2
shiftleft
shiftright
shiftrightunsigned
sign
sin
size
sort_array
soundex
space
split
sqrt
stack
std
stddev
stddev_pop
stddev_samp
str_to_map
struct
substr
substring
substring_index
sum
tan
to_date
to_unix_timestamp
to_utc_timestamp
translate
trim
trunc
ucase
unbase64
unhex
unix_timestamp
upper
uuid
var_pop
var_samp
variance
version
weekofyear
when
windowingtablefunction
xpath
xpath_boolean
xpath_double
xpath_float
xpath_int
xpath_long
xpath_number
xpath_short
xpath_string
year
|
~
Time taken: 1.246 seconds, Fetched: 248 row(s)
hive> 
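
When the full list is too long to scan, the output can be filtered with a pattern. The exact pattern syntax (SQL-style wildcard vs. Java regular expression) varies between Hive releases, so treat the following as a sketch:

-- list only the functions whose names start with "date"
SHOW FUNCTIONS LIKE "date*";
-- on releases where the pattern is interpreted as a regular expression, use:
SHOW FUNCTIONS LIKE "date.*";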

1.2.2 The DESCRIBE FUNCTION command

Syntax:

DESCRIBE FUNCTION <function_name>;
DESCRIBE FUNCTION EXTENDED <function_name>;

Sample session:

hive> 
    > 
    > desc function decode;
OK
decode(bin, str) - Decode the first argument using the second argument character set
Time taken: 0.023 seconds, Fetched: 1 row(s)
hive> desc function extended decode;
OK
decode(bin, str) - Decode the first argument using the second argument character set
Possible options for the character set are 'US-ASCII', 'ISO-8859-1',
'UTF-8', 'UTF-16BE', 'UTF-16LE', and 'UTF-16'. If either argument
is null, the result will also be null
Time taken: 0.027 seconds, Fetched: 4 row(s)
hive> 

2. Hive Numeric Functions

The following are the most common Hive numeric functions:

Function | Description | Return type
round(DOUBLE a), round(DOUBLE a, INT d) | Rounds a to the nearest value, optionally to d decimal places | DOUBLE
floor(DOUBLE a) | Largest integer less than or equal to a | BIGINT
ceil(DOUBLE a), ceiling(DOUBLE a) | Smallest integer greater than or equal to a | BIGINT
rand(), rand(INT seed) | Random number between 0 and 1 | DOUBLE
pow(DOUBLE a, DOUBLE p), power(DOUBLE a, DOUBLE p) | a raised to the power p | DOUBLE
sqrt(DOUBLE a), sqrt(DECIMAL a) | Square root of a | DOUBLE
bin(BIGINT a) | Binary representation of a | STRING
abs(DOUBLE a) | Absolute value of a | DOUBLE
greatest(T v1, T v2, ...) | Largest of the arguments | T
least(T v1, T v2, ...) | Smallest of the arguments | T

2.1 round

The round function rounds to the nearest value; halves are rounded up.

Sample session:

hive> 
    > select round(5.1);
OK
5.0
Time taken: 0.09 seconds, Fetched: 1 row(s)
hive> select round(5.5);
OK
6.0
Time taken: 0.07 seconds, Fetched: 1 row(s)
hive> 
    > select round(5.5,1);
OK
5.5
Time taken: 0.086 seconds, Fetched: 1 row(s)

2.2 floor

floor rounds down, returning the largest integer that is not greater than its argument.

Sample session:

hive> 
    > 
    > select floor(5.4);
OK
5
Time taken: 0.092 seconds, Fetched: 1 row(s)
hive> select floor(5.5);
OK
5
Time taken: 0.074 seconds, Fetched: 1 row(s)
hive> select floor(5.9);
OK
5
Time taken: 0.08 seconds, Fetched: 1 row(s)
hive> 

2.3 ceil

ceil rounds up, returning the smallest integer that is not less than its argument.

Sample session:

hive> 
    > select ceil(5.4);
OK
6
Time taken: 0.094 seconds, Fetched: 1 row(s)
hive> select ceil(5.1);
OK
6
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive> select ceil(5.9);
OK
6
Time taken: 0.08 seconds, Fetched: 1 row(s)
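
The difference between the three functions matters most for negative inputs. The floor/ceil values below follow directly from their definitions; the result of round on a negative half is only indicated, since the exact behavior can depend on the Hive build:

select floor(-5.5);   -- -6 (toward negative infinity)
select ceil(-5.5);    -- -5 (toward positive infinity)
select round(-5.5);   -- typically -6.0 (halves rounded away from zero)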

2.4 rand

rand returns a random DOUBLE between 0 and 1; rand(seed) makes the sequence reproducible.

Sample session:

-- a random number between 0 and 100
hive> 
    > select round(100*rand());
Query ID = root_20201208110850_c85b86e9-012e-449d-a1f5-0b158741315b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0141, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0141/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0141
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-08 11:08:57,146 Stage-1 map = 0%,  reduce = 0%
2020-12-08 11:09:03,404 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.04 sec
MapReduce Total cumulative CPU time: 3 seconds 40 msec
Ended Job = job_1606698967173_0141
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 3.04 sec   HDFS Read: 5300 HDFS Write: 104 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 40 msec
OK
62.0
Time taken: 14.362 seconds, Fetched: 1 row(s)
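
Note that round(100*rand()) actually spans 0 to 100. A sketch of two common variants, assuming nothing beyond rand() itself:

-- reproducible pseudo-random values: the same seed gives the same sequence per task
select rand(7);

-- an integer uniformly distributed between 1 and 100
select floor(1 + rand() * 100);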

2.5 power

power(a, p) raises a to the power p (pow is a synonym).

Sample session:

-- 2 to the power of 3
hive> 
    > select power(2,3);
OK
8.0
Time taken: 0.087 seconds, Fetched: 1 row(s)

2.6 sqrt

sqrt returns the square root.

Sample session:

hive> 
    > select sqrt(9);
OK
3.0
Time taken: 0.074 seconds, Fetched: 1 row(s)

2.7 bin

The bin function returns the binary representation of an integer as a string.

Sample session:

hive> 
    > 
    > select bin(10);
OK
1010
Time taken: 0.085 seconds, Fetched: 1 row(s)
hive> select bin(9);
OK
1001
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive> 

2.8 abs

The abs function returns the absolute value.

Sample session:

hive> 
    > select abs(10);
OK
10
Time taken: 0.09 seconds, Fetched: 1 row(s)
hive> select abs(-10);
OK
10
Time taken: 0.084 seconds, Fetched: 1 row(s)

2.9 greatest

greatest returns the largest of its arguments; if any argument is NULL, the result is NULL.

Sample session:

hive> 
    > select greatest(1,2,3);
OK
3
Time taken: 0.094 seconds, Fetched: 1 row(s)
hive> select greatest(1,2,3,1.5);
OK
3.0
Time taken: 0.075 seconds, Fetched: 1 row(s)
hive> 
    > select greatest(1,2,3,null);
OK
NULL
Time taken: 0.084 seconds, Fetched: 1 row(s)

2.10 least

least returns the smallest of its arguments; if any argument is NULL, the result is NULL.

Sample session:

hive> 
    > 
    > select least(1,2,3);
OK
1
Time taken: 0.094 seconds, Fetched: 1 row(s)
hive> select least(1,2,3,-1);
OK
-1
Time taken: 0.079 seconds, Fetched: 1 row(s)
hive> select least(1,2,3,null);
OK
NULL
Time taken: 0.059 seconds, Fetched: 1 row(s)
hive> 
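
If NULL arguments should be ignored rather than turning the whole result into NULL, they can be replaced with a neutral default first. A minimal sketch using only literals:

-- replace NULL with a default before comparing, so a real maximum/minimum is returned
select greatest(nvl(cast(null as int), 0), 2, 3);      -- 3 instead of NULL
select least(coalesce(cast(null as int), 999), 2, 3);  -- 2 instead of NULL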

3. Hive Date and Time Functions

The following are the most common Hive date and time functions:

Function | Description | Return type
from_unixtime(bigint unixtime[, string format]) | Converts a Unix timestamp (seconds since epoch) to a date/time string; the counterpart of unix_timestamp() | string
unix_timestamp(), unix_timestamp(string date), unix_timestamp(string date, string pattern) | Converts a date/time to a Unix timestamp; the counterpart of from_unixtime() | bigint
to_date(string timestamp) | Returns the date part of a timestamp string | date
year(string date) | Returns the year | int
quarter(date/timestamp/string) | Returns the quarter | int
month(string date) | Returns the month | int
day(string date), dayofmonth(date) | Returns the day of the month | int
hour(string date) | Returns the hour | int
minute(string date) | Returns the minute | int
second(string date) | Returns the second | int
weekofyear(string date) | Returns the week number within the year | int
extract(field FROM source) | Extracts one part (year, month, day, ...) of a date/time | int
datediff(string enddate, string startdate) | Returns the number of days between two dates | int
date_add(date/timestamp/string startdate, tinyint/smallint/int days) | Adds a number of days to a date | date
date_sub(date/timestamp/string startdate, tinyint/smallint/int days) | Subtracts a number of days from a date | date
from_utc_timestamp({any primitive type} ts, string timezone) | Converts a UTC timestamp to the given time zone | timestamp
to_utc_timestamp({any primitive type} ts, string timezone) | Converts a timestamp in the given time zone to UTC | timestamp
current_date | Returns the current date | date
current_timestamp | Returns the current timestamp | timestamp
add_months(string start_date, int num_months[, output_date_format]) | Adds a number of months to a date | string
last_day(string date) | Returns the last day of the month the date falls in | string
next_day(string start_date, string day_of_week) | Returns the first date later than start_date that falls on the given day of the week | string
trunc(string date, string format) | Returns the date truncated to the unit given by format | string
months_between(date1, date2) | Returns the number of months between two dates | double
date_format(date/timestamp/string ts, string fmt) | Formats a date/timestamp according to fmt | string

3.1 from_unixtime and unix_timestamp

from_unixtime converts a Unix timestamp (seconds since epoch) into a date/time string.
unix_timestamp converts a date/time into a Unix timestamp; the two functions are counterparts.

Sample session:

hive> 
    > select UNIX_TIMESTAMP('2020-12-09 23:50:59');
OK
1607529059
Time taken: 0.101 seconds, Fetched: 1 row(s)
hive> 
    > select from_unixtime(1607529059);
OK
2020-12-09 23:50:59
Time taken: 0.084 seconds, Fetched: 1 row(s)
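
Both functions also accept a format pattern, which helps when the string does not use the default yyyy-MM-dd HH:mm:ss layout. A sketch; the epoch value is the one from the session above, and the exact results depend on the cluster's time zone:

-- parse a non-standard string into a Unix timestamp
select unix_timestamp('20201209 235059', 'yyyyMMdd HHmmss');

-- render a Unix timestamp with an explicit output format
select from_unixtime(1607529059, 'yyyy/MM/dd HH:mm:ss');   -- 2020/12/09 23:50:59 on this cluster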

3.2 to_date

to_date(string timestamp) returns the date part of a timestamp string.

Sample session:

hive> 
    > 
    > select to_date('2020-12-09 23:59:59 123456789');
OK
2020-12-09
Time taken: 0.109 seconds, Fetched: 1 row(s)
hive> select to_date('2020-12-09 23:59:59');
OK
2020-12-09
Time taken: 0.086 seconds, Fetched: 1 row(s)

3.3 year/quarter/month/day/hour/minute/second

  1. year(string date) returns the year
  2. quarter(date/timestamp/string) returns the quarter
  3. month(string date) returns the month
  4. day(string date) / dayofmonth(date) return the day of the month
  5. hour(string date) returns the hour
  6. minute(string date) returns the minute
  7. second(string date) returns the second

Sample session:

hive> 
    > select year(dt) dt_year,
    >        quarter(dt) dt_quarter,
    >        month(dt) dt_month,
    >        day(dt) dt_day,
    >        hour(dt) dt_hour,
    >        minute(dt) dt_minute,
    >        second(dt) dt_second
    > from
    > (
    > select '2020-12-09 23:50:59' as dt
    > ) tmp;
OK
dt_year dt_quarter      dt_month        dt_day  dt_hour dt_minute       dt_second
2020    4       12      9       23      50      59
Time taken: 0.098 seconds, Fetched: 1 row(s)
hive> 

3.4 weekofyear

weekofyear(string date) returns the week number within the year.

Sample session ('2020-12-09' falls in week 50 of the year):

hive> 
    > select weekofyear('2020-12-09');
OK
_c0
50
Time taken: 0.073 seconds, Fetched: 1 row(s)

3.5 extract

extract(field FROM source) extracts one part of a date/time.

Code:

-- extract the year
SELECT EXTRACT(YEAR FROM '2020-12-09');
-- extract the month
SELECT EXTRACT(MONTH FROM '2020-12-09');
-- extract the day
SELECT EXTRACT(DAY FROM '2020-12-09');
-- extract the hour
SELECT EXTRACT(HOUR FROM '2020-12-09 13:02:01');
-- extract the minute
SELECT EXTRACT(MINUTE FROM '2020-12-09 13:02:01');
-- extract the second
SELECT EXTRACT(SECOND FROM '2020-12-09 13:02:01');

Sample session:

hive> 
    > SELECT EXTRACT(YEAR FROM '2020-12-09');
OK
_c0
2020
Time taken: 1.85 seconds, Fetched: 1 row(s)
hive> SELECT EXTRACT(MONTH FROM '2020-12-09');
OK
_c0
12
Time taken: 0.227 seconds, Fetched: 1 row(s)
hive> SELECT EXTRACT(DAY FROM '2020-12-09');
OK
_c0
9
Time taken: 0.089 seconds, Fetched: 1 row(s)
hive> SELECT EXTRACT(HOUR FROM '2020-12-09 13:02:01');
OK
_c0
13
Time taken: 0.097 seconds, Fetched: 1 row(s)
hive> SELECT EXTRACT(MINUTE FROM '2020-12-09 13:02:01');
OK
_c0
2
Time taken: 0.081 seconds, Fetched: 1 row(s)
hive> SELECT EXTRACT(SECOND FROM '2020-12-09 13:02:01');
OK
_c0
1

3.6 datediff

datediff(string enddate, string startdate) returns the number of days between the two dates.

Sample session:

hive> 
    > 
    > select datediff('2020-12-09','2020-11-09');
OK
_c0
30
Time taken: 0.101 seconds, Fetched: 1 row(s)
hive> select datediff('2020-12-09 23:50:59','2020-11-09');
OK
_c0
30
Time taken: 0.113 seconds, Fetched: 1 row(s)

3.7 date_add and date_sub

date_add(date/timestamp/string startdate, tinyint/smallint/int days) adds the given number of days.
date_sub(date/timestamp/string startdate, tinyint/smallint/int days) subtracts the given number of days.

Sample session:

hive> 
    > 
    > SELECT DATE_ADD('2020-01-01', 31);
OK
_c0
2020-02-01
Time taken: 0.098 seconds, Fetched: 1 row(s)
hive> SELECT DATE_ADD('2020-01-01', -31);
OK
_c0
2019-12-01
Time taken: 0.075 seconds, Fetched: 1 row(s)
hive> 
    > SELECT DATE_SUB('2020-01-01', 31);
OK
_c0
2019-12-01
Time taken: 0.075 seconds, Fetched: 1 row(s)
hive> SELECT DATE_SUB('2020-01-01', -31);
OK
_c0
2020-02-01
Time taken: 0.071 seconds, Fetched: 1 row(s)
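
Two small points worth knowing: a negative offset simply reverses the direction (as shown above), and any time-of-day portion of the input is dropped from the result. A sketch:

-- the time portion is discarded; only the date part is returned
select date_add('2020-12-09 23:59:59', 1);     -- 2020-12-10

-- yesterday relative to the current date
select date_add(current_date, -1);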

3.8 from_utc_timestamp and to_utc_timestamp

from_utc_timestamp({any primitive type} ts, string timezone) converts a UTC timestamp into the given time zone.
to_utc_timestamp({any primitive type} ts, string timezone) converts a timestamp in the given time zone into UTC.

Note that Hive resolves the abbreviation 'CST' as US Central Standard Time (UTC-6), not China Standard Time, which is why the conversions below shift by 6 hours.

Sample session:

hive> 
    > select from_utc_timestamp('2020-12-09 23:59:59','CST');
OK
_c0
2020-12-09 17:59:59
Time taken: 0.078 seconds, Fetched: 1 row(s)
hive> 
    > select to_utc_timestamp('2020-12-09 23:59:59','CST');
OK
_c0
2020-12-10 05:59:59
Time taken: 0.085 seconds, Fetched: 1 row(s)
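
Because abbreviations like 'CST' are ambiguous, region-based time-zone IDs are usually safer. A sketch using standard Java zone IDs:

-- UTC -> Beijing time (UTC+8)
select from_utc_timestamp('2020-12-09 23:59:59', 'Asia/Shanghai');   -- 2020-12-10 07:59:59

-- US Central time -> UTC, spelled out unambiguously
select to_utc_timestamp('2020-12-09 23:59:59', 'America/Chicago');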

3.9 current_date and current_timestamp

current_date returns the current date.
current_timestamp returns the current timestamp.

Sample session:

hive> 
    > select current_date() cur_date,current_timestamp() cur_timestamp;
OK
cur_date        cur_timestamp
2020-12-09      2020-12-09 14:58:44.535
Time taken: 0.079 seconds, Fetched: 1 row(s)

3.10 add_months

add_months(string start_date, int num_months[, output_date_format]) adds the given number of months.

Sample session:

hive> 
    > 
    > select add_months('2020-12-31 23:59:59',2,'YYYY-MM-dd HH:mm:ss');
OK
_c0
2021-02-28 23:59:59
Time taken: 0.097 seconds, Fetched: 1 row(s)

3.11 last_day

last_day(string date) returns the last day of the month the date falls in.

Sample session:

hive> 
    > select last_day('2020-12-09');
OK
_c0
2020-12-31
Time taken: 0.104 seconds, Fetched: 1 row(s)
hive> select last_day('2020-12-09 23:59:59');
OK
_c0
2020-12-31
Time taken: 0.077 seconds, Fetched: 1 row(s)

3.12 next_day

next_day(string start_date, string day_of_week) returns the first date later than start_date that falls on the given day of the week.

Sample session (find the next Tuesday after 2020-12-09):

hive> 
    > 
    > select next_day('2020-12-09','TU');
OK
_c0
2020-12-15
Time taken: 0.086 seconds, Fetched: 1 row(s)

3.13 trunc

trunc(string date, string format) returns the date truncated to the unit given by format.
Supported formats: MONTH/MON/MM, YEAR/YYYY/YY.

Sample session:

hive> 
    > select trunc('2020-12-09 23:59:59','YYYY');
OK
_c0
2020-01-01
hive> 
    > select trunc('2020-12-09 23:59:59','MM');
OK
_c0
2020-12-01
Time taken: 0.063 seconds, Fetched: 1 row(s)

3.14 months_between

months_between(date1, date2) returns the number of months between the two dates.

The result can have a fractional part, and if date1 is earlier than date2 the result is negative.

Sample session:

hive> 
    > 
    > select months_between('2020-12-09','2020-01-01');
OK
_c0
11.25806452
Time taken: 0.09 seconds, Fetched: 1 row(s)
hive> 
    > select months_between('2020-12-09','2021-01-01');
OK
_c0
-0.74193548
Time taken: 0.083 seconds, Fetched: 1 row(s)
hive> 

3.15 date_format

date_format(date/timestamp/string ts, string fmt) formats a date/timestamp according to the given pattern.

Sample session:

hive> SELECT date_format('2020-12-09 23:59:59', 'yyyy.MM.dd HH:mm:ss');
OK
_c0
2020.12.09 23:59:59
Time taken: 0.071 seconds, Fetched: 1 row(s)

4. Conditional Functions

The following are the most common Hive conditional functions:

Function | Description | Return type
nvl(a, b) | Returns b if a is NULL, otherwise a | T
isnull(a) | Returns true if a is NULL, otherwise false | boolean
isnotnull(a) | Returns false if a is NULL, otherwise true | boolean
if(boolean testCondition, T valueTrue, T valueFalseOrNull) | Returns valueTrue if testCondition is true, otherwise valueFalseOrNull | T
coalesce(T v1, T v2, ...) | Returns the first non-NULL argument, or NULL if all arguments are NULL | T
CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END | Searched CASE: evaluates conditions in order | T
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END | Simple CASE: compares a against each value | T

4.1 nvl

nvl(a, b) returns b when a is NULL, otherwise a.

Sample session:

hive> 
    > select nvl(null,123);
OK
_c0
123
hive> 
    > select nvl('345',123);
OK
_c0
345
Time taken: 0.067 seconds, Fetched: 1 row(s)

4.2 isnull and isnotnull

isnull(a) returns true when a is NULL, otherwise false.
isnotnull(a) returns false when a is NULL, otherwise true.
Note in the session below that an empty string is not NULL.

Sample session:

hive> 
    > select isnull(1);
OK
_c0
false
Time taken: 0.083 seconds, Fetched: 1 row(s)
hive> select isnull(null);
OK
_c0
true
Time taken: 0.072 seconds, Fetched: 1 row(s)
hive> select isnull('');
OK
_c0
false
Time taken: 0.076 seconds, Fetched: 1 row(s)
hive> 
    > select isnull(' ');
OK
_c0
false
Time taken: 0.08 seconds, Fetched: 1 row(s)
hive> 
    > 
    > select isnotnull(1);
OK
_c0
true
Time taken: 0.081 seconds, Fetched: 1 row(s)
hive> 
    > select isnotnull(null);
OK
_c0
false
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive> 
    > select isnotnull('');
OK
_c0
true
Time taken: 0.071 seconds, Fetched: 1 row(s)
hive> 
    > select isnotnull(' ');
OK
_c0
true
Time taken: 0.086 seconds, Fetched: 1 row(s)
hive> 

4.3 if

if(boolean testCondition, T valueTrue, T valueFalseOrNull) returns valueTrue when testCondition is true, otherwise valueFalseOrNull.

Sample session:

hive> 
    > select if(1>2,'abc','def');
OK
_c0
def
Time taken: 0.134 seconds, Fetched: 1 row(s)
hive> 
    > select if(1<2,'abc','def');
OK
_c0
abc
Time taken: 0.1 seconds, Fetched: 1 row(s)

4.4 coalesce

coalesce(T v1, T v2, ...) returns the first non-NULL argument; if all arguments are NULL it returns NULL. Note that in the session below 'null' in quotes is just a string literal, not a SQL NULL, which is why the last two queries return the string "null".

Sample session:

hive> 
    > select coalesce('a','b',null);
OK
_c0
a
Time taken: 0.081 seconds, Fetched: 1 row(s)
hive> 
    > select coalesce('null','b','c');
OK
_c0
null
Time taken: 0.073 seconds, Fetched: 1 row(s)
hive> 
    > select coalesce('null',null,null);
OK
_c0
null
Time taken: 0.073 seconds, Fetched: 1 row(s)

4.5 CASE

CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END is the searched CASE form: each condition is evaluated in order.
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END is the simple CASE form: the expression a is compared against each value.

Code:

-- searched CASE: map the English job title to a Chinese one
select e.empno,
     e.job,
     case when e.job = 'SALESMAN' then '销售员'
          when e.job = 'MANAGER' then '管理人员'
          else '未知' end  as job_china
from emp e;

-- simple CASE: the same mapping written as a value comparison on e.job
select e.empno,
     e.hiredate,
     case e.job when  'SALESMAN' then '销售员'
                when  'MANAGER' then '管理人员'
                 else '未知'
     end  as job_china
from emp e;

-- with no ELSE branch, an unmatched value yields NULL
SELECT CASE 'c'
    WHEN 'a' THEN 1 WHEN 'b' THEN 2 END is_c;

Sample session:

hive> 
    > -- searched CASE: map the English job title to a Chinese one
    > select e.empno,
    >      e.job,
    >      case when e.job = 'SALESMAN' then '销售员'
    >           when e.job = 'MANAGER' then '管理人员'
    >           else '未知' end  as job_china
    > from emp e;
Query ID = root_20201209163913_c6290550-dfa0-4f82-891e-24673ce9c704
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0142, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0142/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0142
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-12-09 16:39:21,669 Stage-1 map = 0%,  reduce = 0%
2020-12-09 16:39:27,895 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.68 sec
MapReduce Total cumulative CPU time: 6 seconds 680 msec
Ended Job = job_1606698967173_0142
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.68 sec   HDFS Read: 12384 HDFS Write: 764 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 680 msec
OK
e.empno e.job   job_china
7499    SALESMAN        销售员
7876    CLERK   未知
7934    CLERK   未知
7521    SALESMAN        销售员
7654    SALESMAN        销售员
7698    MANAGER 管理人员
7788    ANALYST 未知
7369    CLERK   未知
7900    CLERK   未知
7902    ANALYST 未知
7566    MANAGER 管理人员
7782    MANAGER 管理人员
7839    PRESIDENT       未知
7844    SALESMAN        销售员
Time taken: 15.553 seconds, Fetched: 14 row(s)
hive> 
    > -- simple CASE: the same mapping written as a value comparison on e.job
    > select e.empno,
    >      e.hiredate,
    >      case e.job when  'SALESMAN' then '销售员'
    >                 when  'MANAGER' then '管理人员'
    >                  else '未知' 
    >      end  as job_china
    > from emp e;
Query ID = root_20201209164150_7030d7f3-686c-4bb8-85e7-4f4e1db2dfef
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0143, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0143/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0143
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-12-09 16:41:57,812 Stage-1 map = 0%,  reduce = 0%
2020-12-09 16:42:05,052 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.23 sec
MapReduce Total cumulative CPU time: 6 seconds 230 msec
Ended Job = job_1606698967173_0143
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.23 sec   HDFS Read: 12252 HDFS Write: 808 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 230 msec
OK
e.empno e.hiredate      job_china
7499    1981-02-20      销售员
7876    1987-06-13      未知
7934    1982-01-23      未知
7521    1981-02-22      销售员
7654    1981-09-28      销售员
7698    1981-05-01      管理人员
7788    1987-06-13      未知
7369    1980-12-17      未知
7900    1981-12-03      未知
7902    1981-12-03      未知
7566    1981-04-02      管理人员
7782    1981-06-09      管理人员
7839    1981-11-17      未知
7844    1981-09-08      销售员
Time taken: 15.383 seconds, Fetched: 14 row(s)
hive> -- with no ELSE branch, an unmatched value yields NULL
hive> SELECT CASE 'c'
    >     WHEN 'a' THEN 1 WHEN 'b' THEN 2 END is_c;
OK
is_c
NULL
Time taken: 0.082 seconds, Fetched: 1 row(s)
hive> 
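
CASE expressions are often combined with aggregation. The sketch below counts employees per translated job title against the same emp table; repeating the CASE expression in GROUP BY keeps it portable across Hive versions that do not accept aliases there.

-- count employees per translated job title
select case when e.job = 'SALESMAN' then '销售员'
            when e.job = 'MANAGER'  then '管理人员'
            else '未知' end as job_china,
       count(*)             as cnt
from emp e
group by case when e.job = 'SALESMAN' then '销售员'
              when e.job = 'MANAGER'  then '管理人员'
              else '未知' end;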

5. String Functions

The following are the most common Hive string functions:

Function | Description | Return type
ascii(string str) | Returns the ASCII value of the first character of str | int
chr(bigint or double A) | Returns the character for the given ASCII value | string
concat(string or binary A, string or binary B, ...) | Concatenates strings | string
concat_ws(string SEP, string A, string B, ...) | Concatenates strings with a separator | string
decode(binary bin, string charset) | Decodes binary data using the given character set | string
elt(N int, str1 string, str2 string, str3 string, ...) | Returns the N-th string argument | string
field(val T, val1 T, val2 T, val3 T, ...) | Returns the position of val among the following arguments | int
find_in_set(string str, string strList) | Returns the position of str in a comma-separated list | int
instr(string str, string substr) | Returns the position of substr in str | int
length(string A) | Returns the length of the string | int
locate(string substr, string str[, int pos]) | Returns the first position of substr in str at or after position pos | int
lower(string A), lcase(string A) | Converts the string to lower case | string
lpad(string str, int len, string pad) | Pads the string on the left with pad to length len | string
ltrim(string A) | Trims spaces from the left | string
repeat(string str, int n) | Repeats str n times | string
replace(string A, string OLD, string NEW) | Replaces every occurrence of OLD with NEW | string
reverse(string A) | Reverses the string | string
rpad(string str, int len, string pad) | Pads the string on the right with pad to length len | string
rtrim(string A) | Trims spaces from the right | string
space(int n) | Returns a string of n spaces | string
substr(string or binary A, int start[, int len]), substring(string or binary A, int start[, int len]) | Returns a substring | string
translate(string or char or varchar input, string or char or varchar from, string or char or varchar to) | Character-by-character translation, like PostgreSQL's translate | string
trim(string A) | Trims spaces from both ends | string
upper(string A), ucase(string A) | Converts the string to upper case | string

5.1 ascii and chr

ascii(string str) returns the ASCII value of the first character of str.
chr(bigint or double A) returns the character for the given ASCII value.

The two functions are each other's inverse.

Sample session:

hive> 
    > select ascii('a') ascii_a,ascii('A') ascii_a;
OK
ascii_a ascii_a
97      65
Time taken: 0.104 seconds, Fetched: 1 row(s)
hive> 
    > select chr(97) chr_97,chr(65) chr_65;
OK
chr_97  chr_65
a       A
Time taken: 0.075 seconds, Fetched: 1 row(s)

5.2 Concatenation functions

concat(string or binary A, string or binary B, ...) concatenates its arguments.
concat_ws(string SEP, string A, string B, ...) concatenates its arguments with a separator.

Sample session:

hive> 
    > 
    > select concat('A','b','DEF');
OK
_c0
AbDEF
Time taken: 0.214 seconds, Fetched: 1 row(s)
hive> 
    > select concat_ws('-','A','b','DEF');
OK
_c0
A-b-DEF
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive> 

5.3 Case-conversion functions

lower(string A) / lcase(string A) convert the string to lower case.
upper(string A) / ucase(string A) convert the string to upper case.

Sample session:

hive> 
    > select lower('AbCdE'),upper('AbCdE');
OK
_c0     _c1
abcde   ABCDE
Time taken: 0.085 seconds, Fetched: 1 row(s)

5.4 Padding functions

lpad(string str, int len, string pad) pads the string on the left to length len.
rpad(string str, int len, string pad) pads the string on the right to length len.
If len is shorter than str, the string is truncated to len characters.

Sample session:

hive> 
    > SELECT LPAD('hi',4,'??');
OK
_c0
??hi
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive> SELECT LPAD('hi',1,'??');
OK
_c0
h
Time taken: 0.072 seconds, Fetched: 1 row(s)
hive> SELECT RPAD('hi',4,'??');
OK
_c0
hi??
Time taken: 0.064 seconds, Fetched: 1 row(s)
hive> SELECT RPAD('hi',1,'??');
OK
_c0
h
Time taken: 0.08 seconds, Fetched: 1 row(s)
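
A common use of lpad is zero-padding numeric codes to a fixed width so they sort correctly as strings. A minimal sketch:

-- zero-pad an id to three characters: 7 -> '007'
select lpad(cast(7 as string), 3, '0');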

5.5 Whitespace functions

ltrim() removes spaces on the left
trim() removes spaces on both ends
rtrim() removes spaces on the right
space() returns a string of the given number of spaces

To remove spaces in the middle of a string, use replace.

Code:

-- trim spaces on the left, on both ends, and on the right
select ltrim(' abc def '),trim(' abc def '),rtrim(' abc def ');
-- return 10 spaces
select SPACE(10);

Sample session:

hive> select ltrim(' abc def '),trim(' abc def '),rtrim(' abc def ');
OK
_c0     _c1     _c2
abc def         abc def  abc def
Time taken: 0.077 seconds, Fetched: 1 row(s)
hive> select SPACE(10);
OK
_c0
          
Time taken: 0.062 seconds, Fetched: 1 row(s)

5.6 String length

length(string A) returns the length of the string.

Sample session:

hive> 
    > select length('aaaaaabbbbcccc');
OK
_c0
14
Time taken: 0.081 seconds, Fetched: 1 row(s)

5.7 Substring functions

substr() / substring() extract a substring
substring_index() extracts the part before/after the n-th occurrence of a delimiter
instr() returns the position of a substring
locate() returns the position of a substring starting from a given offset

Syntax:
substr(string or binary A, int start, int len)
substring(string or binary A, int start, int len)
instr(str, substr)
locate(string substr, string str[, int pos])

Code:

-- substring from position 5; with no len, everything to the end is returned
select substr('abcdefghi',5);

-- substring from position 5, 2 characters long
select substr('abcdefghi',5,2);

-- substring starting 5 characters from the end, 2 characters long
select substr('abcdefghi',-5,2);

-- keep everything to the left of the first '.'
select substring_index('www.mysql.com','.',1);

-- keep everything to the right of the last '.'
select substring_index('www.mysql.com','.',-1);

-- position of the first '.'
select instr('www.mysql.com','.');

-- position of the first '.' at or after position 6
select locate('.','hive.apache.org',6);

Sample session:

hive> 
    > 
    > select substr('abcdefghi',5);
OK
_c0
efghi
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive> select substr('abcdefghi',5,2);
OK
_c0
ef
Time taken: 0.066 seconds, Fetched: 1 row(s)
hive> select substr('abcdefghi',-5,2);
OK
_c0
ef
Time taken: 0.064 seconds, Fetched: 1 row(s)
hive> select substring_index('www.mysql.com','.',1);
OK
_c0
www
Time taken: 0.062 seconds, Fetched: 1 row(s)
hive> select substring_index('www.mysql.com','.',-1);
OK
_c0
com
Time taken: 0.062 seconds, Fetched: 1 row(s)
hive> select instr('www.mysql.com','.');
OK
_c0
4
Time taken: 0.062 seconds, Fetched: 1 row(s)
hive> select locate('.','hive.apache.org',6);
OK
_c0
12
Time taken: 0.067 seconds, Fetched: 1 row(s)
hive> 
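
The position functions combine naturally with substr. A minimal sketch (the e-mail address is just an illustrative literal):

-- everything after the '@' of an e-mail address
select substr('user@example.com', instr('user@example.com', '@') + 1);   -- example.com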

5.8 Replacement functions

replace(string A, string OLD, string NEW) replaces every occurrence of OLD with NEW.
translate(string or char or varchar input, string or char or varchar from, string or char or varchar to) works like PostgreSQL's translate function.

replace treats OLD as a whole substring to be replaced.
translate splits from into individual characters and maps each one to the character at the same position in to; characters in from with no counterpart in to are removed.

translate is therefore more flexible than replace:
translate('123abc','2dc','4e') maps 2 -> 4 and d -> e, and removes c.
translate('123abc','2','4e') only maps 2 -> 4.

Sample session:

hive> 
    > SELECT REPLACE('acdd','cd','ef');
OK
_c0
aefd
Time taken: 0.093 seconds, Fetched: 1 row(s)
hive> SELECT TRANSLATE('acdd','cd','ef');
OK
_c0
aeff
Time taken: 0.072 seconds, Fetched: 1 row(s)
hive> select translate('123abc','2dc','4e');
OK
_c0
143ab
Time taken: 0.073 seconds, Fetched: 1 row(s)
hive> select translate('123abc','2','4e');
OK
_c0
143abc
Time taken: 0.075 seconds, Fetched: 1 row(s)
hive> 

5.9 Other string functions

decode(binary bin, string charset) decodes binary data using the given character set
elt(N int, str1 string, str2 string, str3 string, ...) returns the N-th string argument
field(val T, val1 T, val2 T, val3 T, ...) returns the position of val among the following arguments
repeat(string str, int n) repeats str n times
reverse(string A) reverses the string

Developers coming from Oracle will find the name decode very familiar, but the two are completely unrelated: Hive's decode converts between character sets, not values. (The first query in the session below uses reflect to call java.net.URLDecoder.decode through Java reflection, which is yet another, unrelated, kind of decoding.)

Sample session:

hive> 
    > select reflect('java.net.URLDecoder', 'decode','1' , "GBK");
Query ID = root_20201209172130_cafe84ad-3e66-4e4a-92a0-3868efd52957
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0145, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0145/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0145
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-09 17:21:39,319 Stage-1 map = 0%,  reduce = 0%
2020-12-09 17:21:45,570 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.88 sec
MapReduce Total cumulative CPU time: 2 seconds 880 msec
Ended Job = job_1606698967173_0145
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.88 sec   HDFS Read: 4726 HDFS Write: 101 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 880 msec
OK
_c0
1
Time taken: 15.797 seconds, Fetched: 1 row(s)
hive> 
    > select elt(3,'abc','aaa','def','test');
OK
_c0
def
Time taken: 0.084 seconds, Fetched: 1 row(s)
hive> 
    > select field('ab','abc','ba','ab','dab');
OK
_c0
3
hive> 
    > 
    > select repeat('abc',10);
OK
_c0
abcabcabcabcabcabcabcabcabcabc
Time taken: 0.085 seconds, Fetched: 1 row(s)
hive> 
    > select reverse('abc');
OK
_c0
cba
Time taken: 0.078 seconds, Fetched: 1 row(s)
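
For completeness, Hive's decode pairs with encode, which turns a string into binary in a given character set. A minimal round-trip sketch:

-- encode to UTF-8 bytes, then decode back to a string
select decode(encode('你好, hive', 'UTF-8'), 'UTF-8');   -- 你好, hive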

6. UDTF (Table-Generating Functions)

A UDTF is a table-generating function: a single input row can produce multiple output rows.

This is quite different from what relational databases offer, so the examples below walk through the typical forms of explode and lateral view.

Code:

-- explode (array)
select explode(array('A','B','C'));
select explode(array('A','B','C')) as col;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;
-- explode (map)
select explode(map('A',10,'B',20,'C',30));
select explode(map('A',10,'B',20,'C',30)) as (key,value);
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;

Sample session:

hive> 
    > 
    > 
    > select explode(array('A','B','C'));
Query ID = root_20201210161223_3527486d-76e5-4306-b3aa-8521f89774d9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0150, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0150/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0150
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:12:30,620 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:12:37,839 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.92 sec
MapReduce Total cumulative CPU time: 2 seconds 920 msec
Ended Job = job_1606698967173_0150
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.92 sec   HDFS Read: 5007 HDFS Write: 129 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 920 msec
OK
col
A
B
C
Time taken: 15.401 seconds, Fetched: 3 row(s)
hive> select explode(array('A','B','C')) as col;
Query ID = root_20201210161249_acc85662-e686-4176-a5e8-032ed0992a45
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0151, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0151/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0151
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:12:56,243 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:13:02,434 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.8 sec
MapReduce Total cumulative CPU time: 2 seconds 800 msec
Ended Job = job_1606698967173_0151
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.8 sec   HDFS Read: 5007 HDFS Write: 129 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 800 msec
OK
col
A
B
C
Time taken: 14.35 seconds, Fetched: 3 row(s)
hive> select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
Query ID = root_20201210161305_5012fe10-4599-4ffe-838f-b867d212a10a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0152, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0152/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0152
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:13:14,529 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:13:20,716 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.86 sec
MapReduce Total cumulative CPU time: 2 seconds 860 msec
Ended Job = job_1606698967173_0152
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.86 sec   HDFS Read: 5764 HDFS Write: 129 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 860 msec
OK
tf.col
A
B
C
Time taken: 16.663 seconds, Fetched: 3 row(s)
hive> select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;
Query ID = root_20201210161324_8f1bcaa5-931d-49c1-8dbd-5da936bfa67f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0153, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0153/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0153
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:13:32,968 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:13:40,179 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.99 sec
MapReduce Total cumulative CPU time: 2 seconds 990 msec
Ended Job = job_1606698967173_0153
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.99 sec   HDFS Read: 5778 HDFS Write: 129 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 990 msec
OK
tf.col
A
B
C
Time taken: 17.102 seconds, Fetched: 3 row(s)
hive> select explode(map('A',10,'B',20,'C',30));
Query ID = root_20201210161354_1c522046-cac8-4182-9e28-9789a17c1a98
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0154, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0154/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0154
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:14:00,978 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:14:08,199 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.96 sec
MapReduce Total cumulative CPU time: 2 seconds 960 msec
Ended Job = job_1606698967173_0154
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.96 sec   HDFS Read: 5144 HDFS Write: 138 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 960 msec
OK
key     value
A       10
B       20
C       30
Time taken: 15.114 seconds, Fetched: 3 row(s)
hive> select explode(map('A',10,'B',20,'C',30)) as (key,value);
Query ID = root_20201210161411_59b615ac-2733-40c2-bec4-1a17f593140f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0155, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0155/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0155
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:14:19,606 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:14:25,805 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.95 sec
MapReduce Total cumulative CPU time: 2 seconds 950 msec
Ended Job = job_1606698967173_0155
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.95 sec   HDFS Read: 5144 HDFS Write: 138 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 950 msec
OK
key     value
A       10
B       20
C       30
Time taken: 15.767 seconds, Fetched: 3 row(s)
hive> select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
Query ID = root_20201210161428_5b45482b-39c0-4a6f-9e67-ec640e03d816
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0156, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0156/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0156
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:14:38,792 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:14:46,098 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.92 sec
MapReduce Total cumulative CPU time: 2 seconds 920 msec
Ended Job = job_1606698967173_0156
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.92 sec   HDFS Read: 5987 HDFS Write: 138 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 920 msec
OK
tf.key  tf.value
A       10
B       20
C       30
Time taken: 18.456 seconds, Fetched: 3 row(s)
hive> select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
Query ID = root_20201210161448_4ebe123f-57b7-4869-9e0e-557cccb27a10
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0157, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0157/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0157
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:14:57,112 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:15:03,299 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.74 sec
MapReduce Total cumulative CPU time: 2 seconds 740 msec
Ended Job = job_1606698967173_0157
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.74 sec   HDFS Read: 5987 HDFS Write: 138 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 740 msec
OK
tf.key  tf.value
A       10
B       20
C       30
Time taken: 16.725 seconds, Fetched: 3 row(s)
hive> 
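
A closely related UDTF is posexplode, which also emits the (0-based) position of each element; this is useful when the original ordering matters. A sketch in the same style as the queries above:

-- position and value of each array element
select posexplode(array('A','B','C'));

-- the same via lateral view, naming the generated columns explicitly
select tf.pos, tf.val
from (select 0) t lateral view posexplode(array('A','B','C')) tf as pos, val;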

References

1. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-MathematicalFunctions
