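Summary of Commonly Used Hive Functions

Author: 夜希辰 | Published: 2020-10-06 22:42

Contents:
I. Relational operators
II. Arithmetic operators
III. Logical operators
IV. Complex data types: array, map, struct
V. Accessing complex types
VI. Size functions for complex types
VII. Constructors for complex types: map, struct, array
VIII. Type conversion functions
IX. Date functions
X. Numeric functions
XI. Conditional functions
XII. String functions
XIII. Miscellaneous functions
XIV. Aggregate functions (UDAF)
XV. Frequently used functions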

List Hive's built-in functions:

show functions;

Show how a specific function is used:

-- show the usage of the coalesce function
desc function extended coalesce;

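I. Relational operators

Equality comparison: =
Syntax: A = B
Operand types: all primitive types
Description: TRUE if expression A equals expression B, FALSE otherwise. If either A or B is NULL, the result is NULL.

Example: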
select * from person where 1=1;
select * from person where 1=2;

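NULL-safe equality comparison: <=>

Syntax: A <=> B
Operand types: all primitive types
Description: For non-NULL operands this behaves the same as =. Unlike =, it never returns NULL: it returns TRUE if both A and B are NULL, and FALSE if exactly one of them is NULL.

Example: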
select * from person where 1<=>1;
select * from person where 1<=>2;
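
The difference between = and <=> only shows up when NULL is involved. The following is a minimal sketch, assuming a Hive version that allows SELECT without a FROM clause (as some later examples in this article do); the expected results in the comments follow the operator descriptions above.

```sql
-- Ordinary equality propagates NULL; NULL-safe equality never returns NULL.
select 1 = 1,       1 <=> 1;        -- true, true
select 1 = null,    1 <=> null;     -- NULL, false
select null = null, null <=> null;  -- NULL, true
```

This makes <=> convenient in join conditions where NULL keys on both sides should be treated as matching.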

Inequality comparison: <> and !=
Syntax: A <> B, A != B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is not equal to B; FALSE otherwise.

Example:
select * from person where 1<>2;
select * from person where 1<>1;
select * from person where null<>null; -- no rows returned
select * from person where 1 != 1;
select * from person where 1 != 2;
select * from person where null != null; -- no rows returned

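Less-than comparison: <
Syntax: A < B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is less than B; FALSE otherwise.

Example:
select * from person where 1<2; -- returns rows
select * from person where 2<1; -- no rows returned
select * from person where null<null; -- no rows returned

Less-than-or-equal comparison: <=
Syntax: A <= B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is less than or equal to B; FALSE otherwise.

Example:
select * from person where 1<= 2; -- returns rows
select * from person where 2<= 1; -- no rows returned
select * from person where null<=null; -- no rows returned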

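Greater-than comparison: >
Syntax: A > B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is greater than B; FALSE otherwise.

Example:
select * from person where 1> 2; -- no rows returned
select * from person where 2 >1; -- returns rows
select * from person where null>null; -- no rows returned

Greater-than-or-equal comparison: >=
Syntax: A >= B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is greater than or equal to B; FALSE otherwise.

Example:
select * from person where 1>= 2; -- no rows returned
select * from person where 2 >=1; -- returns rows
select * from person where 1>=1; -- returns rows
select * from person where null>= null; -- no rows returned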

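Range comparison (BETWEEN): not covered in detail here.

NULL check: IS NULL
Syntax: A IS NULL
Operand types: all types
Description: TRUE if the value of expression A is NULL; FALSE otherwise.

Example:
select * from person where 1 is null; -- no rows returned
select * from person where null is null; -- returns rows

Non-NULL check: IS NOT NULL
Syntax: A IS NOT NULL
Operand types: all types
Description: FALSE if the value of expression A is NULL; TRUE otherwise.

Example:
select * from person where 1 IS NOT NULL; -- returns rows
select * from person where null IS NOT NULL; -- no rows returned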

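LIKE comparison: LIKE
Syntax: A LIKE B
Operand types: strings
Description: NULL if string A or string B is NULL; TRUE if string A matches the simple SQL pattern B (not a regular expression); FALSE otherwise. In B, the character "_" matches any single character and "%" matches any number of characters.

Example:
select 1 from person where 'football' like 'foot%';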
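
The example above only exercises "%". The sketch below contrasts the two wildcards on the same literal; it assumes a Hive version that allows SELECT without a FROM clause, and the expected results in the comments follow from the LIKE rules described above.

```sql
-- "_" matches exactly one character, "%" matches any number of characters.
select 'football' like 'foot____';  -- true: four underscores match "ball"
select 'football' like 'foot_';     -- false: only one character allowed after "foot"
select 'football' like '%ball';     -- true
select 'football' like 'f%l';       -- true
```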

Java regular expression match: RLIKE
Syntax: A RLIKE B
Operand types: strings
Description: NULL if string A or string B is NULL; TRUE if string A matches the Java regular expression B; FALSE otherwise.

Example:
select 1 from person where '123456' rlike '^\\d+$'; -- checks whether the string consists only of digits
select 1 from person where '12aa456' rlike '^\\d+$';

REGEXP operator: REGEXP
Syntax: A REGEXP B
Operand types: strings
Description: Same as RLIKE.

Example:
select 1 from person where 'footbar' REGEXP '^f.*r$'; -- returns rows

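II. Arithmetic operators

Addition: +
Syntax: A + B
Operand types: all numeric types
Description: Returns the sum of A and B. The result type is the smallest common parent type of the types of A and B (see the numeric type hierarchy). For example, int + int generally yields int, while int + double generally yields double.

Example:
select 1+2 from person;

Subtraction: -
Syntax: A - B
Operand types: all numeric types
Description: Returns A minus B. The result type is the smallest common parent type of the types of A and B (see the numeric type hierarchy). For example, int - int generally yields int, while int - double generally yields double.

Example:
select 5-3 from person;
select 5.2-3 from person;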

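Multiplication: *
Syntax: A * B
Operand types: all numeric types
Description: Returns the product of A and B. The result type is the smallest common parent type of the types of A and B (see the numeric type hierarchy). Note that if the product exceeds the range of the default result type, one of the operands should be cast to a wider numeric type first.

Example: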
select 5*3 from person;
select 5.2*3 from person;
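
The overflow note above can be illustrated with a product that does not fit in an int. This is a sketch, assuming a Hive version that allows SELECT without a FROM clause; the literal 2147483647 is simply the largest int value.

```sql
-- int * int stays int, so this product cannot be represented correctly.
select 2147483647 * 2;

-- Casting one operand to bigint first widens the whole expression.
select cast(2147483647 as bigint) * 2;  -- 4294967294
```

Casting the result after the multiplication would be too late, because the overflow has already happened in the narrower type.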

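Division: /
Syntax: A / B
Operand types: all numeric types
Description: Returns A divided by B. The result type is double.

Example:
select 5/3 from person;
select 6.0/3 from person;
select 6/3 from person;

Modulo: %
Syntax: A % B
Operand types: all numeric types
Description: Returns the remainder of A divided by B. The result type is the smallest common parent type of the types of A and B (see the numeric type hierarchy).

Example:
select 41 % 5 from person;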

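Bitwise AND: &
Syntax: A & B
Operand types: integer numeric types
Description: Returns the bitwise AND of A and B. The result type is the smallest common parent type of the types of A and B (see the numeric type hierarchy).

Example:
select 4 & 8 from person; -- 0 (4 is 0100 and 8 is 1000 in binary; no bit is set in both)

Bitwise OR: |
Syntax: A | B
Operand types: integer numeric types
Description: Returns the bitwise OR of A and B. The result type is the smallest common parent type of the types of A and B (see the numeric type hierarchy).

Example:
select 4 | 8 from person; -- 12 (0100 | 1000 = 1100)

Bitwise XOR: ^
Syntax: A ^ B
Operand types: integer numeric types
Description: Returns the bitwise XOR of A and B. The result type is the smallest common parent type of the types of A and B (see the numeric type hierarchy).

Example:
select 4 ^ 8 from person; -- 12 (0100 ^ 1000 = 1100)

Bitwise NOT: ~
Syntax: ~A
Operand types: integer numeric types
Description: Returns the bitwise complement of A. The result type is the type of A.

Example:
select ~6 ; -- -7
select 6 ;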

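III. Logical operators

Logical AND: AND, &&
Syntax: A AND B
Operand types: boolean
Description: TRUE if both A and B are TRUE; FALSE if either is FALSE. If one operand is NULL and the other is not FALSE, the result is NULL.

Example:
select 1 from person where 1=1 and 2=2;
select 1 from person where 1=1 and 2<2;

Logical OR: OR
Syntax: A OR B
Operand types: boolean
Description: TRUE if A is TRUE, B is TRUE, or both are TRUE; FALSE if both are FALSE. If one operand is NULL and the other is not TRUE, the result is NULL.

Example:
select 1 from person  where 1=2 or 2<1;
select 1 from person  where 1=2 or 2>1;

Logical NOT: NOT
Syntax: NOT A
Operand types: boolean
Description: TRUE if A is FALSE; FALSE if A is TRUE; NULL if A is NULL.

Example: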
select 1 from person  where not 1=2;
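
Because NULL takes part in logical expressions, the operators follow three-valued logic. The sketch below spells out the NULL cases; it assumes a Hive version that allows SELECT without a FROM clause, and uses cast(null as boolean) so that the NULL has an explicit boolean type.

```sql
-- AND: a FALSE operand decides the result even if the other side is NULL.
select true  and cast(null as boolean);  -- NULL
select false and cast(null as boolean);  -- false

-- OR: a TRUE operand decides the result even if the other side is NULL.
select true  or cast(null as boolean);   -- true
select false or cast(null as boolean);   -- NULL

-- NOT: NULL stays NULL.
select not cast(null as boolean);        -- NULL
```

In a WHERE clause a NULL condition filters the row out, which is why the earlier NULL comparisons return no rows.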

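IV. Complex data types: array, map, struct

Besides the common types TINYINT, SMALLINT, INT, BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP, DECIMAL, DATE, VARCHAR, and CHAR, Hive also supports several complex data types (array, map, struct, union).

1. Using array
2. Using map
3. Using struct

Reference article: "Using Hive's complex data types array, map, and struct"

1. Using array

Array type: a sequence of elements that all share the same data type.

Sample data array.txt: name and work locations (columns separated by a tab)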

Huangbo beijing,shanghai,tianjin,Hangzhou
Xuzheng tianjin,chengdu,wuhan 
Wangbaoqiang    wuhan,shenyang,jilin

Create a table in which the location column is an array:

create table person(name string,location array<string>) row format delimited fields terminated by "\t" collection items terminated by ",";

Load the data into the table:

load data local inpath '/home/study/array.txt' into table person;

Some queries:

select * from person;

-- Array access: A[n]
-- Operand types: A is an array, n is an int
-- Description: returns the element at index n of array A. Indexes start at 0. For example, if A is the array ['foo', 'bar'], A[0] returns 'foo' and A[1] returns 'bar'.
select name,location[0],size(location ) from person;

select name from person  where array_contains(location ,'beijing');

-- Indexes past the end of the array return NULL
select location[3],location[4] from person;

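2. Using map

MAP: a map holds key -> value pairs, and elements are accessed by key. For example, if "userlist" is a map in which the username is the key and the password is the value, then userlist['username'] returns the password of that user.

Reference article: "Common operations on the complex data type Map in Hive"

Sample data map.txt: name and exam scores (columns separated by a tab)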

huangbo yuwen:80,shuxue:89,yingyu:95
xuzheng yuwen:70,shuxue:65,yingyu:81
wangbaoqiang    yuwen:75,shuxue:100,yingyu:75

Create the table:

create table score(name string, scores map<string,int>) row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':';

desc formatted score;

Load the data into the table:

load data local inpath '/home/study/map.txt' into table score;

Some queries:

select * from score;

select name from score; 

select scores from score; 
-- Map access: M[key]
-- Syntax: M[key]
-- Operand types: M is a map, key is one of the keys in the map
-- size(M) returns the number of key-value pairs in the map (see section VI)
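
The map queries above select whole columns only. The following sketch shows per-key access and size() against the score table created above. map_keys and map_values are Hive built-ins that this article does not otherwise cover, and the column aliases are illustrative.

```sql
-- One row per student: a single score, the number of subjects,
-- and the keys and values of the map as arrays.
select name,
       scores['yuwen']    as yuwen_score,
       size(scores)       as subject_count,
       map_keys(scores)   as subjects,
       map_values(scores) as points
from score;
```

Looking up a key that is not present in the map returns NULL rather than raising an error.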

3. Using struct

Sample data structtable.txt: student id, plus course name and score (columns separated by a tab)

1   english,80
2   math,89
3   chinese,95

Create the table:

create table structtable(id int,course struct<name:string,score:int>) row format delimited fields terminated by '\t' collection items terminated by ',';

Load the data into the table:

load data local inpath '/home/study/structtable.txt' into table structtable;

Some queries:

select * from structtable;
select id from structtable;
select course from structtable;
select t.course.name from structtable t;
select t.course.score from structtable t;

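V. Accessing complex types

1. Array access: A[n]
Syntax: A[n]
Operand types: A is an array, n is an int
Description: Returns the element at index n of array A. Indexes start at 0. For example, if A is the array ['foo', 'bar'], A[0] returns 'foo' and A[1] returns 'bar'.

Example:
select location[0],location[1],location[2] from person;

2. Map access: M[key]
Syntax: M[key]
Operand types: M is a map, key is one of the keys in the map
Description: Returns the value stored in map M under the given key. For example, if M is the map {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'}, M['all'] returns 'foobar'.

Example:
select s.scores['shuxue'] from score s;

3. Struct access: S.x
Syntax: S.x
Operand types: S is a struct
Description: Returns field x of struct S. For example, for the struct foobar {int foo, int bar}, foobar.foo returns the foo field.

Example:
select t.course.score from structtable t;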

VI. Size functions for complex types

1. Map size: size(Map<K,V>)
Syntax: size(Map<K,V>)
Return type: int
Description: Returns the number of entries in the map.

Example:
select size(map('100','tom','101','mary'));
select size(scores) from score ;

2. Array size: size(Array<T>)
Syntax: size(Array<T>)
Return type: int
Description: Returns the number of elements in the array.

Example:
select size(array('100','101','102','103'));
select size(location) from person;

3. size() cannot be applied to a struct.

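VII. Constructors for complex types: map, struct, array

Map constructor: map
Syntax: map(key1, value1, key2, value2, ...)
Description: Builds a map from the given key and value pairs.

Example:
select map('100','tom','200','mary');
select map('yuwen',77,'shuxue',99);

Struct constructor: struct
Syntax: struct(val1, val2, val3, ...)
Description: Builds a struct from the given arguments.

Example:
select struct('tom','mary','tim');

Array constructor: array
Syntax: array(val1, val2, ...)
Description: Builds an array from the given arguments.

Example: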
select array("tom","mary","tim");
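
struct() does not let the caller choose field names. When named fields are wanted, Hive's named_struct built-in (not covered elsewhere in this article) can be used. The sketch below assumes a Hive version that allows SELECT without a FROM clause; the field names mirror the course struct from section IV.

```sql
-- struct(): fields get automatically generated names.
select struct('math', 89);

-- named_struct(): alternating field names and values.
select named_struct('name', 'math', 'score', 89);

-- A field of the constructed struct is read with the usual dot syntax.
select named_struct('name', 'math', 'score', 89).score;  -- 89
```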

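VIII. Type conversion functions

1. Binary conversion: binary
Only string, char, varchar, or binary values can be converted to the binary type.

Example:
select binary('3');

2. Explicit conversion between primitive types: cast
The CAST function explicitly converts an expression of one data type to another. Its argument is an expression consisting of the source value and the target data type, separated by the AS keyword.
Syntax: CAST(expression AS data_type)

Example: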
select  cast(123 as string);
select cast(345 AS double);
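
The examples above convert numbers; converting strings is where cast most often matters in practice. A short sketch, assuming a Hive version that allows SELECT without a FROM clause:

```sql
-- A numeric string converts cleanly and can then be used in arithmetic.
select cast('123' as int) + 1;      -- 124

-- A string that is not a valid number yields NULL instead of an error.
select cast('abc' as int);          -- NULL

-- Strings in yyyy-MM-dd form can be cast to the DATE type.
select cast('2018-12-08' as date);
```

Because a failed cast is silent, it is worth checking for unexpected NULLs after converting user-supplied data.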

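IX. Date functions

Unix timestamp to date string: from_unixtime
Syntax: from_unixtime(bigint unixtime[, string format])
Return type: string
Description: Converts a Unix timestamp to a time string in the current time zone. A Unix timestamp is the number of seconds elapsed since 1970-01-01 00:00:00 Greenwich Mean Time (1970-01-01 08:00:00 Beijing time).

Example:
SELECT from_unixtime(1602034999, 'yyyy-MM-dd');

Current Unix timestamp: unix_timestamp
Syntax: unix_timestamp()
Return type: bigint
Description: Returns the current Unix timestamp in seconds.

Example:
SELECT UNIX_TIMESTAMP();

Date string to Unix timestamp: unix_timestamp
Syntax: unix_timestamp(string date)
Return type: bigint
Description: Converts a date string in the format "yyyy-MM-dd HH:mm:ss" to a Unix timestamp. This description states that 0 is returned if the conversion fails; some Hive versions return NULL instead.

Example:
select  unix_timestamp('2015-09-07 02:46:43');  -- converts the given date-time to a Unix timestamp

Date string in a given format to Unix timestamp: unix_timestamp
Syntax: unix_timestamp(string date, string pattern)
Return type: bigint
Description: Converts a date string in the format given by pattern to a Unix timestamp. This description states that 0 is returned if the conversion fails; some Hive versions return NULL instead.

Example:
select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss');
select unix_timestamp('20111207','yyyyMMdd');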
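
A common use of these two functions together is reformatting a date string: parse it with unix_timestamp and print it with from_unixtime. The sketch below assumes a Hive version that allows SELECT without a FROM clause; since both calls use the same session time zone, the round trip preserves the wall-clock time.

```sql
-- Parse a compact date-time string, then print it in the standard layout.
select from_unixtime(
         unix_timestamp('20111207 13:01:03', 'yyyyMMdd HH:mm:ss'),
         'yyyy-MM-dd HH:mm:ss');   -- 2011-12-07 13:01:03

-- Same idea for a date-only value.
select from_unixtime(unix_timestamp('20111207', 'yyyyMMdd'), 'yyyy-MM-dd');  -- 2011-12-07
```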

Date-time to date: to_date
Syntax: to_date(string timestamp)
Return type: string
Description: Returns the date part of a date-time value.

Example:
select to_date('2018-12-08 10:03:01'); -- 2018-12-08, the date part of the date-time value

Year of a date: year
Syntax: year(string date)
Return type: int
Description: Returns the year of the date.

Example:
select year('2018-12-08 10:03:01'); -- 2018
select year('2018-12-08'); -- 2018

Month of a date: month
Syntax: month(string date)
Return type: int
Description: Returns the month of the date.

Example:
select month('2018-12-08 10:03:01'); -- 12
select month('2018-12-08'); -- 12

Day of a date: day
Syntax: day(string date)
Return type: int
Description: Returns the day of the month of the date.

Example:
select day('2018-12-08 10:03:01'); -- 8
select day('2018-12-08'); -- 8

Hour of a date-time: hour
Syntax: hour(string date)
Return type: int
Description: Returns the hour of the date-time value.

Example:
select hour('2018-12-08 10:03:01'); -- 10

Minute of a date-time: minute
Syntax: minute(string date)
Return type: int
Description: Returns the minute of the date-time value.

Example:
select minute('2018-12-08 10:03:01'); -- 3

Second of a date-time: second
Syntax: second(string date)
Return type: int
Description: Returns the second of the date-time value.

Example:
select second('2018-12-08 10:03:01'); -- 1

Week of the year: weekofyear
Syntax: weekofyear(string date)
Return type: int
Description: Returns the week number of the date within its year.

Example:
select weekofyear('2018-01-08 10:03:01'); -- 2, the week of the year the date falls in

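Date difference: datediff
Syntax: datediff(string enddate, string startdate)
Return type: int
Description: Returns the number of days from startdate to enddate (enddate minus startdate).

Example:
select datediff('2019-07-02','2019-07-23'),datediff('2020-07-02','2019-07-23'); -- -21, 345
-- the result is the first date minus the second date, in days

Add days to a date: date_add
Syntax: date_add(string startdate, int days)
Return type: string
Description: Returns the date that is days days after startdate.

Example:
select date_add('2019-07-02', 22); -- 2019-07-24, adds 22 days to the given date

Subtract days from a date: date_sub
Syntax: date_sub(string startdate, int days)
Return type: string
Description: Returns the date that is days days before startdate.

Example:
select date_sub('2019-07-12',10); -- 2019-07-02, subtracts 10 days from the given date

Current time: current_timestamp

select current_timestamp; -- returns the current date and time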
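
The date arithmetic functions above are usually combined with the current date rather than with literals. A brief sketch, assuming a Hive version that provides current_date (a built-in not otherwise mentioned in this article) and allows SELECT without a FROM clause; the date literal is arbitrary.

```sql
-- Today's date, without the time part.
select current_date;

-- The date one week from today and one week ago.
select date_add(current_date, 7), date_sub(current_date, 7);

-- Number of days elapsed since an arbitrary reference date.
select datediff(current_date, '2020-01-01');
```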

X. Numeric functions

Rounding: round
Syntax: round(double a)
Return type: DOUBLE
Description: Returns a rounded to the nearest integer value.

Example:
select round(2.6); -- 3.0, rounded to the nearest integer

Rounding to a given precision: round
Syntax: round(double a, int d)
Return type: DOUBLE
Description: Returns a rounded to d decimal places.

Example:
select round(1.23454,2); -- 1.23, rounded to two decimal places
select round(1213232,-2); -- 1213200

Round down: floor
Syntax: floor(double a)
Return type: BIGINT
Description: Returns the largest integer that is less than or equal to a.

Example:
select  floor(1.3) ; -- 1
select  floor(1.99) ; -- 1
select  floor(-1.3) ; -- -2
select  floor(-1.99) ; -- -2

Round up: ceil
Syntax: ceil(double a)
Return type: BIGINT
Description: Returns the smallest integer that is greater than or equal to a.

Example:
select  ceil(1.0)  ; -- 1
select  ceil(1.0001) ; -- 2
select  ceil(1.99) ; -- 2
select  ceil(1.29)  ; -- 2
select  ceil(-1.3)  ; -- -1

Round up: ceiling
Syntax: ceiling(double a)
Return type: BIGINT
Description: Same as ceil.

Example:
select  ceiling(1.0); -- 1
select  ceiling(1.0001); -- 2
select  ceiling(1.99); -- 2
select  ceiling(1.29); -- 2
select  ceiling(-1.3) ; -- -1

Random number: rand
Syntax: rand(), rand(int seed)
Return type: double
Description: Returns a random number between 0 and 1. If a seed is given, the generated sequence of random numbers is reproducible.

Example:
select rand(); -- a random double between 0 and 1
select rand(3); -- a double from a reproducible sequence determined by the seed
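Natural exponential: exp
Syntax: exp(double a)
Return type: double
Description: Returns e (the base of the natural logarithm) raised to the power a.

Example:
select exp(2); -- e squared, approximately 7.389

Base-10 logarithm: log10
Syntax: log10(double a)
Return type: double
Description: Returns the base-10 logarithm of a.

Example:
select  log10(35);
select  log10(100); -- 2.0, because 10 to the power 2 is 100

Base-2 logarithm: log2
Syntax: log2(double a)
Return type: double
Description: Returns the base-2 logarithm of a.

Example:
select  log2(8);

Logarithm: log
Syntax: log(double base, double a)
Return type: double
Description: Returns the logarithm of a to the given base. With a single argument, log(a) returns the natural logarithm of a.

Example:
select log(10, 100); -- base-10 logarithm of 100
select log(100); -- natural logarithm of 100

Power: pow
Syntax: pow(double a, double p)
Return type: double
Description: Returns a raised to the power p.

Example: select pow(2,3); -- 2 to the power 3

Power: power
Syntax: power(double a, double p)
Return type: double
Description: Returns a raised to the power p; same as pow.

Example: select power(2,4) ;

Square root: sqrt
Syntax: sqrt(double a)
Return type: double
Description: Returns the square root of a.

Example: select sqrt(16); -- the square root of 16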

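Binary representation: bin
Syntax: bin(BIGINT a)
Return type: string
Description: Returns the binary representation of a.

Example: select bin(8); -- 1000

Hexadecimal representation: hex
Syntax: hex(BIGINT a), hex(string a)
Return type: string
Description: If the argument is an integer, returns its hexadecimal representation; if the argument is a string, returns the hexadecimal representation of the string's characters.

Example: select hex(30); -- 1E

Inverse of hex: unhex
Syntax: unhex(string a)
Return type: string
Description: Returns the string that the hexadecimal string represents. It is the inverse of hex applied to a string.

Example:
select unhex('616263'); -- abc (61, 62, and 63 are the hexadecimal codes of a, b, and c)

Base conversion: conv
Syntax: conv(BIGINT num, int from_base, int to_base)
Return type: string
Description: Converts the number num from base from_base to base to_base.

Example:
select conv(18,10,4); -- 102, converts 18 from base 10 to base 4

Absolute value: abs
Syntax: abs(double a), abs(int a)
Return type: double, int
Description: Returns the absolute value of a.

Example:
select abs(-3.9);

Positive modulo: pmod
Syntax: pmod(int a, int b), pmod(double a, double b)
Return type: int, double
Description: Returns the non-negative remainder of a divided by b (for a positive b).

Example:
select pmod(9,2);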
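
For positive operands pmod and % give the same answer, so pmod(9,2) does not show what is special about it. The difference appears with a negative dividend. A sketch, assuming a Hive version that allows SELECT without a FROM clause:

```sql
-- % keeps the sign of the dividend.
select -7 % 3;       -- -1

-- pmod maps the result into the range 0 .. b-1.
select pmod(-7, 3);  -- 2
```

This is why pmod is the usual choice for bucketing on values that can be negative, such as hash codes.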

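Sine: sin
Syntax: sin(double a)
Return type: double
Description: Returns the sine of a.

Example:
select sin(0);

Arc sine: asin
Syntax: asin(double a)
Return type: double
Description: Returns the arc sine of a.

Example:
select asin(1);

Cosine: cos
Syntax: cos(double a)
Return type: double
Description: Returns the cosine of a.

Example:
select cos(0);

Arc cosine: acos
Syntax: acos(double a)
Return type: double
Description: Returns the arc cosine of a.

Example:
select acos(1);

positive function: positive
Syntax: positive(int a), positive(double a)
Return type: int, double
Description: Returns a.

Example:
select positive(10);

negative function: negative
Syntax: negative(int a), negative(double a)
Return type: int, double
Description: Returns -a.

Example:
select negative(5);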

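XI. Conditional functions

If function: if
Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
Return type: T
Description: Returns valueTrue when testCondition is TRUE; otherwise returns valueFalseOrNull.

Example:
select if(1=2,100,200); -- 200
select if(1=1,100,200); -- 100

First non-NULL value: COALESCE
Syntax: COALESCE(T v1, T v2, ...)
Return type: T
Description: Returns the first non-NULL argument; returns NULL if all arguments are NULL.

Example:
select COALESCE(null,null,null) ;
select COALESCE(null,'100','50') ;

nvl function: NULL replacement. It takes exactly two arguments.
If expr1 is NULL, nvl returns expr2; otherwise it returns expr1. expr1 and expr2 must have the same data type.

select nvl('asc','asd'),nvl(null,'123'),nvl('123',null),nvl(null,null);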
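
The literal examples above do not show the typical use of these functions, which is supplying a default for missing column values. The sketch below runs against the score table from section IV; the key 'wuli' is a hypothetical subject that does not appear in the sample data, so looking it up yields NULL.

```sql
-- Fall back from a missing score to another score, and finally to 0.
select name,
       coalesce(scores['wuli'], scores['yuwen'], 0) as first_available,
       nvl(scores['wuli'], 0)                       as wuli_or_zero
from score;
```

coalesce accepts any number of fallbacks, while nvl is limited to a single one.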

Conditional expression: CASE (simple form)
Syntax: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
Return type: T
Description: Returns c if a equals b; returns e if a equals d; otherwise returns f.

Example:
select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end; -- mary
select case 200 when 50 then 'tom' when 100 then 'mary' else 'tim' end; -- tim

Conditional expression: CASE (searched form)
Syntax: CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
Return type: T
Description: Returns b if a is TRUE; returns d if c is TRUE; otherwise returns e.

Example:
select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end; -- mary
select case when 1=1 then 'tom' when 2=2 then 'mary' else 'tim' end; -- tom
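
On real data the searched form of CASE is typically used to bucket a column into categories. The following sketch uses the score table from section IV; the grade boundaries (90 and 60) and the grade labels are made up for illustration.

```sql
-- Map each student's math score to a letter grade.
-- Conditions are evaluated top to bottom; the first TRUE branch wins.
select name,
       scores['shuxue'] as shuxue,
       case
         when scores['shuxue'] >= 90 then 'A'
         when scores['shuxue'] >= 60 then 'B'
         else 'C'
       end as grade
from score;
```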

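XII. String functions

ASCII code of a character: ascii
Syntax: ascii(string str)
Return type: int
Description: Returns the ASCII code of the first character of str.

Example:
select ascii('abcde'); -- 97

Base64 encoding function: base64 (not covered in detail here)

String concatenation: concat
Syntax: concat(string A, string B, ...)
Return type: string
Description: Returns the concatenation of the input strings; any number of input strings is supported.

Example:
select concat('abc','def','gh');
select concat('abc','-','def','-','gh');

String concatenation with a separator: concat_ws
Syntax: concat_ws(string SEP, string A, string B, ...)
Return type: string
Description: Returns the concatenation of the input strings, with SEP placed between them.

Example:
select concat_ws('-','abc','def','gh') ;

Array to string: concat_ws (not covered in detail here)
Format a number as a string with a fixed number of decimal places: format_number (not covered in detail here)
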
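
The last two entries are listed without examples. The sketch below shows how they are commonly called; it assumes a Hive version that allows SELECT without a FROM clause, and the second query uses the person table from section IV.

```sql
-- concat_ws also accepts an array<string> and joins its elements.
select concat_ws(',', array('a', 'b', 'c'));   -- a,b,c

-- Joining an array column, one output string per row.
select name, concat_ws('|', location) as locations from person;

-- format_number adds thousands separators and rounds to the given
-- number of decimal places; the result is a string.
select format_number(1234567.891, 2);          -- 1,234,567.89
```
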
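Substring: substr, substring
Syntax: substr(string A, int start), substring(string A, int start)
Return type: string
Description: Returns the part of string A from position start to the end.

Example:
select substr('abcde',3) ; -- cde
select substring('abcde',3); -- cde

Substring with length: substr, substring
Syntax: substr(string A, int start, int len), substring(string A, int start, int len)
Return type: string
Description: Returns the part of string A that starts at position start and has length len.

Example:
select substr('abcde',3,2); -- cd
select substring('abcde',3,2); -- cd
select substring('abcde',-2,2); -- de

String search: instr
Returns the position of the first occurrence of a substring within a string (positions start at 1).

Example:
select instr('abc','b'); -- 2

String length: length
Syntax: length(string A)
Return type: int
Description: Returns the length of string A.

Example:
select length('abcedfg'); -- 7

String search: locate (not covered in detail here)
String formatting: printf (not covered in detail here)
String to map: str_to_map (not covered in detail here)
Base64 decoding: unbase64(string str) (not covered in detail here)
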
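
Several of the functions just listed come without examples. The sketch below illustrates three of them; it assumes a Hive version that allows SELECT without a FROM clause, and the input strings are made up for illustration.

```sql
-- locate(substr, str[, pos]): position of substr in str, optionally
-- starting the search at position pos; positions start at 1.
select locate('b', 'abcabc');      -- 2
select locate('b', 'abcabc', 3);   -- 5

-- printf: Java-style format string.
select printf('%s scored %d', 'tom', 80);   -- tom scored 80

-- str_to_map(text, pair_delimiter, key_value_delimiter):
-- builds a map<string,string> from a delimited string.
select str_to_map('yuwen:80,shuxue:89', ',', ':');
```

Note that locate takes the substring first, while instr takes the string first.
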
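Uppercase: upper, ucase
Syntax: upper(string A), ucase(string A)
Return type: string
Description: Returns string A converted to uppercase.

Example:
select upper('abSEd');
select ucase('abSEd');

Lowercase: lower, lcase
Syntax: lower(string A), lcase(string A)
Return type: string
Description: Returns string A converted to lowercase.

Example:
select lower('abSEd');
select lcase('abSEd');

Trim spaces: trim
Syntax: trim(string A)
Return type: string
Description: Removes the spaces from both ends of the string.

Example:
select trim(' abc ');

Trim leading spaces: ltrim
Syntax: ltrim(string A)
Return type: string
Description: Removes the spaces from the left end of the string.

Example:
select ltrim(' abc ');

Trim trailing spaces: rtrim
Syntax: rtrim(string A)
Return type: string
Description: Removes the spaces from the right end of the string.

Example:
select rtrim(' abc ');

Regular expression replace: regexp_replace
Syntax: regexp_replace(string A, string B, string C)
Return type: string
Description: Replaces the parts of string A that match the Java regular expression B with C. Note that escape characters are needed in some cases. It is similar to the regexp_replace function in Oracle.

Example:
select regexp_replace('foobar', 'oo|ar', ''); -- fb

Regular expression extract: regexp_extract
Syntax: regexp_extract(string subject, string pattern, int index)
Return type: string
Description: Matches subject against the regular expression pattern and returns the capture group selected by index.

Example:
select regexp_extract('foothebar', 'foo(.*?)(bar)', 1); -- the
-- index 1 selects the first parenthesized group, (.*?), which matches "the"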
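
To make the role of the index argument clearer, the sketch below applies one pattern with three capture groups to a date string. It assumes a Hive version that allows SELECT without a FROM clause; as in the RLIKE examples earlier, the backslash in \d is doubled inside the string literal.

```sql
-- Index 0 returns the whole match; 1, 2, 3 return the individual groups.
select regexp_extract('2018-12-08', '(\\d+)-(\\d+)-(\\d+)', 0);  -- 2018-12-08
select regexp_extract('2018-12-08', '(\\d+)-(\\d+)-(\\d+)', 1);  -- 2018
select regexp_extract('2018-12-08', '(\\d+)-(\\d+)-(\\d+)', 2);  -- 12

-- regexp_replace with a plain pattern: swap the separator.
select regexp_replace('2018-12-08', '-', '/');                   -- 2018/12/08
```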

URL parsing: parse_url
Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract])
Return type: string
Description: Returns the specified part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.

Example:
select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST'); -- facebook.com

JSON parsing: get_json_object
Syntax: get_json_object(string json_string, string path)
Return type: string
Description: Parses the JSON string json_string and returns the content selected by path. Returns NULL if the input JSON string is invalid.
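
get_json_object is described above without an example. The sketch below applies a few paths to a small JSON document that is made up for illustration; it assumes a Hive version that allows SELECT without a FROM clause. Paths start at the root object `$`, use a dot for object fields, and use brackets for array indexes.

```sql
-- Top-level field.
select get_json_object('{"name":"tom","scores":{"math":89},"tags":["a","b"]}', '$.name');         -- tom

-- Nested field.
select get_json_object('{"name":"tom","scores":{"math":89},"tags":["a","b"]}', '$.scores.math');  -- 89

-- Array element (indexes start at 0).
select get_json_object('{"name":"tom","scores":{"math":89},"tags":["a","b"]}', '$.tags[1]');      -- b
```

The return value is always a string, so numeric results usually need a cast before arithmetic.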

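String of spaces: space
Syntax: space(int n)
Return type: string
Description: Returns a string consisting of n spaces.

Example:
select space(10);
select length(space(10)); -- 10

Repeat a string: repeat
Syntax: repeat(string str, int n)
Return type: string
Description: Returns str repeated n times.

Example:
select repeat('abc',5); -- abcabcabcabcabc

Left pad: lpad
Syntax: lpad(string str, int len, string pad)
Return type: string
Description: Pads str on the left with pad until it is len characters long.

Example:
select lpad('abc',10,'td'); -- tdtdtdtabc
-- Note: unlike GP and Oracle, the pad argument cannot be omitted

Right pad: rpad
Syntax: rpad(string str, int len, string pad)
Return type: string
Description: Pads str on the right with pad until it is len characters long.

Example:
select rpad('abc',10,'td'); -- abctdtdtdt

Split a string: split
Syntax: split(string str, string pat)
Return type: array
Description: Splits str around matches of pat and returns the resulting array of strings.

Example:
select split('abtcdtef','t'); -- ["ab","cd","ef"]

Search in a set: find_in_set
Syntax: find_in_set(string str, string strList)
Return type: int
Description: Returns the position of the first occurrence of str in strList, where strList is a comma-separated string. Returns 0 if str is not found.

Example:
select find_in_set('ab','ef,ab,de'); -- 2
select find_in_set('at','ef,ab,de') ; -- 0

Tokenizing: sentences
Splits the text into sentences and each sentence into words, and returns an array of sentences in which each sentence is an array of words.

Example:
select sentences('Hello there! How are you?');
select sentences('Hello there How are you?');

Top-K most frequent word sequences after tokenizing: ngrams (not covered in detail here)
Top-K words that most frequently appear together with given words after tokenizing: context_ngrams (not covered in detail here)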

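XIII. Miscellaneous functions

Call a Java method: java_method (not covered in detail here)
Call a Java method: reflect (not covered in detail here)
Hash value of a string: hash (not covered in detail here)
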
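
The three functions above are listed without examples. The sketch below shows how they are typically called, assuming a Hive version that allows SELECT without a FROM clause. reflect and java_method take a Java class name, a method name, and then the method's arguments; the classes and methods used here are standard JDK ones chosen for illustration.

```sql
-- Call the static method java.lang.Math.max(2, 3).
select reflect('java.lang.Math', 'max', 2, 3);          -- 3

-- java_method is a synonym for reflect.
select java_method('java.lang.String', 'valueOf', 1);   -- 1

-- hash returns an int hash code of its argument(s).
select hash('abc');
```
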
XPath functions for parsing XML

Reference article: "Commonly used Hive functions: miscellaneous functions and XPath functions for parsing XML"

xpath
Syntax: xpath(string xmlstr, string xpath_expression)
Return type: array
Description: Returns an array of the results in the XML string that match the expression.

select xpath('<a><b>b1</b><b>b2</b><c>c1</c></a>','a/b/text()');
-- ["b1","b2"]

xpath_string
Syntax: xpath_string(string xmlstr, string xpath_expression)
Return type: string
Description: By default, returns the value of the first node in the XML string that matches the expression.

SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>', '//b'); -- b1

-- select which matching node to return
SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>', '//b[2]'); -- b2

xpath_boolean
Syntax: xpath_boolean(string xmlstr, string xpath_expression)
Return type: boolean
Description: Returns whether the XML string matches the XPath expression.

SELECT xpath_boolean ('<a><b>b</b></a>', 'a/b'); -- true

xpath_short, xpath_int, xpath_long
Syntax: xpath_short(string xmlstr, string xpath_expression)
xpath_int(string xmlstr, string xpath_expression)
xpath_long(string xmlstr, string xpath_expression)
Return type: short, int, and long respectively
Description: Returns the value obtained by evaluating the XPath expression against the XML string; returns 0 if nothing matches.

xpath_float, xpath_double, xpath_number
Syntax: xpath_float(string xmlstr, string xpath_expression)
xpath_double(string xmlstr, string xpath_expression)
xpath_number(string xmlstr, string xpath_expression)
Return type: number
Description: Returns the value obtained by evaluating the XPath expression against the XML string; returns 0 if nothing matches.

select xpath_double('<a><b>10.5</b><c>11.2</c></a>','sum(a/*)');
-- 21.7

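XIV. Aggregate functions (UDAF)

Count: count
Syntax: count(*), count(expr), count(DISTINCT expr[, expr...])
Return type: bigint
Description: count(*) returns the number of retrieved rows, including rows that contain NULL values; count(expr) returns the number of rows for which expr is not NULL; count(DISTINCT expr[, expr...]) returns the number of distinct non-NULL values of the given expressions.

Sum: sum
Syntax: sum(col), sum(DISTINCT col)
Return type: double
Description: sum(col) returns the sum of col over the result set; sum(DISTINCT col) returns the sum of the distinct values of col.

Average: avg
Minimum: min
Maximum: max
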
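
The basic aggregates above are listed without a query. The sketch below applies them per group, using the student table with age and department columns that the later examples in this section also rely on; the column aliases are illustrative.

```sql
-- One output row per department.
select department,
       count(*)            as students,       -- all rows in the group
       count(distinct age) as distinct_ages,  -- distinct non-NULL ages
       sum(age)            as total_age,
       avg(age)            as avg_age,
       min(age)            as youngest,
       max(age)            as oldest
from student
group by department;
```

Apart from count(*), these aggregates ignore rows in which the aggregated column is NULL.
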
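Population variance: var_pop
Returns the population variance of the specified numeric column.

select  var_pop(age) from student;

Sample variance: var_samp
Returns the sample variance of the specified numeric column.

select  var_samp(age) from student;

Population standard deviation: stddev_pop
Returns the population standard deviation of the specified numeric column.

select  STDDEV_POP(age) from student;

Sample standard deviation: stddev_samp

select  stddev_samp(age) from student;

Median (exact percentile): percentile
Reference article: "The Hive percentile function percentile(col, p)"
percentile(col, p) returns the exact p-th percentile of an integer column, with p between 0 and 1; p = 0.5 gives the median.

select  percentile(age, 0.5) from student;

Approximate percentile: percentile_approx

select  percentile_approx(age,0.95) from student;
-- returns the age at the 95th percentile, i.e. the value below which about 95% of the ages fall (the ages are ordered internally)

select  percentile_approx(age,0.5) from student;
-- p = 0.5 gives an approximate median

Histogram: histogram_numeric
Syntax: histogram_numeric(col, b)
Return type: array<struct {'x','y'}>
Description: Computes a histogram of col using b bins. Each (x, y) pair in the result gives the center and the height of one bin.

Example:
select histogram_numeric(age, 5) from student;

Collect distinct values into a set: collect_set

Example 1:
select age,concat_ws('-',collect_set(department)) id,collect_set(department) id2,concat_ws('-',collect_set(cast(id as string))) from student group by age;

Example 2:
-- convert age to a string with cast(age as string)
select  concat_ws('-',collect_set(cast(age as string))),collect_set(cast(age as string)) from student;

Collect values into a list without removing duplicates: collect_list

Example:
select age,concat_ws('-',collect_list(department)) id,concat_ws('-',collect_list(cast(id as string))) from student group by age;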

Table-generating functions (UDTF)

Expand an array into multiple rows: explode
Expand a map into multiple rows: explode

select  explode(scores)  from score;
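
The example above covers the map case only. The sketch below adds the array case and shows how to keep other columns next to the exploded values, using the person and score tables from section IV; the output column names are illustrative.

```sql
-- Array: one output row per element.
select explode(location) as city from person;

-- Map: one output row per entry, with two output columns.
select explode(scores) as (subject, score) from score;

-- explode cannot be mixed with other columns in the select list;
-- LATERAL VIEW joins each input row to its exploded rows.
select name, subject, score
from score
lateral view explode(scores) t as subject, score;
```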

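XV. Frequently used functions

1. Coalesce
First non-NULL value: COALESCE
Syntax: COALESCE(T v1, T v2, ...)
Return type: T
Description: Returns the first non-NULL argument; returns NULL if all arguments are NULL.

2. Explode

select  explode(scores)  from score;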

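3. lateral view
lateral view is used together with UDTFs such as explode (often applied to the result of split). It turns one input row into multiple rows, and the expanded rows can then be aggregated.
Reference article: "lateral view in Hive"
Reference article: "Hive functions: lateral view and explode in Hive"

Data file pageAds.txt (columns separated by a tab)

front_page  1,2,3
contact_page    3,4,5

Create the table:

-- A simple example: the table pageAds has two columns. The first is pageid (string); the second is adid_list, a comma-separated list of ad ids.
create table pageAds(pageid string,adid_list array<int>) row format delimited fields terminated by "\t" collection items terminated by ",";

Load the data:

load data local inpath '/home/study/pageAds.txt' into table pageAds;

The goal is to count how many times each ad id appears across all pages.
First, split the ad ids into separate rows:

select  *  from  pageAds ;

SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;

Then aggregate:

SELECT adid, count(1) 
    FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;

4. grouping sets
Reference article: "Using grouping sets in Hive"
Reference article: "Hive SQL grouping sets usage"

grouping sets is a convenient way to write several GROUP BY aggregations in a single SQL statement.
GROUPING SETS: aggregates by several combinations of dimensions; equivalent to a UNION ALL of the GROUP BY results for each combination.
GROUPING__ID: a virtual column that indicates which grouping set a result row belongs to.
CUBE: aggregates by every combination of the GROUP BY dimensions.
ROLLUP: a subset of CUBE; performs hierarchical aggregation starting from the leftmost dimension.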
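
The grouping constructs described above can be sketched on the student table used in section XIV (columns department and age). This is an illustrative example rather than a query from the referenced articles; the alias cnt is made up.

```sql
-- Three aggregations in one statement: per (department, age),
-- per department, and a grand total. grouping__id tells them apart.
select department, age, count(*) as cnt, grouping__id
from student
group by department, age
grouping sets ((department, age), (department), ());

-- ROLLUP produces the same three grouping sets as above.
select department, age, count(*) as cnt
from student
group by department, age with rollup;

-- CUBE additionally includes the grouping set (age).
select department, age, count(*) as cnt
from student
group by department, age with cube;
```

In rows that belong to a coarser grouping set, the columns that were not grouped on are returned as NULL.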
