Hive SQL Basic Operations (continuously updated)

Author: 大数据ZRL | Published: 2020-01-20 17:52

Rename a Hive table

ALTER TABLE oldName RENAME TO newName;

View a Hive table's CREATE TABLE statement

show create table tableName;

View column details for a table

desc tableName;

Insert multiple rows

insert into table dbName.tableName values ('bj','1234','哈哈'), ('ah','1245','菜鸡')

Add / drop partitions

Add a partition:
- alter table tableName add partition (dt='2019-02-12', du='0');
Drop a partition:
- alter table ods.ods_vas_user_order_increment drop partition (prov_alias='bj', dt='20200505');
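Re-running an add or drop statement can fail when the partition already exists or is already gone. The following is a minimal sketch of the idempotent variants, reusing the placeholder tableName from above; the partition values are illustrative only.

```sql
-- Add the partition only if it is not registered yet
alter table tableName add if not exists partition (dt='2019-02-12', du='0');

-- Drop the partition only if it exists
alter table tableName drop if exists partition (dt='2019-02-12', du='0');

-- List the partitions currently registered for the table
show partitions tableName;
```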

Add columns

alter table dwd.dwd_vas_user_order_increment_day add columns (operation string comment 'add/update/delete', operation_dt string)

Modify a column

ALTER TABLE dept.demo CHANGE name ename string comment '测试说明';

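Get the n-th smallest/largest value across several columns

Smallest of five columns (index = 0 after an ascending sort):
- sort_array(array(field4,field3,field1,field0,field2))[0]
Largest of ten numbers (index = 9):
- sort_array(array(10,1,8,3,6,5,4,7,2,9))[9]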
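Because sort_array sorts in ascending order, the k-th largest of n values sits at index n-k. A small sketch using the same ten literals; the column aliases are made up for illustration. For the plain minimum or maximum, the built-in least and greatest functions may be simpler.

```sql
-- Sorted array is [1,2,...,10]:
--   index 8 holds the second largest value (9)
--   index 0 holds the smallest value (1)
-- least/greatest return the minimum (1) and maximum (10) directly
select sort_array(array(10,1,8,3,6,5,4,7,2,9))[8] as second_largest,
       sort_array(array(10,1,8,3,6,5,4,7,2,9))[0] as smallest,
       least(10,1,8,3,6,5,4,7,2,9)                as smallest_via_least,
       greatest(10,1,8,3,6,5,4,7,2,9)             as largest_via_greatest;
```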

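Data written to a partitioned Hive table with hdfs dfs -put or the HDFS API cannot be queried in Hive

Usage: MSCK REPAIR TABLE tableName
How it works: it adds the metadata for partitions that exist on HDFS but are missing from the Hive metastore
Advantage: avoids running alter table add partition for every new partition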
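A sketch of the typical sequence, assuming a hypothetical table partitioned by dt whose files were uploaded outside Hive into a directory named in the key=value style (for example .../tableName/dt=2020-01-20/). The partition column and value are assumptions for illustration.

```sql
-- Register every partition directory that the metastore does not know about yet
msck repair table tableName;

-- Check that the new partition is now visible
show partitions tableName;

-- Alternative for a single partition: register it explicitly
alter table tableName add if not exists partition (dt='2020-01-20');
```

MSCK REPAIR TABLE relies on the key=value directory naming to discover partitions; directories that do not follow it need the explicit alter table form.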

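Notes on inserting data into a table partition

insert overwrite table tableName1 partition (c='ah') select a,b,c from tableName2;
When you overwrite a specific partition (c='ah') of a table, the select list must not include the partition column c, because partition (c='ah') has already fixed its value to 'ah'. The statement above is therefore wrong; remove c from the select list.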
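For reference, a sketch of the two forms that should work, using the placeholder tables from the statement above. The where filter in the first statement is an assumption (that only the 'ah' rows of tableName2 are wanted); the second statement shows dynamic partitioning, where the partition column is selected last and Hive routes each row by its value.

```sql
-- Static partition: the value comes from the partition clause, so c is not selected
insert overwrite table tableName1 partition (c='ah')
select a, b from tableName2 where c = 'ah';

-- Dynamic partition: no value in the partition clause, c is the last selected column
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tableName1 partition (c)
select a, b, c from tableName2;
```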

Subtract n days from a date (Hive function)

date_sub('2020-01-02', n)
Note: the date string must be in 'yyyy-MM-dd' format
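A short sketch of the call, plus one common way to handle a compact date string. The 'yyyyMMdd' input format is an assumption chosen for illustration; it is converted to 'yyyy-MM-dd' before date_sub is applied.

```sql
-- Both columns evaluate to 2020-01-01.
-- The second one first converts a compact 'yyyyMMdd' string to 'yyyy-MM-dd'.
select date_sub('2020-01-02', 1) as one_day_earlier,
       date_sub(from_unixtime(unix_timestamp('20200102', 'yyyyMMdd'), 'yyyy-MM-dd'), 1) as from_compact;
```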

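Notes on using join ... on

Always use '=' in the on condition; do not use '<>'
Rows where the column on either side of '=' is NULL will not match, so watch out for NULL keys
If the join columns contain duplicate values, the matching rows are multiplied in the result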
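The last two points can be handled before the join itself. The sketch below assumes two hypothetical tables, orders(user_id, amount) and users(user_id, name); it drops NULL keys and collapses duplicate keys on one side so that each order row matches at most one user row.

```sql
-- orders and users are hypothetical tables used only for illustration
select o.user_id,
       o.amount,
       u.name
from orders o
join (
    -- one row per user_id, NULL keys removed
    select user_id, max(name) as name
    from users
    where user_id is not null
    group by user_id
) u
on o.user_id = u.user_id;
```

If the rows with a NULL key on the orders side must be kept, a left join preserves them with NULL in the name column.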

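Load local data into Hive

Load a file from the server's local filesystem (csv2hive.csv must contain only data, no header row with column names)
- load data local inpath 'csv2hive.csv' overwrite into table tableName
Load a file from HDFS (csv2hive.csv must contain only data, no header row with column names)
- load data inpath 'csv2hive.csv' overwrite into table tableName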
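For the load to produce readable rows, the target table's storage format has to match the file. A minimal sketch, assuming a comma-delimited text file; the column names and types are placeholders. If the file does carry a header row, the skip.header.line.count table property is one option for having Hive ignore it at query time.

```sql
-- Hypothetical table definition; columns are placeholders
create table if not exists tableName (
    prov string,
    code string,
    note string
)
row format delimited fields terminated by ','
stored as textfile
tblproperties ('skip.header.line.count'='1');

-- Load the local CSV file, replacing any existing data in the table
load data local inpath 'csv2hive.csv' overwrite into table tableName;
```

Without the local keyword, load data inpath moves the source file within HDFS into the table's directory rather than copying it.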

Upload files to HDFS

hadoop fs -put <local src> <hdfs dest>
hadoop fs -copyFromLocal <local src> <hdfs dest>

Download files from HDFS

hadoop fs -get <hdfs src> <local dest>
hadoop fs -copyToLocal <hdfs src> <local dest>

grouping sets and grouping__id

Example:

select id,
       city,
       type,
       sum(income),
       grouping__id
from db.table
group by id,city,type
grouping sets (id, city, type, (id, city, type))

The query above groups by id, by city, by type, and by (id, city, type) separately; it is equivalent to the following:

select id,  null,null,sum(income),1 as grouping__id from db.table group by id
union all
select null,city,null,sum(income),2 as grouping__id from db.table group by city
union all
select null,null,type,sum(income),4 as grouping__id from db.table group by type
union all
select id,  city,type,sum(income),7 as grouping__id from db.table group by id,city,type

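How is the value of grouping__id derived?
- Note the column order in 'group by id,city,type' in the example: each column contributes one bit (1 if the column is part of the grouping set), with the first column as the lowest bit
- The values shown here match Hive versions before 2.3.0; Hive 2.3.0 changed how grouping__id is computed, so verify the values on newer versions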

grouping set            id  city  type  calculation              grouping__id
group by id             1   0     0     1×2^0 + 0×2^1 + 0×2^2    1
group by city           0   1     0     0×2^0 + 1×2^1 + 0×2^2    2
group by type           0   0     1     0×2^0 + 0×2^1 + 1×2^2    4
group by id,city,type   1   1     1     1×2^0 + 1×2^1 + 1×2^2    7
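One practical use of grouping__id is to label each output row with the grouping level it came from. The sketch below assumes a hypothetical table db.income_detail with the same columns as the example, and it assumes the numbering derived in this post (1, 2, 4, 7); on a Hive version that numbers grouping sets differently, the literals would need to change.

```sql
-- db.income_detail is a hypothetical table name;
-- the literals 1, 2, 4, 7 assume the grouping__id values derived above
select case grouping__id
           when 1 then 'by id'
           when 2 then 'by city'
           when 4 then 'by type'
           when 7 then 'by id,city,type'
       end         as grouping_level,
       id,
       city,
       type,
       sum(income) as total_income
from db.income_detail
group by id, city, type
grouping sets (id, city, type, (id, city, type));
```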