Hive2

Author: 扣篮的左手 | Source: published 2018-06-05 15:09

First Steps

Create a table in Hive whose fields are separated by commas:

create table aa_test(id int, name string, age int) row format delimited fields terminated by ',';

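On node-1, create a structured file, 1.txt, whose lines follow the table's comma-separated layout (id, name, age).

Upload 1.txt to HDFS, into the directory that corresponds to the table created in Hive: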

/user/hive/warehouse/aa.db/aa_test

Running select count(*) from aa_test actually launches a MapReduce program. With the MapReduce execution engine, Hive translates the SQL statement into MapReduce jobs.
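
To see how Hive plans this query, EXPLAIN can be prefixed to the statement. A minimal sketch, assuming the table lives in a database named aa (inferred from the aa.db path above) and that 1.txt holds comma-separated lines such as the hypothetical samples in the comments:

```sql
-- Assumption: the database is named aa, inferred from /user/hive/warehouse/aa.db/aa_test.
use aa;

-- Hypothetical contents of 1.txt (not shown in the original):
--   1,allen,18
--   2,tom,20

-- Count the rows that Hive parses from the files in the table directory.
select count(*) from aa_test;

-- Print the execution plan (stages and operators) instead of running the query.
explain select count(*) from aa_test;
```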

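Data Types for Table Creation

Data types available when creating a table in Hive:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
Types similar to Java's are supported, and type names are case-insensitive.
Besides the primitive types, complex data types such as ARRAY and MAP are also supported.
The key is specifying the delimiters correctly.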
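
For complex types, each nesting level needs its own delimiter. A minimal sketch, assuming a hypothetical table t_complex and a data line such as 1,zhangsan,reading-running,math:90-english:85 (the table name, columns, and delimiters are illustrative choices, not taken from the original):

```sql
-- Hypothetical table with one ARRAY column and one MAP column.
create table t_complex(
  id int,
  name string,
  hobbies array<string>,
  scores map<string, int>
)
row format delimited
fields terminated by ','            -- separates the top-level columns
collection items terminated by '-'  -- separates array elements and map entries
map keys terminated by ':';         -- separates a map key from its value

-- Access an array element by index and a map value by key.
select name, hobbies[0], scores['math'] from t_complex;
```

The delimiters chosen for fields, collection items, and map keys must differ from one another, and none of them should occur inside the data values themselves.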

Create two text files: aa.txt holds English names and bb.txt holds Chinese names.

Create the table in Hive:

create table t_user(id int, name string) row format delimited fields terminated by ',';

At this point a t_user directory already exists on HDFS under /user/hive/warehouse. Next, put the two files from above, aa.txt and bb.txt, into the t_user directory.

hadoop fs -put aa.txt bb.txt /user/hive/warehouse/t_user

After the upload, the directory contains two files, aa.txt and bb.txt.
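
One way to confirm that Hive treats every file in the table directory as table data is to query the table. A minimal sketch, assuming t_user is in the default database (as the path /user/hive/warehouse/t_user suggests) and that both files hold comma-separated id,name lines:

```sql
-- Each line of aa.txt and bb.txt is split on ',' into id and name.
select * from t_user;

-- Rows from both files are counted together, since Hive reads the whole directory.
select count(*) from t_user;
```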

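Suppose t_user also had a field storing the country. To query only Chinese users or only foreign users, Hive would have to scan the whole table and then filter with where, which takes a lot of time. If a query for Chinese users could scan only bb.txt, it would be much faster. This is the motivation for partitioned tables.

Statement to create a table with a single partition column: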

create table t_user2(id int, name string) partitioned by (country string) row format delimited fields terminated by ',';

Syntax for loading data into a partitioned table:
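
The original statement is not reproduced here. A minimal sketch using the standard HiveQL LOAD DATA ... PARTITION form, assuming aa.txt and bb.txt sit in a hypothetical local directory /root/hivedata and that the partition values 'USA' and 'China' are chosen purely for illustration:

```sql
-- Load each file into its own partition. The local path and the
-- partition values are illustrative assumptions, not from the original.
load data local inpath '/root/hivedata/aa.txt' into table t_user2 partition (country = 'USA');
load data local inpath '/root/hivedata/bb.txt' into table t_user2 partition (country = 'China');

-- List the partitions that now exist for the table.
show partitions t_user2;

-- Filtering on the partition column lets Hive read only the matching partition directory.
select id, name from t_user2 where country = 'China';
```

Each partition is stored as a subdirectory of the table directory, named after the partition column and value (for example country=China). The country column can be used in where like an ordinary column even though it is not stored in the data files.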
