First Steps
Create a table in Hive whose fields are separated by commas:
create table aa_test(id int, name string, age int) row format delimited fields terminated by ',';
data:image/s3,"s3://crabby-images/d89fb/d89fb1e9d0573c1fa6dc344d5d94c14de1016564" alt=""
data:image/s3,"s3://crabby-images/9b127/9b1276fc60222c9fa73f8473158edf4c6d3bffe8" alt=""
On node-1, create a file named 1.txt whose lines follow this comma-separated format:
data:image/s3,"s3://crabby-images/d0d22/d0d22d194770f9a6048973addb86235ded8a59ee" alt=""
Upload 1.txt to HDFS, into the directory that Hive created for the table:
/user/hive/warehouse/aa.db/aa_test
data:image/s3,"s3://crabby-images/54a4a/54a4a12d2eb7586978f953741322f839af9b04a2" alt=""
data:image/s3,"s3://crabby-images/26749/267495b1f84b790554ac166132721d8f7b7df82b" alt=""
data:image/s3,"s3://crabby-images/9cd80/9cd8057ce08e1c9cebe932e1e785fe25b75e814b" alt=""
Running select count(*) from aa_test actually launches a MapReduce program: Hive translates the SQL statement into MapReduce jobs.
data:image/s3,"s3://crabby-images/18719/187195f3561c96462554e0cc41512ef36a366d69" alt=""
Data Types for Table Columns
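The column data types Hive supports are documented at:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
Hive supports Java-like types, and type names are case-insensitive.
Besides primitive types, it also supports complex types such as ARRAY and MAP.
The key to using them is specifying the delimiters correctly.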
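For example, a table that mixes primitive and complex columns needs one delimiter per nesting level. In this sketch the table name `t_person`, its columns, the sample data line and the `-` and `:` delimiters are all hypothetical choices:

```sql
-- Hypothetical table: one field delimiter plus one delimiter per nesting level
CREATE TABLE t_person (
  id      INT,
  name    STRING,
  hobbies ARRAY<STRING>,
  scores  MAP<STRING, INT>
)
ROW FORMAT DELIMITED
  -- separates columns
  FIELDS TERMINATED BY ','
  -- separates ARRAY elements and also MAP entries
  COLLECTION ITEMS TERMINATED BY '-'
  -- separates a MAP key from its value
  MAP KEYS TERMINATED BY ':';

-- A matching data line would be:
--   1,zhangsan,reading-hiking,math:90-english:85

-- Arrays are indexed with [n], maps with ['key']
SELECT name, hobbies[0], scores['math'] FROM t_person;
```

Note that COLLECTION ITEMS TERMINATED BY is shared by arrays and maps, so the same character separates array elements and map entries.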
Partitioned Tables
Create two text files: aa.txt holds English names and bb.txt holds Chinese names:
data:image/s3,"s3://crabby-images/bac91/bac91eb720542823c61c8eb53f497ac74df15832" alt=""
data:image/s3,"s3://crabby-images/3f9dd/3f9dd2639526ed2e4a30fae954cc2430ef88ff40" alt=""
Create the table in Hive:
create table t_user(id int, name string) row format delimited fields terminated by ',';
data:image/s3,"s3://crabby-images/3a6ed/3a6ed56219eed5e16e25872d07c73e9a173857a0" alt=""
data:image/s3,"s3://crabby-images/b12e0/b12e0a6697574092697442ae733435f277cbddb4" alt=""
At this point a t_user folder already exists under /user/hive/warehouse on HDFS. Next, put the aa.txt and bb.txt files from above into the t_user directory:
hadoop fs -put aa.txt bb.txt /user/hive/warehouse/t_user
After the upload, the directory contains both files:
data:image/s3,"s3://crabby-images/29e36/29e3689c72cdc57f5dc43f24ce1829aa12b4d2fd" alt=""
data:image/s3,"s3://crabby-images/664ae/664ae9dac272f3a5e4562b2cd140833e5cf65d7a" alt=""
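Suppose t_user also had a column storing each user's country. To query only Chinese users or only foreign users, Hive would have to scan the whole table and then filter with WHERE, which takes a lot of time on large data. If a query for Chinese users only had to scan bb.txt, it would be much faster. This is the motivation for partitioned tables.
The statement to create a table with a single partition column: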
create table t_user2(id int, name string) partitioned by (country string) row format delimited fields terminated by ',';
data:image/s3,"s3://crabby-images/95394/95394406619311501710f67f6a01a8f33b88fc66" alt=""
Syntax for loading data into a partitioned table:
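The general form is `LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename PARTITION (partcol=value)`. A sketch that puts the two files from above into separate partitions; the local paths and the partition values 'USA' and 'China' are hypothetical:

```sql
-- Each LOAD copies a local file into its own partition directory,
-- e.g. /user/hive/warehouse/t_user2/country=USA/aa.txt
LOAD DATA LOCAL INPATH '/root/aa.txt' INTO TABLE t_user2 PARTITION (country='USA');
LOAD DATA LOCAL INPATH '/root/bb.txt' INTO TABLE t_user2 PARTITION (country='China');

-- Should now list country=USA and country=China
SHOW PARTITIONS t_user2;

-- Filtering on the partition column reads only the country=China
-- directory, so bb.txt is scanned and aa.txt is skipped
SELECT * FROM t_user2 WHERE country = 'China';
```

The returned rows carry country = 'China' even though bb.txt itself only contains id and name: Hive takes the value from the partition directory name.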