hive脚本执行:
[hadoop@slave03 bin]$ ./hive -e "select * from test.t_user";
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/app/hadoop-2.9.2/sbin:/usr/local/app/hadoop-2.9.2/bin:/usr/local/app/jdk1.8.0_211/bin)
Logging initialized using configuration in jar:file:/usr/local/app/hive-2.3.5/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
OK
1 zhangsan 23 male
2 lisi 24 male
3 wangwu 25 male
4 zhaoliu 26 male
5 xiaoqi 27 female
6 xiaofeng 28 female
Time taken: 12.276 seconds, Fetched: 6 row(s)
hive建表语句:
create table t_student(
id int,
name string,
score int
)
row format delimited
fields terminated by ',';
创建一个文件并上传到hdfs:
[hadoop@slave03 test]$ pwd
/home/hadoop/test
[hadoop@slave03 test]$ ls
1.txt ss.sh
[hadoop@slave03 test]$ cat 1.txt
1,aaa,23
2,bbb,34
3,ccc,45
[hadoop@slave03 test]$ hadoop fs -put 1.txt /user/hive/warehouse/test.db/t_student
19/06/11 13:56:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
查看表中的数据:
0: jdbc:hive2://slave01:10000> select * from t_student;
+---------------+-----------------+------------------+
| t_student.id | t_student.name | t_student.score |
+---------------+-----------------+------------------+
| 1 | aaa | 23 |
| 2 | bbb | 34 |
| 3 | ccc | 45 |
+---------------+-----------------+------------------+
3 rows selected (0.371 seconds)
以上创建的表都是内部表,接下来看看hive中外部表的创建:
create external table t_teacher(
id int,
name string,
course string,
office string
)
row format delimited
fields terminated by ","
location '/pvlog/2019-06-11';
外部表和内部表的区别:
1. 内部表数据由Hive自身管理,外部表数据由HDFS管理;
2. 内部表数据存储的位置是hive.metastore.warehouse.dir(默认:/user/hive/warehouse),外部表数据的存储位置由自己制定;
3. 删除内部表会直接删除元数据(metadata)及存储数据;删除外部表仅仅会删除元数据,HDFS上的文件并不会被删除;
4. 对内部表的修改会将修改直接同步给元数据,而对外部表的表结构和分区进行修改,则需要修复(MSCK REPAIR TABLE table_name;)
创建分区表:
create table t_pv_log(ip string, url string, access_time string)
partitioned by (day string)
row format delimited
fields terminated by ',';
查看表结构信息:
0: jdbc:hive2://slave01:10000> desc t_pv_log;
+--------------------------+-----------------------+-----------------------+
| col_name | data_type | comment |
+--------------------------+-----------------------+-----------------------+
| ip | string | |
| url | string | |
| access_time | string | |
| day | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| day | string | |
+--------------------------+-----------------------+-----------------------+
9 rows selected (0.463 seconds)
准备数据:
pv_log.01
192.168.33.2,http://sina.com/a, 2019-06-10 12:30:30
192.168.33.4,http://sina.com/b, 2019-06-10 12:30:30
192.168.33.3,http://sina.com/b, 2019-06-10 12:30:30
192.168.33.4,http://sina.com/c, 2019-06-10 12:30:30
192.168.33.4,http://sina.com/c, 2019-06-10 12:30:30
192.168.33.3,http://sina.com/a, 2019-06-10 12:30:30
192.168.33.2,http://sina.com/a, 2019-06-10 12:30:30
pv_log.02
192.168.33.2,http://sina.com/a, 2019-06-11 12:30:30
192.168.33.4,http://sina.com/b, 2019-06-11 12:30:30
192.168.33.3,http://sina.com/b, 2019-06-11 12:30:30
192.168.33.4,http://sina.com/c, 2019-06-11 12:30:30
192.168.33.4,http://sina.com/c, 2019-06-11 12:30:30
192.168.33.3,http://sina.com/a, 2019-06-11 12:30:30
192.168.33.2,http://sina.com/a, 2019-06-11 12:30:30
从本地导入数据到分区表:
0: jdbc:hive2://slave01:10000> load data local inpath '/home/hadoop/test/pv_log.01' into table t_pv_log partition(day='2019-06-10');
查询结果(多了一个分区字段):
0: jdbc:hive2://slave01:10000> select * from t_pv_log;
+---------------+--------------------+-----------------------+---------------+
| t_pv_log.ip | t_pv_log.url | t_pv_log.access_time | t_pv_log.day |
+---------------+--------------------+-----------------------+---------------+
| 192.168.33.2 | http://sina.com/a | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.4 | http://sina.com/b | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.3 | http://sina.com/b | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.4 | http://sina.com/c | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.4 | http://sina.com/c | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.3 | http://sina.com/a | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.2 | http://sina.com/a | 2019-06-10 12:30:30 | 2019-06-10 |
+---------------+--------------------+-----------------------+---------------+
7 rows selected (2.115 seconds)
查看hadoop的目录结构:
[hadoop@slave03 ~]$ hadoop fs -ls /user/hive/warehouse/t_pv_log
19/06/13 11:14:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2019-06-13 11:07 /user/hive/warehouse/t_pv_log/day=2019-06-10
drwxr-xr-x - hadoop supergroup 0 2019-06-13 11:09 /user/hive/warehouse/t_pv_log/day=2019-06-11
分区查询:
0: jdbc:hive2://slave01:10000> select * from t_pv_log where day = '2019-06-10';
+---------------+--------------------+-----------------------+---------------+
| t_pv_log.ip | t_pv_log.url | t_pv_log.access_time | t_pv_log.day |
+---------------+--------------------+-----------------------+---------------+
| 192.168.33.2 | http://sina.com/a | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.4 | http://sina.com/b | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.3 | http://sina.com/b | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.4 | http://sina.com/c | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.4 | http://sina.com/c | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.3 | http://sina.com/a | 2019-06-10 12:30:30 | 2019-06-10 |
| 192.168.33.2 | http://sina.com/a | 2019-06-10 12:30:30 | 2019-06-10 |
+---------------+--------------------+-----------------------+---------------+
7 rows selected (0.44 seconds)
网友评论