美文网首页
hive-入门

hive-入门

作者: 平头哥2 | 来源:发表于2019-06-13 11:23 被阅读0次

hive脚本执行:

[hadoop@slave03 bin]$ ./hive -e "select * from test.t_user";
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/app/hadoop-2.9.2/sbin:/usr/local/app/hadoop-2.9.2/bin:/usr/local/app/jdk1.8.0_211/bin)

Logging initialized using configuration in jar:file:/usr/local/app/hive-2.3.5/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
OK
1   zhangsan    23  male
2   lisi    24  male
3   wangwu  25  male
4   zhaoliu 26  male
5   xiaoqi  27  female
6   xiaofeng    28  female
Time taken: 12.276 seconds, Fetched: 6 row(s)

hive建表语句:

create table t_student(
    id int,
    name string,
    score int
)
row format delimited
fields terminated by ',';

创建一个文件并上传到hdfs:

[hadoop@slave03 test]$ pwd
/home/hadoop/test
[hadoop@slave03 test]$ ls
1.txt  ss.sh
[hadoop@slave03 test]$ cat 1.txt 
1,aaa,23
2,bbb,34
3,ccc,45
[hadoop@slave03 test]$ hadoop fs -put 1.txt /user/hive/warehouse/test.db/t_student
19/06/11 13:56:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

查看表中的数据:

0: jdbc:hive2://slave01:10000> select * from t_student;
+---------------+-----------------+------------------+
| t_student.id  | t_student.name  | t_student.score  |
+---------------+-----------------+------------------+
| 1             | aaa             | 23               |
| 2             | bbb             | 34               |
| 3             | ccc             | 45               |
+---------------+-----------------+------------------+
3 rows selected (0.371 seconds)

以上创建的表都是内部表,接下来看看hive中外部表的创建:

create external table t_teacher(
    id int,
    name string,
    course string,
    office string
)
row format delimited
fields terminated by ","
location '/pvlog/2019-06-11';

外部表和内部表的区别:

1. 内部表数据由Hive自身管理,外部表数据由HDFS管理; 
2. 内部表数据存储的位置是hive.metastore.warehouse.dir(默认:/user/hive/warehouse),外部表数据的存储位置由自己制定; 
3. 删除内部表会直接删除元数据(metadata)及存储数据;删除外部表仅仅会删除元数据,HDFS上的文件并不会被删除; 
4. 对内部表的修改会将修改直接同步给元数据,而对外部表的表结构和分区进行修改,则需要修复(MSCK REPAIR TABLE table_name;)

创建分区表:

create table t_pv_log(ip string, url string, access_time string)
partitioned by (day string)
row format delimited
fields terminated by ',';

查看表结构信息:

0: jdbc:hive2://slave01:10000> desc t_pv_log;
+--------------------------+-----------------------+-----------------------+
|         col_name         |       data_type       |        comment        |
+--------------------------+-----------------------+-----------------------+
| ip                       | string                |                       |
| url                      | string                |                       |
| access_time              | string                |                       |
| day                      | string                |                       |
|                          | NULL                  | NULL                  |
| # Partition Information  | NULL                  | NULL                  |
| # col_name               | data_type             | comment               |
|                          | NULL                  | NULL                  |
| day                      | string                |                       |
+--------------------------+-----------------------+-----------------------+
9 rows selected (0.463 seconds)

准备数据:

pv_log.01

192.168.33.2,http://sina.com/a, 2019-06-10 12:30:30
192.168.33.4,http://sina.com/b, 2019-06-10 12:30:30
192.168.33.3,http://sina.com/b, 2019-06-10 12:30:30
192.168.33.4,http://sina.com/c, 2019-06-10 12:30:30
192.168.33.4,http://sina.com/c, 2019-06-10 12:30:30
192.168.33.3,http://sina.com/a, 2019-06-10 12:30:30
192.168.33.2,http://sina.com/a, 2019-06-10 12:30:30

pv_log.02

192.168.33.2,http://sina.com/a, 2019-06-11 12:30:30
192.168.33.4,http://sina.com/b, 2019-06-11 12:30:30
192.168.33.3,http://sina.com/b, 2019-06-11 12:30:30
192.168.33.4,http://sina.com/c, 2019-06-11 12:30:30
192.168.33.4,http://sina.com/c, 2019-06-11 12:30:30
192.168.33.3,http://sina.com/a, 2019-06-11 12:30:30
192.168.33.2,http://sina.com/a, 2019-06-11 12:30:30

从本地导入数据到分区表:

0: jdbc:hive2://slave01:10000> load data local inpath '/home/hadoop/test/pv_log.01' into table t_pv_log partition(day='2019-06-10');

查询结果(多了一个分区字段)

0: jdbc:hive2://slave01:10000> select * from t_pv_log;
+---------------+--------------------+-----------------------+---------------+
|  t_pv_log.ip  |    t_pv_log.url    | t_pv_log.access_time  | t_pv_log.day  |
+---------------+--------------------+-----------------------+---------------+
| 192.168.33.2  | http://sina.com/a  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.4  | http://sina.com/b  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.3  | http://sina.com/b  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.4  | http://sina.com/c  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.4  | http://sina.com/c  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.3  | http://sina.com/a  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.2  | http://sina.com/a  |  2019-06-10 12:30:30  | 2019-06-10    |
+---------------+--------------------+-----------------------+---------------+
7 rows selected (2.115 seconds)

查看hadoop的目录结构:

[hadoop@slave03 ~]$ hadoop fs -ls /user/hive/warehouse/t_pv_log
19/06/13 11:14:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2019-06-13 11:07 /user/hive/warehouse/t_pv_log/day=2019-06-10
drwxr-xr-x   - hadoop supergroup          0 2019-06-13 11:09 /user/hive/warehouse/t_pv_log/day=2019-06-11

分区查询:

0: jdbc:hive2://slave01:10000> select * from t_pv_log where day = '2019-06-10';
+---------------+--------------------+-----------------------+---------------+
|  t_pv_log.ip  |    t_pv_log.url    | t_pv_log.access_time  | t_pv_log.day  |
+---------------+--------------------+-----------------------+---------------+
| 192.168.33.2  | http://sina.com/a  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.4  | http://sina.com/b  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.3  | http://sina.com/b  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.4  | http://sina.com/c  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.4  | http://sina.com/c  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.3  | http://sina.com/a  |  2019-06-10 12:30:30  | 2019-06-10    |
| 192.168.33.2  | http://sina.com/a  |  2019-06-10 12:30:30  | 2019-06-10    |
+---------------+--------------------+-----------------------+---------------+
7 rows selected (0.44 seconds)

相关文章

  • 数仓--Hive--面试题准备

    数仓--Hive-面试之Hive与HBase的区别数仓--Hive-面试之Hive架构原理数仓--Hive-面试之...

  • hive-入门

    hive脚本执行: hive建表语句: 创建一个文件并上传到hdfs: 查看表中的数据: 以上创建的表都是内部表,...

  • hsql优化

    HSQL 转载:https://www.fangxuan.win/2018/08/09/hive-%E6%80%A...

  • Apache Hive - 通用调优

    Apache Hive-通用优化-featch抓取机制 mr本地模式 Fetch抓取机制 功能:在执行sql的时...

  • hive-函数

    1、建表导入json数据 建表:create table rating_json(json string); 导入...

  • Hive-索引

    简介 Hive从0.7.0版本开始加入了索引,目的是提高Hive表指定列的查询速度。没有索引的时候,Hive在执行...

  • Hive-索引

    Hive 索引 由Lefty Leverenz创建,最后修改于六月17,2018 从3.0开始删除索引 Hive索...

  • HIVE- 数据倾斜

    数据倾斜就是由于数据分布不均匀,数据大量集中到一点上,造成数据热点。大多数情况下,分为一下三种情况: 1.map端...

  • Hive-函数汇总

    [toc] 关系运算 1、等值比较: = 语法:A=B操作类型:所有基本类型描述: 如果表达式A与表达式B相等,则...

  • Hive-初识HIVE(一)

    转载自:https://www.cnblogs.com/qingyunzong/p/8707885.html Hi...

网友评论

      本文标题:hive-入门

      本文链接:https://www.haomeiwen.com/subject/ptmjfctx.html