Introduction to Hive

Author: 哈斯勒 | Published: 2019-06-11 16:20


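Hive was open-sourced by Facebook to compute statistics over massive volumes of structured log data.
Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides SQL-like querying. In essence, it translates HQL into MapReduce programs:
1) The data Hive processes is stored in HDFS.
2) The analysis is implemented underneath by MapReduce.
3) The resulting jobs run on Yarn.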


Hive architecture:


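1. User interface: Client
CLI (hive shell), JDBC/ODBC (programmatic access to Hive, for example from Java via JDBC), WEBUI (access to Hive from a browser)
2. Metadata: Metastore
The metadata includes table names, the database each table belongs to (default unless specified), table owners, column and partition fields, the table type (for example, whether it is an external table), and the directory that holds the table's data.
By default it is stored in the embedded Derby database; using MySQL to store the Metastore is recommended.
3. Hadoop
HDFS is used for storage and MapReduce for computation.
4. Driver
(1) Parser (SQL Parser): converts the SQL string into an abstract syntax tree (AST), usually with a third-party library such as antlr, and then performs semantic analysis on the AST, for example checking whether the tables and columns exist and whether the SQL is semantically valid.
(2) Compiler: compiles the AST into a logical execution plan.
(3) Optimizer (Query Optimizer): optimizes the logical execution plan.
(4) Executor (Execution): converts the logical execution plan into a physical plan that can be run. For Hive, that means MR/Spark.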

How Hive runs a query:


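Hive receives the user's instructions (SQL) through one of its interactive interfaces, uses its Driver together with the metadata (MetaStore) to translate them into MapReduce jobs, submits those jobs to Hadoop for execution, and finally returns the results to the user interface.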
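
To make the translation step more concrete, here is a minimal Python sketch of how a query such as select name, count(*) from student group by name could be expressed as map, shuffle, and reduce phases. It only illustrates the idea and is not the code Hive generates. The function names and the sample rows are made up for this illustration.

```python
from collections import defaultdict

# Hypothetical sample rows of student(id int, name string); not taken from a real table.
ROWS = [(1001, "zhangshan"), (1002, "lishi"), (1003, "zhangshan")]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _id, name in rows:
        yield name, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values of each key, which gives count(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


def run_group_by_count(rows):
    """Chain the three phases for a GROUP BY name / count(*) style query."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(run_group_by_count(ROWS))
```

In a real cluster the map and reduce phases run as distributed tasks on Yarn and the shuffle moves data between nodes; here everything runs in one process.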

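Basic operations:
(1) Start Hive
[root@big01 hadoop-3.2.0]# hive
(2) List the databases
hive> show databases;
(3) Switch to the default database
hive> use default;
(4) List the tables in the default database
hive> show tables;
(5) Create a table
hive> create table student(id int, name string);
(6) List the tables again to confirm the new table exists
hive> show tables;
(7) Show the table's structure
hive> desc student;
(8) Insert a row into the table
hive> insert into student values(1000,"ss");
(9) Query the data in the table
hive> select * from student;
(10) Exit Hive
hive> quit;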

Loading a local file into a Hive table:
(1) Start Hive
[root@big01 hadoop-3.2.0]# hive
(2) List the databases
hive> show databases;
(3) Switch to the default database
hive> use default;
(4) List the tables in the default database
hive> show tables;
(5) Drop the student table created earlier
hive> drop table student;
(6) Create the student table again, declaring '\t' as the field delimiter
hive> create table student(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
(7) Load the file /opt/module/datas/student.txt into the student table
hive> load data local inpath '/opt/module/datas/student.txt' into table student;
(8) Query the result
hive> select * from student;
OK
1001    zhangshan
1002    lishi
1003    zhaoliu
Time taken: 0.266 seconds, Fetched: 3 row(s)
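
The table definition tells Hive to split each line of student.txt on a tab, reading the first field as an int and the second as a string. A rough Python sketch of that read-time parsing follows. It is an illustration, not Hive's implementation. parse_student_line is a hypothetical helper, and its handling of bad input (a missing or non-numeric field becomes NULL, modeled here as None) is an assumption about the default text format.

```python
def parse_student_line(line, delimiter="\t"):
    """Split one text line into (id, name) for student(id int, name string)."""
    fields = line.rstrip("\n").split(delimiter)
    raw_id = fields[0]
    # A line with no second field yields a NULL name (assumption).
    name = fields[1] if len(fields) > 1 else None
    try:
        student_id = int(raw_id)
    except ValueError:
        # Assumption: a value that cannot be read as an int becomes NULL.
        student_id = None
    return student_id, name


if __name__ == "__main__":
    # The three rows shown in the query result above, as tab-delimited lines.
    for text in ["1001\tzhangshan\n", "1002\tlishi\n", "1003\tzhaoliu\n"]:
        print(parse_student_line(text))
```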

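Common Hive interactive commands:
1. "-e": run a SQL statement without entering the Hive interactive shell
[root@big01 opt]# hive -e 'select * from student;'

2. "-f": run the SQL statements in a script file
Create the file hivef.sql in the /opt/module/datas directory:

[root@big01 opt]# touch /opt/module/datas/hivef.sql
Write a valid SQL statement into the file:
select * from student;

Run the SQL statements in the file:
[root@big01 opt]# hive -f /opt/module/datas/hivef.sql

Run the SQL statements in the file and write the result to another file:
[root@big01 opt]# hive -f /opt/module/datas/hivef.sql > /opt/module/datas/hive_result.txt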
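
When these commands are driven from a script rather than typed by hand, it can help to assemble them in one place. The sketch below builds the argument lists for the -e and -f options and can run a command with its output written to a file, much like the shell redirection with >. It assumes the hive executable is on PATH; the helper names are hypothetical.

```python
import subprocess


def hive_e_command(sql):
    """Argument list for running one SQL string non-interactively."""
    return ["hive", "-e", sql]


def hive_f_command(script_path):
    """Argument list for running the SQL statements in a script file."""
    return ["hive", "-f", script_path]


def run_to_file(command, output_path):
    """Run the command and write its standard output to output_path."""
    with open(output_path, "w") as out:
        return subprocess.run(command, stdout=out, check=True)


if __name__ == "__main__":
    # Only print the commands here; calling run_to_file needs a working Hive install.
    print(hive_e_command("select * from student;"))
    print(hive_f_command("/opt/module/datas/hivef.sql"))
```

Passing the command as a list avoids shell quoting problems when the SQL string itself contains quotes.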

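Other commands:
1. View the HDFS file system from the Hive CLI
hive> dfs -ls /;
2. View the local file system from the Hive CLI
hive> ! ls /opt/module/datas;
3. View the history of all commands entered in Hive
Go to the current user's home directory (/root or /home/atguigu) and view the .hivehistory file:
[root@big01 ~]# cat .hivehistory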


Explicit type conversion:
CAST('1' AS INT) converts the string '1' to the integer 1. If the cast fails, as in CAST('X' AS INT), the expression returns NULL.
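
The rule can be mimicked in a few lines of Python, with None standing in for NULL. This is only a sketch of the two cases described here. cast_as_int is a hypothetical helper that accepts plain digit strings with an optional sign and treats everything else as a failed cast. How Hive itself handles inputs such as '1.5', padded strings, or values outside the INT range is not modeled and may differ.

```python
import re

# Optional sign followed by ASCII digits only (a simplifying assumption).
_INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def cast_as_int(text):
    """Return the integer for a plain digit string, or None (NULL) otherwise."""
    if text is None or not _INT_PATTERN.fullmatch(text):
        return None
    return int(text)


if __name__ == "__main__":
    print(cast_as_int("1"))
    print(cast_as_int("X"))
```

The point of the NULL result is that a bad value does not abort the query; the affected expression simply yields NULL for that row.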

Example:
Suppose a table contains the following row, shown here in JSON form to illustrate its data structure. The structure as accessed in Hive is:
{
    "name": "songsong",
    "friends": ["bingbing", "lili"],        // Array
    "children": {                           // Map
        "xiao song": 18,
        "xiaoxiao song": 19
    },
    "address": {                            // Struct
        "street": "hui long guan",
        "city": "beijing"
    }
}

Step 1:
Create a local test file test2.txt
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing

Step 2:
Create the test table test2 in Hive
create table test2(
name string,
friends array<string>,
children map<string, int>,
address struct<street:string, city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';

Step 3:
Load the text data into the test table
hive> load data local inpath '/tmp/root/test2.txt' into table test2;

Step 4:
Access the data in the three collection columns. The query below shows the access syntax for ARRAY, MAP, and STRUCT, in that order
hive> select friends[1],children['xiao song'],address.city from test2 where name="songsong";
OK
lili    18      beijing
Time taken: 0.659 seconds, Fetched: 1 row(s)
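
To see how the three delimiters in the table definition carve up a line of test2.txt, here is a small Python sketch that parses the first data line the same way: ',' between fields, '_' between collection items, and ':' between a map key and its value. It is an illustration only. parse_test2_line is a hypothetical helper, and the struct is modeled as a plain dict.

```python
def parse_test2_line(line):
    """Parse one line of test2.txt into name, friends, children, and address."""
    # Fields are separated by ','.
    name, friends_raw, children_raw, address_raw = line.rstrip("\n").split(",")

    # array<string>: items separated by '_'.
    friends = friends_raw.split("_")

    # map<string, int>: entries separated by '_', key and value by ':'.
    children = {}
    for entry in children_raw.split("_"):
        key, value = entry.split(":")
        children[key] = int(value)

    # struct<street:string, city:string>: members separated by '_', in declared order.
    street, city = address_raw.split("_")
    address = {"street": street, "city": city}

    return {"name": name, "friends": friends, "children": children, "address": address}


if __name__ == "__main__":
    row = parse_test2_line(
        "songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing"
    )
    # Same three lookups as the query: friends[1], children['xiao song'], address.city
    print(row["friends"][1], row["children"]["xiao song"], row["address"]["city"])
```

The three lookups correspond to friends[1], children['xiao song'], and address.city in the query, and they give the same values as the Hive result: lili, 18, and beijing.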