Good Programmer Big Data Learning Path: Hive Data Types

Author: ab6973df9221 | Published: 2019-08-06 16:03


Good Programmer big data training: an overview of Hive data types.

1. Primitive data types

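Type       Description                               Example
TINYINT    1-byte signed integer                     1
SMALLINT   2-byte signed integer                     1
INT        4-byte signed integer                     1
BIGINT     8-byte signed integer                     1
FLOAT      4-byte single-precision floating point    1.0
DOUBLE     8-byte double-precision floating point    1.0
BOOLEAN    true/false                                TRUE
STRING     Character string                          'a', "a"
BINARY     Byte array
TIMESTAMP  Timestamp with nanosecond precision       132550245000, '2016-01-01 03:04:05.123456789'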

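Values of the newer TIMESTAMP type can be given as:

• An integer: the number of seconds since the Unix epoch (midnight, January 1, 1970 UTC)

• A floating-point number: the number of seconds since the Unix epoch with nanosecond precision (up to 9 decimal places)

• A string: the JDBC timestamp string format, YYYY-MM-DD hh:mm:ss.fffffffff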
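
The forms above can be tried out with a few one-line queries. This is a minimal sketch, assuming a Hive version that accepts SELECT without a FROM clause (0.13 or later); the literal values are illustrative only. How a bare integer cast to TIMESTAMP is interpreted can depend on the Hive version and configuration, so the sketch uses from_unixtime for the seconds-based form.

```sql
-- String in the JDBC format, cast to TIMESTAMP
SELECT CAST('2016-01-01 03:04:05.123456789' AS TIMESTAMP);

-- Seconds since the Unix epoch, rendered as a formatted string
SELECT from_unixtime(1451617445);

-- Formatted string converted back to seconds since the Unix epoch
SELECT unix_timestamp('2016-01-01 03:04:05');
```

The last two results depend on the time zone of the Hive session, so no expected output is shown here.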

The BINARY type stores variable-length binary data.

2. Complex data types

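Type    Description                                               Example
ARRAY   An ordered collection of fields, all of the same type.    array(1, 2)
MAP     An unordered collection of key-value pairs. Keys must     map('a', 1, 'b', 2)
        be primitive types; values may be any type. Within one
        map, all keys share one type and all values share one
        type.
STRUCT  A collection of named fields, which may have different    struct('a', 1, 1.0)
        types.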
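
The constructor functions in the Example column can be run directly. The following is a small sketch, again assuming SELECT without FROM is available; the column aliases and the named_struct field names (name, qty) are made up for illustration.

```sql
-- Build one value of each complex type with its constructor function
SELECT array(1, 2)                         AS a,
       map('a', 1, 'b', 2)                 AS m,
       struct('a', 1, 1.0)                 AS s,
       named_struct('name', 'a', 'qty', 1) AS ns;

-- Element access: zero-based index for ARRAY, key lookup for MAP
SELECT array(1, 2)[0], map('a', 1, 'b', 2)['b'];
```

struct() generates field names automatically, while named_struct() lets the caller choose them, which is usually easier to read in later queries.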

3. Example using the data types

    -- Create an employee table using the default delimiters
    CREATE TABLE employee(
      name STRING,
      salary FLOAT,
      leader ARRAY<STRING>,
      deductions MAP<STRING,FLOAT>,
      address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>
    );
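
Once such a table holds data, each complex column is read with its own access syntax. A hedged sketch against the employee table above; the map key 'tax' and the column aliases are hypothetical, since the original does not show the data.

```sql
-- ARRAY uses a zero-based index, MAP a key lookup, STRUCT dot notation
SELECT name,
       leader[0]         AS first_leader,
       deductions['tax'] AS tax_rate,
       address.city      AS city
FROM employee;
```

An index past the end of the array, or a key missing from the map, yields NULL rather than an error.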

4. Column delimiters

Default delimiters in Hive text files

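Delimiter     Description
\n            In a text file each line is one record, so the newline character separates records.
^A (Ctrl+A)   Separates fields (columns). In a CREATE TABLE statement it can be written as the octal escape \001.
^B            Separates the elements of an ARRAY or STRUCT, or the key-value pairs of a MAP. In a CREATE TABLE statement it can be written as the octal escape \002.
^C            Separates a key from its value in a MAP. In a CREATE TABLE statement it can be written as the octal escape \003.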

    CREATE TABLE employee(
      name STRING,
      salary FLOAT,
      subordinates ARRAY<STRING>,
      deductions MAP<STRING,FLOAT>,
      address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\001'
    COLLECTION ITEMS TERMINATED BY '\002'
    MAP KEYS TERMINATED BY '\003'
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

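• ROW FORMAT DELIMITED introduces the delimiters that the table expects in its data files when data is loaded.

• FIELDS TERMINATED BY '\001': \001 is the octal code for ^A. This clause tells Hive to use ^A as the column delimiter.

• COLLECTION ITEMS TERMINATED BY '\002': \002 is the octal code for ^B. This clause tells Hive to use ^B as the delimiter between collection elements.

• MAP KEYS TERMINATED BY '\003': \003 is the octal code for ^C. This clause tells Hive to use ^C as the delimiter between a map key and its value.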

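• STORED AS TEXTFILE is a separate clause and does not require the ROW FORMAT DELIMITED keywords. LINES TERMINATED BY '\n', by contrast, is written inside the ROW FORMAT DELIMITED clause in the Hive DDL grammar.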

• Hive currently supports only '\n' for LINES TERMINATED BY, so rows can only be separated by newlines.
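
To make the encoding concrete, the sketch below shows how a single record of the employee table could look in the underlying text file, with the control characters written as ^A, ^B and ^C. All values are made-up sample data, not taken from the original article.

```sql
-- One record, stored as a single line in the file (hypothetical values):
-- John^A5000.0^AMary^BTodd^Atax^C0.2^Binsurance^C0.05^A1 Main St^BChicago^BIL^B60600
--
-- Hive would read that line as:
--   name         = John
--   salary       = 5000.0
--   subordinates = ["Mary","Todd"]
--   deductions   = {"tax":0.2,"insurance":0.05}
--   address      = {"street":"1 Main St","city":"Chicago","state":"IL","zip":60600}

-- Inspect the delimiters and storage format Hive recorded for the table
DESCRIBE FORMATTED employee;
```

Because these control characters rarely appear in ordinary text, the defaults avoid the quoting problems that a comma or tab delimiter can cause.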

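Basic Hive commands

1. Creating a database:

Creating a database essentially creates a directory on HDFS. Use comment to add a description of the database, enclosed in quotes. Database properties follow the description and are added with with dbproperties: the properties go inside parentheses, each property name and value is quoted and joined with an equals sign, and multiple properties are separated by commas.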

    -- Create a database named myhive with a description and properties
    create database myhive
    comment 'this is myhive db'
    with dbproperties ('author'='me','date'='2018-4-21');

    -- View the properties
    describe database extended myhive;

    -- Add a new property to the existing database
    alter database myhive set dbproperties ('id'='1');

    -- Switch to the database
    use myhive;

    -- Drop the database
    drop database myhive;
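
A few optional clauses make these statements safer to rerun. This is a sketch only; the database name myhive_demo and the HDFS path are hypothetical.

```sql
-- IF NOT EXISTS avoids an error when the database already exists;
-- LOCATION sets the HDFS directory explicitly (hypothetical path)
CREATE DATABASE IF NOT EXISTS myhive_demo
COMMENT 'demo database'
LOCATION '/user/hive/demo/myhive_demo.db';

-- List databases whose names start with "myhive"
SHOW DATABASES LIKE 'myhive*';

-- CASCADE drops any tables inside the database before dropping it
DROP DATABASE IF EXISTS myhive_demo CASCADE;
```

Without CASCADE, dropping a database that still contains tables fails, which is a useful safeguard on shared clusters.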

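2. Creating tables

A table is created in the current database by default (default is Hive's default database). Creating a table also essentially creates a directory on HDFS.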

Exercise: using ARRAY, loading local data, and comparing Hive with MySQL

    -- Create table t_array, mapped to the data file array.txt
    create table if not exists t_array(
      id int comment 'this is id',
      score array<tinyint>
    )
    comment 'this is my table'
    row format delimited
    fields terminated by ','
    collection items terminated by '|'
    tblproperties ('id'='11','author'='me');

    -- Load the local file array.txt
    load data local inpath '/testdata/array.txt' into table t_array;

    -- Query all rows in the table
    select * from t_array;

    -- First score for id=1
    select score[0] from t_array where id=1;

    -- Number of scores for id=2
    select size(score) from t_array where id=2;

    -- Total number of rows
    select count(*) from t_array;

    -- Append array1.txt from the local file system
    load data local inpath '/testdata/array1.txt' into table t_array;

    -- Append test.txt from the local file system
    load data local inpath '/testdata/test.txt' into table t_array;

    -- Load array.txt from the local file system, overwriting the table contents
    load data local inpath '/testdata/array.txt' overwrite into table t_array;
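
The original does not show the contents of array.txt. The sketch below assumes a file that matches the delimiters declared for t_array, and adds one query that flattens the array with LATERAL VIEW explode; the sample rows and the aliases scores and s are hypothetical.

```sql
-- Possible contents of /testdata/array.txt (hypothetical sample data):
--   1,90|85|77
--   2,60|72

-- Flatten the array: one output row per (id, single score)
SELECT id, s
FROM t_array
LATERAL VIEW explode(score) scores AS s;
```

Flattening is handy when the scores need to be aggregated, for example to compute an average per id with a GROUP BY on the exploded rows.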

Exercise: using MAP, viewing a table's definition, and specifying the data location when creating a table

    -- Create table t_map, mapped to the data file map.txt
    create table if not exists t_map(
      id int,
      score map<string,int>
    )
    row format delimited
    fields terminated by ','
    collection items terminated by '|'
    map keys terminated by ':'
    stored as textfile;

    -- Load from HDFS; map.txt is moved away from its original HDFS location
    load data inpath '/testdata/map.txt' into table t_map;

    -- Math score for id=1
    select score['math'] from t_map where id=1;

    -- Number of subjects each person took
    select size(score) from t_map;

    -- Show the statement that creates the table
    show create table t_map;

    -- Output:
    -- CREATE TABLE `t_map`(
    --   `id` int,
    --   `score` map<string,int>)
    -- ROW FORMAT DELIMITED
    --   FIELDS TERMINATED BY ','
    --   COLLECTION ITEMS TERMINATED BY '|'
    --   MAP KEYS TERMINATED BY ':'
    -- STORED AS INPUTFORMAT
    --   'org.apache.hadoop.mapred.TextInputFormat'
    -- OUTPUTFORMAT
    --   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    -- LOCATION
    --   'hdfs://linux5:8020/user/hive/warehouse/t_map';

    -- Create a table and specify the data location at the same time
    create table if not exists t_map2(
      id int,
      score map<string,int>
    )
    row format delimited
    fields terminated by ','
    collection items terminated by '|'
    map keys terminated by ':'
    stored as textfile
    location '/test';

    -- Drop a table
    drop table test2;
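
As with the array exercise, the data file is not shown in the original. A sketch under the assumption that map.txt follows the delimiters declared for t_map; the sample rows and the aliases subjects, marks, kv, subject and mark are hypothetical.

```sql
-- Possible contents of map.txt (hypothetical sample data):
--   1,math:90|english:85
--   2,math:70

-- Keys and values of each map, returned as two arrays
SELECT id,
       map_keys(score)   AS subjects,
       map_values(score) AS marks
FROM t_map;

-- One output row per (id, subject, mark)
SELECT id, subject, mark
FROM t_map
LATERAL VIEW explode(score) kv AS subject, mark;
```

Looking up a key that a row does not contain, such as score['english'] for id=2 in this sample, returns NULL.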

Exercise: using STRUCT, creating external tables, and summarizing the differences between managed and external tables

    -- Create table t_struct, mapped to the data file struct.txt
    -- (the external keyword plus a data location creates an external table)
    create external table if not exists t_struct(
      id int,
      grade struct<score:int,desc:string,point:string>
    )
    row format delimited
    fields terminated by ','
    collection items terminated by '|'
    location '/external';

    -- Rows with score > 90
    select * from t_struct where grade.score>90;

    -- Create external table t_struct1
    create external table if not exists t_struct1(
      id int,
      grade struct<score:int,desc:string,point:string>
    )
    row format delimited
    fields terminated by ','
    collection items terminated by '|';

    -- Append data with insert into
    insert into table t_struct1 select * from t_struct;

    -- Drop the table: only the metadata is deleted, the data files remain on HDFS
    drop table t_struct;

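3. Loading data into Hive tables:

Loading places the data file in the table's directory: a local file is copied there, while a file that is already on HDFS is moved.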

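    -- load from the local file system: the data is copied into the table's HDFS directory
    -- Append
    load data local inpath 'local_data_path' into table tablename;
    -- Overwrite
    load data local inpath 'local_data_path' overwrite into table tablename;

    -- load from HDFS: the data is moved into the table's HDFS directory
    -- Append
    load data inpath 'hdfs_data_path' into table tablename;
    -- Overwrite
    load data inpath 'hdfs_data_path' overwrite into table tablename;

    -- Insert data into a table with a query
    -- Append
    insert into table table1 select * from table2;
    -- Overwrite
    insert overwrite table table1 select * from table2;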
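
The difference between appending and overwriting is easiest to see on a scratch table. A minimal sketch that reuses the t_array table from the earlier exercise; the table name t_array_copy is hypothetical.

```sql
-- Empty table with the same schema and row format as t_array
CREATE TABLE IF NOT EXISTS t_array_copy LIKE t_array;

-- Overwrite: replaces whatever t_array_copy held before
INSERT OVERWRITE TABLE t_array_copy
SELECT * FROM t_array WHERE id = 1;

-- Append: adds rows on top of the existing ones
INSERT INTO TABLE t_array_copy
SELECT * FROM t_array WHERE id = 2;

-- Count the rows now in the copy
SELECT count(*) FROM t_array_copy;
```

Rerunning only the INSERT OVERWRITE statement resets the table to the id = 1 rows, while rerunning the INSERT INTO statement keeps adding duplicates.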

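4. Managed (internal) tables and external tables

Managed tables: when a table is created in Hive, Hive manages the data by default. That is, Hive moves the data into its warehouse directory.

External tables: the user controls the creation and deletion of the data. The location of the external data is specified when the table is created. With the EXTERNAL keyword, Hive knows that it does not manage the data, so it does not move the data into its warehouse directory. In fact, it does not even check whether the external location exists when the table is defined. This is a very useful feature, because it means the data can be created after the table is created.

Difference: dropping a managed table deletes the table's metadata together with its data. Dropping an external table deletes only the metadata; Hive leaves the data files themselves untouched.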
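
The distinction can be checked from the Hive side with DESCRIBE FORMATTED. A hedged sketch; the table names demo_managed and demo_external and the path /external_demo are hypothetical.

```sql
-- Managed table: the data lives under the warehouse directory
CREATE TABLE IF NOT EXISTS demo_managed (id INT);

-- External table: Hive only records the location (hypothetical path)
CREATE EXTERNAL TABLE IF NOT EXISTS demo_external (id INT)
LOCATION '/external_demo';

-- The "Table Type" field reads MANAGED_TABLE or EXTERNAL_TABLE
DESCRIBE FORMATTED demo_managed;
DESCRIBE FORMATTED demo_external;

-- Removes the metadata and the data files
DROP TABLE demo_managed;

-- Removes the metadata only; files under /external_demo stay in place
DROP TABLE demo_external;
```

External tables suit data that other tools also read, since dropping the table in Hive cannot delete the shared files.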

5. Altering tables

    -- Create table log2; the table-level COMMENT adds a description
    CREATE external TABLE log2(
      id string COMMENT 'this is id column',
      phonenumber bigint,
      mac string,
      ip string,
      url string,
      status1 string,
      status2 string,
      up int,
      down int,
      code int,
      dt String
    )
    COMMENT 'this is log table'
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ' '
    LINES TERMINATED BY '\n'
    stored as textfile;

    -- Load data
    load data local inpath '/home/data.log.txt' into table log2;

Renaming a table: rename to

    alter table old_name rename to new_name;

    alter table log rename to log2;

Renaming a column: change column

    alter table table_name change column old_column new_column column_type [comment 'description'];

    -- Rename a column
    alter table log4 change column ip myip String;

    -- Rename a column and add a column description
    alter table log4 change column myip ip String comment 'this is myip';

    -- Use the after keyword to place the changed column after another column
    alter table log4 change column myip ip String comment 'this is myip' after code;

    -- Use the first keyword to move the changed column to the first position
    alter table log4 change column ip myip int comment 'this is myip' first;

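Adding columns: add columns

    -- Use add columns followed by parentheses that list the new columns and their
    -- optional descriptions; separate multiple columns with commas
    alter table log4 add columns(x int comment 'this x',y int);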

Dropping columns:

    -- Use replace columns followed by parentheses that list the columns to keep,
    -- separated by commas; any column not listed is removed from the table definition
    alter table log4 replace columns(x int,y int);
    alter table log4 replace columns(myip int,id string, phonenumber bigint,mac string,url string,status1 string,status2 string,up int,down int, code int,dt string);

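Converting a managed table to an external table (and back):

    -- Managed to external
    alter table log4 set tblproperties('EXTERNAL' = 'TRUE');

    -- External back to managed
    alter table log4 set tblproperties('EXTERNAL' = 'false');
    alter table log4 set tblproperties('EXTERNAL' = 'FALSE');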
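
After any of the ALTER TABLE statements in this section, the result can be verified with the describe commands. A short sketch, assuming the log4 table used above exists.

```sql
-- Column names, types and comments after change / add / replace columns
DESCRIBE log4;

-- Full details; the "Table Type" field shows EXTERNAL_TABLE or MANAGED_TABLE
DESCRIBE FORMATTED log4;
```

Checking the table type before a DROP TABLE is a cheap way to confirm whether the data files will be deleted along with the metadata.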

Good Programmer big data training official site: http://www.goodprogrammer.org/


Article link: https://www.haomeiwen.com/subject/vskwdctx.html
