Hive版本:1.1.0-cdh5.7.0
一、VERSION表
mysql> select * from VERSION ;
+--------+----------------+-----------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT |
+--------+----------------+-----------------------------------------+
| 1 | 1.1.0 | Set by MetaStore hadoop@192.168.245.100 |
+--------+----------------+-----------------------------------------+
此表存着Hive的版本信息,有且只有一条数据。可以尝试删除此条信息和新增一条信息,都会导致Hive不可用
二 、数据库有关的元数据表
1、DBS表存储Hive中数据库的信息,default 为自带数据,HDFS路径为/user/hive/warehouse。g6_hadoop 为自建数据,HDFS路径为/user/hive/warehouse/g6_hadoop.db
mysql> select * from DBS ;
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+
| 1 | Default Hive database | hdfs://hadoop000:8020/user/hive/warehouse | default | public | ROLE |
| 2 | NULL | hdfs://hadoop000:8020/user/hive/warehouse/g6_hadoop.db | g6_hadoop | hadoop | USER |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+
2、DATABASE_PARAMS存储数据的属性信息,与DBS关联主键为DB_ID
mysql> desc DATABASE_PARAMS;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| DB_ID | bigint(20) | NO | PRI | NULL | |
| PARAM_KEY | varchar(180) | NO | PRI | NULL | |
| PARAM_VALUE | varchar(4000) | YES | | NULL | |
+-------------+---------------+------+-----+---------+-------+
三、与数据表有关的元数据表
1、TBLS存储表的信息
mysql> select * from TBLS \G;
*************************** 1. row ***************************
TBL_ID: 11
CREATE_TIME: 1554218873
DB_ID: 2
LAST_ACCESS_TIME: 0
OWNER: hadoop
RETENTION: 0
SD_ID: 16
TBL_NAME: g6_access
TBL_TYPE: EXTERNAL_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
*************************** 2. row ***************************
TBL_ID: 16
CREATE_TIME: 1555690888
DB_ID: 2
LAST_ACCESS_TIME: 0
OWNER: hue
RETENTION: 0
SD_ID: 26
TBL_NAME: views
TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
与DBS关联的主键为DB_ID,与SDS关联的主键为SD_ID。
2、SDS存储的是表的输入和输出格式,也就是存储格式
mysql> select * from SDS where SD_ID =16 \G;
*************************** 1. row ***************************
SD_ID: 16
CD_ID: 11
INPUT_FORMAT: org.apache.hadoop.mapred.TextInputFormat
IS_COMPRESSED:
IS_STOREDASSUBDIRECTORIES:
LOCATION: hdfs://hadoop000:8020/g6/hadoop/access/clear
NUM_BUCKETS: -1
OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
SERDE_ID: 16
SDS与CDS关联的主键为CD_ID,与SERDES(存储表使用的序列化类的信息)和SERDE_PARAMS(存储表使用的序列化的一些属性、格式信息,比如:行、列分隔符)关联的主键为SERDE_ID。
3、SERDES存储表使用的序列化类的信息
SERDE_PARAMS存储表使用的序列化的一些属性、格式信息,比如:行、列分隔符
mysql> select * from SERDES where SERDE_ID=16;
+----------+------+----------------------------------------------------+
| SERDE_ID | NAME | SLIB |
+----------+------+----------------------------------------------------+
| 16 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
+----------+------+----------------------------------------------------+
mysql> select * from SERDE_PARAMS where SERDE_ID=16;
+----------+----------------------+-------------+
| SERDE_ID | PARAM_KEY | PARAM_VALUE |
+----------+----------------------+-------------+
| 16 | field.delim | |
| 16 | serialization.format | |
+----------+----------------------+-------------+
4、TABLE_PARAMS该表存储表/视图的属性信息
mysql> select * from TABLE_PARAMS where TBL_ID = 11;
+--------+-----------------------+-------------+
| TBL_ID | PARAM_KEY | PARAM_VALUE |
+--------+-----------------------+-------------+
| 11 | EXTERNAL | TRUE |
| 11 | transient_lastDdlTime | 1554218873 |
+--------+-----------------------+-------------+
与TBLS关联主键为TBL_ID
5、TBL_PRIVS该表存储表/视图的授权信息,与TBLS关联主键为TBL_ID
6、COLUMNS_V2该表存储表对应的字段信息
mysql> select * from COLUMNS_V2 where CD_ID = 11;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
| 11 | NULL | cdn | string | 0 |
| 11 | NULL | domain | string | 5 |
| 11 | NULL | ip | string | 4 |
| 11 | NULL | level | string | 2 |
| 11 | NULL | region | string | 1 |
| 11 | NULL | time | string | 3 |
| 11 | NULL | traffic | bigint | 7 |
| 11 | NULL | url | string | 6 |
+-------+---------+-------------+-----------+-------------+
COLUMNS_V2要先通过SDS表CD_ID关联取到SD_ID,再用SD_ID与TBLS关联。
Hive元数据常用表的UML图如下:
image.png
网友评论