一、存储Hive版本的元数据表
1.version:存储Hive版本
version表存hive的版本信息,该表中数据只有一条,如果存在多条,会造成hive启动不起来。
mysql> SELECT * FROM VERSION;
+--------+----------------+-----------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT |
+--------+----------------+-----------------------------------------+
| 1 | 1.1.0 | Set by MetaStore hadoop@192.168.137.141 |
+--------+----------------+-----------------------------------------+
1 row in set (0.00 sec)
二、Hive数据库相关的元数据表
1.dbs:该表存储Hive中所有数据库的基本信息
mysql> select * from dbs;
+-------+------------------------------------+--------------------------------------------------------------------+-----------------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+------------------------------------+--------------------------------------------------------------------+-----------------+------------+------------+
| 1 | Default Hive database | hdfs://192.168.137.141:9000/user/hive/warehouse | default | public | ROLE |
| 9 | this is ruozedata 03 test database | hdfs://192.168.137.141:9000/user/hive/warehouse/hive2_ruozedata.db | hive2_ruozedata | hadoop | USER |
| 11 | NULL | hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db | ruozedata | hadoop | USER |
+-------+------------------------------------+--------------------------------------------------------------------+-----------------+------------+------------+
//与hive中的database是一一对应的
hive> show databases;
OK
default
hive2_ruozedata
ruozedata
Time taken: 0.095 seconds, Fetched: 3 row(s)
2.database_params:该表存储数据库的相关参数,在CREATE DATABASE时候用

DBS和DATABASE_PARAMS这两张表通过DB_ID字段关联
WITH DBPROPERTIES (property_name=property_value, …)指定的参数
mysql> select * from database_params;
+-------+-----------+-------------+
| DB_ID | PARAM_KEY | PARAM_VALUE |
+-------+-----------+-------------+
| 9 | creator | ruoze |
| 9 | date | 2018-06-17 |
+-------+-----------+-------------+
//只有DB_ID=9的数据库有,因为当时创建数据库的语句为:
//create database hive2_ruozedata comment 'this is ruozedata 03 test database' with dbproperties('creator'='ruoze','date'='2018-06-17');
//hive中查看database的详细信息
hive> desc database extended hive2_ruozedata;
OK
hive2_ruozedata this is ruozedata 03 test database hdfs://192.168.137.141:9000/user/hive/warehouse/hive2_ruozedata.db hadoop USER {date=2018-06-17, creator=ruoze}
Time taken: 0.102 seconds, Fetched: 1 row(s)
三、Hive表和视图相关的元数据表
1.tbls:该表中存储Hive表、视图、索引表的基本信息
//查看DB_ID=1的数据库中的表,即default数据库
mysql> select * from tbls where DB_ID=1;
+--------+-------------+-------+------------------+--------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+--------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| 1 | 1529135635 | 1 | 0 | hadoop | 0 | 1 | zh | MANAGED_TABLE | NULL | NULL |
| 31 | 1529287737 | 1 | 0 | hadoop | 0 | 41 | hive_wc | MANAGED_TABLE | NULL | NULL |
| 32 | 1529288829 | 1 | 0 | hadoop | 0 | 42 | rating_json | MANAGED_TABLE | NULL | NULL |
| 33 | 1529289493 | 1 | 0 | hadoop | 0 | 43 | hive_rownumber | MANAGED_TABLE | NULL | NULL |
| 76 | 1531056372 | 1 | 0 | hadoop | 0 | 91 | user_click_tmp1 | MANAGED_TABLE | NULL | NULL |
| 78 | 1531056697 | 1 | 0 | hadoop | 0 | 95 | user_click1 | MANAGED_TABLE | NULL | NULL |
| 81 | 1532851994 | 1 | 0 | hadoop | 0 | 101 | hive_array | MANAGED_TABLE | NULL | NULL |
| 83 | 1532855555 | 1 | 0 | hadoop | 0 | 103 | hive_map | MANAGED_TABLE | NULL | NULL |
| 85 | 1532856628 | 1 | 0 | hadoop | 0 | 105 | hive_struct | MANAGED_TABLE | NULL | NULL |
| 91 | 1535883337 | 1 | 0 | hadoop | 0 | 111 | ruoze_test | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+--------+-----------+-------+-----------------+---------------+--------------------+--------------------+
//hive中与之一一对应
hive> use default;
OK
Time taken: 0.08 seconds
hive> show tables;
OK
hive_array
hive_map
hive_rownumber
hive_struct
hive_wc
rating_json
ruoze_test
user_click1
user_click_tmp1
zh
Time taken: 0.064 seconds, Fetched: 10 row(s)
2.table_params:该表存储表/视图的属性信息
mysql> select * from table_params where TBL_ID=85;
+--------+-----------------------+-------------+
| TBL_ID | PARAM_KEY | PARAM_VALUE |
+--------+-----------------------+-------------+
| 85 | COLUMN_STATS_ACCURATE | true |
| 85 | numFiles | 1 |
| 85 | numRows | 0 |
| 85 | rawDataSize | 0 |
| 85 | totalSize | 91 |
| 85 | transient_lastDdlTime | 1532856636 |
+--------+-----------------------+-------------+
hive> desc formatted hive_struct;
OK
# col_name data_type comment
ip string
userinfo struct<name:string,age:int>
# Detailed Table Information
Database: default
Owner: hadoop
CreateTime: Sun Jul 29 17:30:28 CST 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://192.168.137.141:9000/user/hive/warehouse/hive_struct
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 1
numRows 0
rawDataSize 0
totalSize 91
transient_lastDdlTime 1532856636
Hive文件存储信息相关的元数据表主要涉及SDS、SD_PARAMS、SERDES、SERDE_PARAMS
由于HDFS支持的文件格式很多,而建Hive表时候也可以指定各种文件格式,Hive在将HQL解析成MapReduce时候,需要知道去哪里,使用哪种格式去读写HDFS文件,而这些信息就保存在这几张表中。
四、Hive文件存储信息相关的元数据表
1.sds:该表保存文件存储的基本信息,如INPUT_FORMAT、OUTPUT_FORMAT、是否压缩等。

mysql> select * from sds where SD_ID=43;
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID |
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| 43 | 33 | org.apache.hadoop.mapred.TextInputFormat | | | hdfs://192.168.137.141:9000/user/hive/warehouse/hive_rownumber | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 43 |
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
1 row in set (0.10 sec)
2.SD_PARAMS:该表存储Hive存储的属性信息,在创建表时候使用STORED BY ‘storage.handler.class.name’ [WITH SERDEPROPERTIES (…)指定

3.SERDES:该表存储序列化使用的类信息

mysql> SELECT * FROM SERDES;
+----------+------+----------------------------------------------------+
| SERDE_ID | NAME | SLIB |
+----------+------+----------------------------------------------------+
| 1 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 11 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 12 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 13 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 14 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 17 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| 22 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
.......
+----------+------+----------------------------------------------------+
45 rows in set (0.00 sec)
4.SERDE_PARAMS:该表存储序列化的一些属性、格式信息,比如:行、列分隔符

mysql> SELECT * FROM SERDE_PARAMS;
+----------+----------------------+-------------+
| SERDE_ID | PARAM_KEY | PARAM_VALUE |
+----------+----------------------+-------------+
| 1 | serialization.format | 1 |
| 11 | field.delim | |
| 11 | serialization.format | |
| 12 | field.delim | |
| 12 | serialization.format | |
| 13 | serialization.format | 1 |
| 14 | field.delim | |
| 14 | serialization.format | |
| 17 | field.delim | |
| 17 | serialization.format | |
| 22 | field.delim | |
......
| 101 | colelction.delim | , |
| 101 | field.delim | |
| 101 | serialization.format | |
| 103 | colelction.delim | # |
| 103 | field.delim | , |
| 103 | mapkey.delim | : |
| 103 | serialization.format | , |
| 105 | colelction.delim | : |
| 105 | field.delim | # |
| 105 | serialization.format | # |
| 106 | field.delim | |
| 106 | serialization.format | |
| 111 | serialization.format | 1 |
| 116 | field.delim | , |
| 116 | serialization.format | , |
+----------+----------------------+-------------+
89 rows in set (0.00 sec)
五、Hive表字段相关的元数据表
1.columns_v2:该表存储表对应的字段信息
mysql> select * from columns_v2 where CD_ID=33;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
| 33 | NULL | age | int | 1 |
| 33 | NULL | id | int | 0 |
| 33 | NULL | name | string | 2 |
| 33 | NULL | sex | string | 3 |
+-------+---------+-------------+-----------+-------------+
4 rows in set (0.00 sec)
六、Hive表分区相关的元数据表
1.PARTITIONS:该表存储表分区的基本信息
mysql> select * from partitions where TBL_ID=29;
+---------+-------------+------------------+-----------------------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+-----------------------------------+-------+--------+
| 7 | 1529246035 | 0 | deptno=30 | 36 | 29 |
| 8 | 1529246036 | 0 | deptno=10 | 37 | 29 |
| 9 | 1529246036 | 0 | deptno=__HIVE_DEFAULT_PARTITION__ | 38 | 29 |
| 10 | 1529246037 | 0 | deptno=20 | 39 | 29 |
+---------+-------------+------------------+-----------------------------------+-------+--------+
4 rows in set (0.02 sec)
2.partition_keys:该表存储分区的字段信息

mysql> select * from partition_keys where TBL_ID=25;
+--------+--------------+-------------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-------------+-----------+-------------+
| 25 | NULL | event_month | string | 0 |
+--------+--------------+-------------+-----------+-------------+
1 row in set (0.00 sec)
3.partition_key_vals:该表存储分区字段值

mysql> select * from partition_key_vals where part_id=7;
+---------+--------------+-------------+
| PART_ID | PART_KEY_VAL | INTEGER_IDX |
+---------+--------------+-------------+
| 7 | 30 | 0 |
+---------+--------------+-------------+
1 row in set (0.00 sec)
4.PARTITION_PARAMS:该表存储分区的属性信息

mysql> select * from partition_params where part_id=7;
+---------+-----------------------+-------------+
| PART_ID | PARAM_KEY | PARAM_VALUE |
+---------+-----------------------+-------------+
| 7 | COLUMN_STATS_ACCURATE | true |
| 7 | numFiles | 1 |
| 7 | numRows | 6 |
| 7 | rawDataSize | 269 |
| 7 | totalSize | 275 |
| 7 | transient_lastDdlTime | 1529246039 |
+---------+-----------------------+-------------+
6 rows in set (0.00 sec)
网友评论