美文网首页
Hive学习笔记(6)DDL

Hive学习笔记(6)DDL

作者: 井地儿 | 来源:发表于2019-04-03 16:10 被阅读0次

    官方参考文档 LanguageManual DDL

    创建/删除/更改/使用数据库

    在hive sql中database关键词和 schema关键词可以互换,意思是一样的

    创建数据库

    CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
    [COMMENT database_comment]
    [LOCATION hdfs_path]
    [WITH DBPROPERTIES (property_name=property_value, ...)];

    小牛试刀:
    hive> create database if not exists test;
    OK
    Time taken: 0.131 seconds
    hive> create schema if not exists test_schema;
    OK
    Time taken: 0.07 seconds
    hive> show databases;
    OK
    default
    test
    test_schema
    Time taken: 0.255 seconds, Fetched: 3 row(s)
    hive> create database if not exists stefan comment 'just for test';
    OK
    Time taken: 0.047 seconds
    hive> desc database stefan;
    OK
    stefan  just for test   hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db hadoop  USER
    Time taken: 0.068 seconds, Fetched: 1 row(s)
    

    删除数据库

    删除数据库时,默认行为是RESTRICT,这种情况下如果数据库不为空,则删除动作失败。如果需要删除库以及库里的表可以使用CASCADE关键字。

    DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

    小牛试刀:
    hive> drop database test;
    OK
    Time taken: 0.242 seconds
    hive> drop database if exists test;
    OK
    Time taken: 0.011 seconds
    hive> drop database if exists test_schema cascade;
    OK
    Time taken: 1.518 seconds
    hive> show databases;
    OK
    default
    prop_database
    stefan
    
    

    修改数据库

    需要注意下述命令的生效版本号。

    ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); -- (Note: SCHEMA added in Hive 0.14.0)

    ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)

    ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)

    alter database ... set location 语句不会将数据库当前目录的内容移动到新指定的位置。它不会更改当前数据库下已经存在的任何表/分区关联的位置,仅改变在该数据库下新建表的默认父目录。

    小牛试刀:

    添加属性:

    hive> show create database stefan;
    OK
    CREATE DATABASE `stefan`
    COMMENT
      'just for test'
    LOCATION
      'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db'
    Time taken: 0.222 seconds, Fetched: 5 row(s)
    hive> alter database stefan set dbproperties ('owner'='stefan', 'create_time'='20190403');
    OK
    Time taken: 0.07 seconds
    hive> show create database stefan;
    OK
    CREATE DATABASE `stefan`
    COMMENT
      'just for test'
    LOCATION
      'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db'
    WITH DBPROPERTIES (
      'create_time'='20190403',
      'owner'='stefan')
    Time taken: 0.196 seconds, Fetched: 8 row(s)
    

    修改owner:

    hive> alter schema stefan set owner user stefan_test;
    OK
    Time taken: 0.084 seconds
    hive> desc database stefan;
    OK
    stefan  just for test   hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db stefan_test USER
    Time taken: 0.145 seconds, Fetched: 1 row(s)
    

    切换数据库

    USE database_name;
    USE DEFAULT;

    hive> use stefan;
    OK
    Time taken: 0.019 seconds
    

    查看库

    show databases;

    hive> show databases;
    OK
    default
    Time taken: 5.987 seconds, Fetched: 1 row(s)
    

    查看当前库

    show current_database();

    hive> select current_database();
    OK
    default
    Time taken: 0.852 seconds, Fetched: 1 row(s)
    

    查看库的属性信息

    desc database extended default;

    hive> desc database extended default;
    OK
    default Default Hive database   hdfs://jms-master-01:9000/user/hive/warehouse   public  ROLE
    Time taken: 0.361 seconds, Fetched: 1 row(s)
    
    

    查看建库语句

    show create database default;

    hive> show create database default;
    OK
    CREATE DATABASE `default`
    COMMENT
      'Default Hive database'
    LOCATION
      'hdfs://jms-master-01:9000/user/hive/warehouse'
    Time taken: 0.525 seconds, Fetched: 5 row(s)
    

    建库时携带属性

    create database if not exists prop_database comment 'test properties' with dbproperties ('prop1'='aaa', 'prop2'='bbb');

    hive> create database if not exists prop_database comment 'test properties database' with dbproperties('prop1'='aaa', 'prop2'='bbb');
    OK
    Time taken: 0.121 seconds
    hive> show create database prop_database;
    OK
    CREATE DATABASE `prop_database`
    COMMENT
      'test properties database'
    LOCATION
      'hdfs://jms-master-01:9000/user/hive/warehouse/prop_database.db'
    WITH DBPROPERTIES (
      'prop1'='aaa',
      'prop2'='bbb')
    Time taken: 0.181 seconds, Fetched: 8 row(s)
    
    

    创建/删除/清空表

    建表

    根据指定表明创建表。当表明存在时抛出异常,可以使用 IF NOT EXISTS 跳过该异常。
    *表和列的注释comment格式是字符串,使用单引号。
    *表名和列名不区分大小写。
    *使用external创建的表是外部表,不使用默认创建的表是托管表(我习惯称之为内部表)。查看一个表是托管表还是外部表,可以通过describe extended table_name;查看其中的tableType属性值。

    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
    [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
    [COMMENT table_comment]
    [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
    [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
    [SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
    ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
    [STORED AS DIRECTORIES]
    [
    [ROW FORMAT row_format]
    [STORED AS file_format]
    | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
    ]
    [LOCATION hdfs_path]
    [TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
    [AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
    LIKE existing_table_or_view_name
    [LOCATION hdfs_path];

    data_type
    : primitive_type
    | array_type
    | map_type
    | struct_type
    | union_type -- (Note: Available in Hive 0.7.0 and later)

    primitive_type
    : TINYINT
    | SMALLINT
    | INT
    | BIGINT
    | BOOLEAN
    | FLOAT
    | DOUBLE
    | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
    | STRING
    | BINARY -- (Note: Available in Hive 0.8.0 and later)
    | TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
    | DECIMAL -- (Note: Available in Hive 0.11.0 and later)
    | DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
    | DATE -- (Note: Available in Hive 0.12.0 and later)
    | VARCHAR -- (Note: Available in Hive 0.12.0 and later)
    | CHAR -- (Note: Available in Hive 0.13.0 and later)

    array_type
    : ARRAY < data_type >

    map_type
    : MAP < primitive_type, data_type >

    struct_type
    : STRUCT < col_name : data_type [COMMENT col_comment], ...>

    union_type
    : UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)

    row_format
    : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
    [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
    [NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
    | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

    file_format:
    : SEQUENCEFILE
    | TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
    | RCFILE -- (Note: Available in Hive 0.6.0 and later)
    | ORC -- (Note: Available in Hive 0.11.0 and later)
    | PARQUET -- (Note: Available in Hive 0.13.0 and later)
    | AVRO -- (Note: Available in Hive 0.14.0 and later)
    | JSONFILE -- (Note: Available in Hive 4.0.0 and later)
    | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

    constraint_specification:
    : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE

    创建托管表:
    hive> create table manage_table (id int, name string);
    OK
    Time taken: 0.674 seconds
    

    创建外部表:

    hive> create external table external_table (id int, name string)  location '/user/hadoop/tmp/hive/default.db/external_table';
    OK
    Time taken: 0.449 seconds
    

    查看表是托管表还是外部表:
    可以看出托管表的tableType是MANAGED_TABLE;外部表的tableType是EXTERNAL_TABLE。

    hive> describe extended manage_table;
    OK
    id                      int
    name                    string
    
    Detailed Table Information  Table(tableName:manage_table, dbName:default, owner:hadoop, createTime:1554341982, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://jms-master-01:9000/user/hive/warehouse/manage_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1554341982}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false)
    Time taken: 0.329 seconds, Fetched: 4 row(s)
    hive> describe extended external_table;
    OK
    id                      int
    name                    string
    
    Detailed Table Information  Table(tableName:external_table, dbName:default, owner:hadoop, createTime:1554342083, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://jms-master-01:9000/user/hadoop/tmp/hive/default.db/external_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1554342083}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, rewriteEnabled:false)
    Time taken: 0.143 seconds, Fetched: 4 row(s)
    

    创建分区表

    hive> create table partition_table (id int, name string) partitioned by (dt string);
    OK
    Time taken: 0.393 seconds
    hive> show create table partition_table;
    OK
    CREATE TABLE `partition_table`(
      `id` int,
      `name` string)
    PARTITIONED BY (
      `dt` string)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db/partition_table'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1554344052')
    Time taken: 0.275 seconds, Fetched: 15 row(s)
    

    create table as select(CTAS)

    CTAS语法限制:

    • 目标表不能是外部表
    • 目标表不能是分桶表
      CTAS语法可以实现表的格式转换。
    hive> select * from partition_table;
    OK
    1   郭靖  20190402    china
    2   黄蓉  20190402    china
    3   杨康  20190402    china
    4   穆念慈 20190402    china
    5   东邪  20190402    china
    6   西毒  20190402    china
    7   黄老邪 20190402    china
    8   杨铁心 20190402    china
    Time taken: 1.499 seconds, Fetched: 8 row(s)
    hive> create table new_key_value_store row format serde "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe" stored as RCFile as select (id * 10) new_key, concat(id, name) key_value_pair from partition_table sort by new_key, key_value_pair;
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = hadoop_20190404103533_902fb48c-2471-4e2d-bc79-5cba46b8c710
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1552651623473_0007, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0007/
    Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job  -kill job_1552651623473_0007
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2019-04-04 10:35:45,795 Stage-1 map = 0%,  reduce = 0%
    2019-04-04 10:35:50,537 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.39 sec
    2019-04-04 10:35:56,978 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.27 sec
    MapReduce Total cumulative CPU time: 5 seconds 270 msec
    Ended Job = job_1552651623473_0007
    Moving data to directory hdfs://jms-master-01:9000/user/hive/warehouse/new_key_value_store
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.27 sec   HDFS Read: 8487 HDFS Write: 248 SUCCESS
    Total MapReduce CPU Time Spent: 5 seconds 270 msec
    OK
    Time taken: 24.897 seconds
    hive> select * from  new_key_value_store;
    OK
    10  1郭靖
    20  2黄蓉
    30  3杨康
    40  4穆念慈
    50  5东邪
    60  6西毒
    70  7黄老邪
    80  8杨铁心
    Time taken: 0.21 seconds, Fetched: 8 row(s)
    hive> show create table new_key_value_store;
    OK
    CREATE TABLE `new_key_value_store`(
      `new_key` int,
      `key_value_pair` string)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
    LOCATION
      'hdfs://jms-master-01:9000/user/hive/warehouse/new_key_value_store'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1554345358')
    Time taken: 0.299 seconds, Fetched: 13 row(s)
    

    克隆表(create table ... like)

    create table ... like 会创建一个和源表结构完全一致的空表。

    hive> create table like_new_key_value_store like new_key_value_store;
    OK
    Time taken: 0.149 seconds
    hive> show create table like_new_key_value_store;
    OK
    CREATE TABLE `like_new_key_value_store`(
      `new_key` int,
      `key_value_pair` string)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
    LOCATION
      'hdfs://jms-master-01:9000/user/hive/warehouse/like_new_key_value_store'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1554345937')
    Time taken: 1.801 seconds, Fetched: 13 row(s)
    hive> select * from like_new_key_value_store;
    OK
    Time taken: 0.252 seconds
    

    分桶排序表

    创建一个分桶排序表
    hive> create table bucketed_table (id int, name string) partitioned by (dt string, country string) clustered by (id) sorted by (name) into 4 buckets row format delimited fields terminated by '\001' collection items terminated by '\002' map keys terminated by '\003' stored as sequencefile;
    OK
    Time taken: 0.241 seconds
    hive> show create table bucketed_table;
    OK
    CREATE TABLE `bucketed_table`(
      `id` int,
      `name` string)
    PARTITIONED BY (
      `dt` string,
      `country` string)
    CLUSTERED BY (
      id)
    SORTED BY (
      name ASC)
    INTO 4 BUCKETS
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES (
      'colelction.delim'='',
      'field.delim'='',
      'mapkey.delim'='',
      'serialization.format'='')
    STORED AS INPUTFORMAT
      'org.apache.hadoop.mapred.SequenceFileInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
    LOCATION
      'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db/bucketed_table'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1554346570')
    Time taken: 0.268 seconds, Fetched: 26 row(s)
    

    向多个分区多动态插入,需要设置参数
    set hive.exec.dynamici.partition=true; #开启动态分区,默认是false
    set hive.exec.dynamic.partition.mode=nonstrict; #开启允许所有分区都是动态的,否则必须要有静态分区才能使用。

    hive> insert into bucketed_table partition(dt, country) select id, name, dt, country from default.partition_table;
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = hadoop_20190404110332_1e7c3bf3-e5e6-4ff1-af30-2a5bcf0278e6
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 4
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1552651623473_0008, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0008/
    Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job  -kill job_1552651623473_0008
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
    2019-04-04 11:03:38,849 Stage-1 map = 0%,  reduce = 0%
    2019-04-04 11:03:43,157 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.47 sec
    2019-04-04 11:03:48,331 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 4.05 sec
    2019-04-04 11:03:50,399 Stage-1 map = 100%,  reduce = 75%, Cumulative CPU 9.22 sec
    2019-04-04 11:03:51,435 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 12.12 sec
    MapReduce Total cumulative CPU time: 12 seconds 120 msec
    Ended Job = job_1552651623473_0008
    Loading data to table stefan.bucketed_table partition (dt=null, country=null)
    
    Loaded : 1/1 partitions.
         Time taken to load dynamic partitions: 0.243 seconds
         Time taken for adding to write entity : 0.0 seconds
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1  Reduce: 4   Cumulative CPU: 12.12 sec   HDFS Read: 21718 HDFS Write: 937 SUCCESS
    Total MapReduce CPU Time Spent: 12 seconds 120 msec
    OK
    Time taken: 20.806 seconds
    hive> select * from bucketed_table;
    OK
    8   杨铁心 20190402    china
    4   穆念慈 20190402    china
    5   东邪  20190402    china
    1   郭靖  20190402    china
    6   西毒  20190402    china
    2   黄蓉  20190402    china
    3   杨康  20190402    china
    7   黄老邪 20190402    china
    Time taken: 0.225 seconds, Fetched: 8 row(s)
    

    数据倾斜表(Skewed Tables)

    暂时没接触过也没研究过。

    临时表

    临时表只在当前会话可见。数据存储在用户的临时目录总,会话结束时删除。

    • 临时表不支持分区,不支持创建索引。
    • 如果临时表表明和永久表名冲突,则在当前会话中会屏蔽永久表;只有drop或rename临时表表名才能使用永久表。
    • 从hive1.1+版本起,临时表可以存储在内存或者SSD中,通过参数hive.exec.temporary.table.storage配置,可选值有memory,ssd,default。

    create temporary table ...

    小牛试刀
    hive> create temporary table temp_table(id int, name string);
    OK
    Time taken: 6.14 seconds
    hive> insert into temp_table select id, name from partition_table;
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = hadoop_20190408171623_2c787f42-e5e5-460d-8a21-90a5a5c9afc3
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1552651623473_0010, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0010/
    Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job  -kill job_1552651623473_0010
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2019-04-08 17:16:32,625 Stage-1 map = 0%,  reduce = 0%
    2019-04-08 17:16:38,994 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.1 sec
    MapReduce Total cumulative CPU time: 2 seconds 100 msec
    Ended Job = job_1552651623473_0010
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to directory hdfs://jms-master-01:9000/tmp/hive-hadoop/hadoop/8dbf32dc-51a3-4caa-b5be-9d682ccdb29c/_tmp_space.db/09e10ccb-5d5f-45ab-b07a-afe5590b2352/.hive-staging_hive_2019-04-08_17-16-23_750_5995661068890136721-1/-ext-10000
    Loading data to table default.temp_table
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1   Cumulative CPU: 2.1 sec   HDFS Read: 4605 HDFS Write: 155 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 100 msec
    OK
    Time taken: 16.73 seconds
    hive> select * from temp_table;
    OK
    1   郭靖
    2   黄蓉
    3   杨康
    4   穆念慈
    5   东邪
    6   西毒
    7   黄老邪
    8   杨铁心
    Time taken: 0.161 seconds, Fetched: 8 row(s)
    

    临时表插入数据不支持insert overwrite语法。

    hive> insert overwrite temp_table select id, name from partition_table;
    NoViableAltException(24@[])
        at org.apache.hadoop.hive.ql.parse.HiveParser.destination(HiveParser.java:38762)
        at org.apache.hadoop.hive.ql.parse.HiveParser.insertClause(HiveParser.java:38531)
        at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36478)
        at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822)
        at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:35710)
        at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2284)
        at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
        at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
        at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
    FAILED: ParseException line 1:17 cannot recognize input near 'temp_table' 'select' 'id' in destination specification
    

    退出当前会话,临时表会被删除。

    hive> quit;
    [hadoop@jms-master-01 ~]$ hive
    which: no hbase in (/usr/share/svensudo/bin:/home/hadoop/tools/scala-2.12.8/bin:/home/hadoop/tools/spark-2.4.0-bin-hadoop2.7/bin:/home/hadoop/tools/java/jdk1.8.0_191/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/go/bin/:/alc/tool/bin:/alc/tool/bin:/home/hadoop/tools/hadoop-2.7.7/bin:/home/hadoop/tools/apache-hive-2.3.4-bin/bin:/home/hadoop/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/tools/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/tools/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in file:/home/hadoop/tools/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> select * from temp_table;
    FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'temp_table'
    

    事务表(Transactional Tables)

    从Hive1.4.0+版本开始支持事务表。Hive提供了支持ACID语义的事务表,可以完成增删改操作。
    关于事务表,展开又是一个专题了。可以参考官方事务表文档

    主键约束

    从Hive2.1.0版本开始,支持约束。Hive支持未经验证的主键和外键约束,注意是未经验证的,所以主键的正确性需要上游数据来保证。

    删除表

    drop table [if exists] table_name [purge];
    通常情况下,删除托管表,hive会删除元数据信息和数据文件,删除文件相当于执行了hadoop fs -rm,也就是说数据文件会被移动到hdfs系统下是trash回收站中,在误操作的情况下,在一定时效期间还可以找回数据。如果指定了purge选项,则不会移动到回收站下,而是直接永久删除。所以drop动作一定要谨慎。

    Truncate表

    truncate table table_name [partition partition_spec]
    这里partition_spec指(partition_column=partition_col_value,partition_column=partition_col_value, ...)

    相信如果对关系型数据库sql语言比较熟悉的话,以上操作都可以理解。

    Alter table/partition/column

    Alter table
    重命名表

    alter table table_name rename to new_table_name;

    hive> show tables;
    OK
    first_table
    Time taken: 5.51 seconds, Fetched: 9 row(s)
    hive> alter table first_table rename to second_table;
    OK
    Time taken: 0.406 seconds
    hive> show tables;
    OK
    second_table
    Time taken: 0.05 seconds, Fetched: 9 row(s)
    
    修改表的属性

    alter table table_name set tblproperties table_properties;
    table_properties:
    (property_name = property_value, property_name = property_value, ...)

    相关文章

      网友评论

          本文标题:Hive学习笔记(6)DDL

          本文链接:https://www.haomeiwen.com/subject/svtabqtx.html