美文网首页
Hive基础sql语法(DDL)

Hive基础sql语法(DDL)

作者: K_un | 来源:发表于2018-06-03 17:34 被阅读0次

    前言:

    • 经过前面的学习 我们了解到Hive可以使用关系型数据库来存储元数据,而且Hive提供了比较完整的SQL功能 ,这篇文章主要介绍Hive基本的sql语法。

    首先了解下Hive的数据存储结构,抽象图如下:


    Hive存储.png
    • 1.Database:Hive中包含了多个数据库,默认的数据库为default,对应于HDFS目录是/user/hadoop/hive/warehouse,可以通过hive.metastore.warehouse.dir参数进行配置(hive-site.xml中配置)
    • 2.Table: Hive 中的表又分为内部表和外部表 ,Hive 中的每张表对应于HDFS上的一个目录,HDFS目录为:/user/hadoop/hive/warehouse/[databasename.db]/table
    • 3.Partition:分区,每张表中可以加入一个分区或者多个,方便查询,提高效率;并且HDFS上会有对应的分区目录:
      /user/hadoop/hive/warehouse/[databasename.db]/table
    • 4.Bucket(桶):暂且不讲

    DDL操作(Data Definition Language)

    参考官方文档: DDL文档
    HiveQL DDL statements are documented here, including:

    • CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
    • DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
    • TRUNCATE TABLE
    • ALTER DATABASE/SCHEMA, TABLE, VIEW
    • MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
    • SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, VIEWS, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
    • DESCRIBE DATABASE/SCHEMA, table_name, view_name

    一.基于数据库的DDL操作

    1.创建数据库(Create Database)
    • 下面是官网上为我们列出的语法:
    Create Database
    CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
      [COMMENT database_comment]
      [LOCATION hdfs_path]
      [WITH DBPROPERTIES (property_name=property_value, ...)];
    

    IF NOT EXISTS:加上这句话代表判断数据库是否存在,不存在就会创建,存在就不会创建(生产环境建议使用)。
    COMMENT:数据库的描述
    LOCATION:创建数据库的地址,不加默认在/user/hive/warehouse/路径下
    WITH DBPROPERTIES:数据库的属性

    hive> CREATE DATABASE hive1;
    OK
    hive> CREATE DATABASE IF NOT EXISTS hive2
        > COMMENT "this is ruoze database"
        > WITH DBPROPERTIES ("creator"="ruoze", "date"="2018-08-08");
    OK
    hive> CREATE DATABASE hive3 LOCATION '/db_hive3';
    OK
    hive> show databases;
    OK
    default
    hive1
    hive2
    hive3
    
    # 在HDFS中查看数据库文件夹
    [hadoop@hadoop000 ~]$ hadoop fs -ls /user/hive/warehouse
    Found 2 items
    drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:26 /user/hive/warehouse/hive1.db
    drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:28 /user/hive/warehouse/hive2.db
    [hadoop@hadoop000 ~]$ hadoop fs -ls /
    Found 3 items
    drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:29 /db_hive3
    drwx-wx-wx   - hadoop supergroup          0 2018-06-03 15:57 /tmp
    drwxr-xr-x   - hadoop supergroup          0 2018-06-03 16:43 /user
    
    # 在RDBMS中查看数据库相关信息
    mysql> select * from hive_meta.dbs\G;
    *************************** 1. row ***************************
              DB_ID: 1
               DESC: Default Hive database
    DB_LOCATION_URI: hdfs://hadoop000:9000/user/hive/warehouse
               NAME: default
         OWNER_NAME: public
         OWNER_TYPE: ROLE
    *************************** 2. row ***************************
              DB_ID: 6
               DESC: NULL
    DB_LOCATION_URI: hdfs://hadoop000:9000/user/hive/warehouse/hive1.db
               NAME: hive1
         OWNER_NAME: hadoop
         OWNER_TYPE: USER
    *************************** 3. row ***************************
              DB_ID: 7
               DESC: this is ruoze database
    DB_LOCATION_URI: hdfs://hadoop000:9000/user/hive/warehouse/hive2.db
               NAME: hive2
         OWNER_NAME: hadoop
         OWNER_TYPE: USER
    *************************** 4. row ***************************
              DB_ID: 8
               DESC: NULL
    DB_LOCATION_URI: hdfs://hadoop000:9000/db_hive3
               NAME: hive3
         OWNER_NAME: hadoop
         OWNER_TYPE: USER
    4 rows in set (0.00 sec)
    
    2.查询数据库(Show Databases)
    • 下面是官网上为我们列出的语法:
    SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];
    
    hive> show databases;
    OK
    default
    hive1
    hive2
    hive3
    Time taken: 0.047 seconds, Fetched: 4 row(s)
    hive> show databases like 'hive1';
    OK
    hive1
    Time taken: 0.035 seconds, Fetched: 1 row(s)
    hive> show databases like 'hive*';
    OK
    hive1
    hive2
    hive3
    Time taken: 0.037 seconds, Fetched: 3 row(s)
    
    3.查询数据库信息(Describe Database)
    • 下面是官网上为我们列出的语法:
    DESCRIBE DATABASE [EXTENDED] db_name;
    --describe 可简写为desc
    

    DESCRIBE DATABASE db_name:查看数据库的描述信息和文件目录位置路径信息;
    EXTENDED:加上数据库键值对的属性信息。

    hive> desc database hive1;
    OK
    hive1           hdfs://192.168.6.217:9000/user/hive/warehouse/hive1.db  hadoop  USER
    Time taken: 0.039 seconds, Fetched: 1 row(s)
    hive>  desc database hive2;
    OK
    hive2   this is ruoze database  hdfs://192.168.6.217:9000/user/hive/warehouse/hive2.db  hadoop  USER
    Time taken: 0.041 seconds, Fetched: 1 row(s)
    hive>  desc database hive3;
    OK
    hive3           hdfs://192.168.6.217:9000/db_hive3      hadoop  USER
    Time taken: 0.046 seconds, Fetched: 1 row(s)
    hive>  desc database extended  hive2;
    OK
    hive2   this is ruoze database  hdfs://192.168.6.217:9000/user/hive/warehouse/hive2.db  hadoop  USER    {date=2018-08-08, creator=ruoze}
    Time taken: 0.031 seconds, Fetched: 1 row(s)
    
    4.删除数据库(Drop Database)
    • 下面是官网上为我们列出的语法:
    DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
    

    RESTRICT:默认是restrict,如果该数据库还有表存在则报错;
    CASCADE:级联删除数据库(当数据库还有表时,级联删除表后再删除数据库) --生产尽量不用。

    hive> drop database test;
    OK
    Time taken: 0.094 seconds
    
    5.修改数据库信息(Alter Database)
    • 下面是官网上为我们列出的语法:
    ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)
    
    ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;  
     -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
    
    ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; 
    -- (Note: Hive 2.2.1, 2.4.0 and later)
    

    (Note:表示对于版本进行的修改)

    hive> alter database hive2 set dbproperties ("update"="jepson");
    OK
    Time taken: 0.094 seconds
    hive> alter database hive2 set owner user hive;
    OK
    Time taken: 0.072 seconds
    
    # 修改前
    hive>  desc database extended  hive2;
    OK
    hive2   this is ruoze database  hdfs://192.168.6.217:9000/user/hive/warehouse/hive2.db  hadoop  USER    {date=2018-08-08, creator=ruoze}
    Time taken: 0.031 seconds, Fetched: 1 row(s)
    # 修改后
    hive> desc database extended  hive2;
    OK
    hive2   this is ruoze database  hdfs://192.168.6.217:9000/user/hive/warehouse/hive2.db  hive    USER    {update=jepson, date=2018-08-08, creator=ruoze}
    Time taken: 0.034 seconds, Fetched: 1 row(s)
    
    6.切换数据库(Use Database)
    • 下面是官网上为我们列出的语法:
    USE database_name;
    
    hive> use hive1;
    OK
    Time taken: 0.044 seconds
    hive> use default;
    OK
    Time taken: 0.047 seconds
    

    二.基于表的DDL操作

    1.创建表(Create Table)
    • 下面是官网上为我们列出的语法:
    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
      [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
      [COMMENT table_comment]
      [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
      [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
      [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
         ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
         [STORED AS DIRECTORIES]
      [
       [ROW FORMAT row_format] 
       [STORED AS file_format]
         | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
      ]
      [LOCATION hdfs_path]
      [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
      [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
    
    1.1.TEMPORARY(临时表)
    • Hive从0.14.0开始提供创建临时表的功能,表只对当前session有效,session退出后,表自动删除。

    语法:

    CREATE TEMPORARY TABLE ...
    

    注意点:

    1. 如果创建的临时表表名已存在,那么当前session引用到该表名时实际用的是临时表,只有drop或rename临时表名才能使用原始表;
    2. 临时表限制:不支持分区字段和创建索引。
    hive> use default;
    OK
    Time taken: 0.047 seconds
    hive> CREATE TEMPORARY TABLE temporary_table (
        > id int,
        > name string);
    OK
    Time taken: 0.242 seconds
    hive> show tables;
    OK
    temporary_table
    Time taken: 0.044 seconds, Fetched: 1 row(s)
    
    # 退出重新进
    hive> use default;
    OK
    Time taken: 1.054 seconds
    hive> show tables;
    OK
    Time taken: 0.559 seconds
    
    1.2.Managed and External Tables(内部表和外部表)
    • Hive上有两种类型的表,一种是Managed Table(默认的),另一种是External Table(加上EXTERNAL关键字)。它俩的主要区别在于:当我们drop表时,Managed Table会同时删去data(存储在HDFS上)和meta data(存储在MySQL),而External Table只会删meta data。
    hive> use default;
    OK
    Time taken: 1.054 seconds
    hive> show tables;
    OK
    Time taken: 0.559 seconds
    # 创建内部表和外部表
    hive> create table managed_table(
        > id int,
        > name string 
        > );
    OK
    Time taken: 0.677 seconds
    hive> create external table external_table(
        > id int,
        > name string 
        > );
    OK
    Time taken: 0.146 seconds
    hive> show tables;
    OK
    external_table
    managed_table
    Time taken: 0.05 seconds, Fetched: 2 row(s)
    # HDFS中查看
    [hadoop@hadoop000 ~]$ hadoop fs -ls /user/hive/warehouse
    Found 4 items
    drwxr-xr-x   - hadoop supergroup          0 2018-06-16 16:40 /user/hive/warehouse/external_table
    drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:26 /user/hive/warehouse/hive1.db
    drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:28 /user/hive/warehouse/hive2.db
    drwxr-xr-x   - hadoop supergroup          0 2018-06-16 16:39 /user/hive/warehouse/managed_table
    # MySQL中查看
    mysql> select * from hive_meta.tbls\G;
    *************************** 1. row ***************************
                TBL_ID: 11
           CREATE_TIME: 1529138399
                 DB_ID: 1
      LAST_ACCESS_TIME: 0
                 OWNER: hadoop
             RETENTION: 0
                 SD_ID: 11
              TBL_NAME: managed_table
              TBL_TYPE: MANAGED_TABLE
    VIEW_EXPANDED_TEXT: NULL
    VIEW_ORIGINAL_TEXT: NULL
    *************************** 2. row ***************************
                TBL_ID: 12
           CREATE_TIME: 1529138409
                 DB_ID: 1
      LAST_ACCESS_TIME: 0
                 OWNER: hadoop
             RETENTION: 0
                 SD_ID: 12
              TBL_NAME: external_table
              TBL_TYPE: EXTERNAL_TABLE
    VIEW_EXPANDED_TEXT: NULL
    VIEW_ORIGINAL_TEXT: NULL
    2 rows in set (0.00 sec)
    
    # 删除内部表和外部表
    hive> drop table managed_table;
    OK
    Time taken: 1.143 seconds
    hive> drop table external_table;
    OK
    Time taken: 0.265 seconds
    # 再次查看
    [hadoop@hadoop000 ~]$ hadoop fs -ls /user/hive/warehouse
    Found 3 items
    drwxr-xr-x   - hadoop supergroup          0 2018-06-16 16:40 /user/hive/warehouse/external_table
    drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:26 /user/hive/warehouse/hive1.db
    drwxr-xr-x   - hadoop supergroup          0 2018-06-16 15:28 /user/hive/warehouse/hive2.db
    mysql> select * from hive_meta.tbls\G;
    Empty set (0.00 sec)
    
    ERROR: 
    No query specified
    
    1.3.COMMENT,ROW FORMAT等其他建表参数

    COMMENT :注释 可以给字段和表加注释

    先看看官网对于ROW FORMAT的描述

    : DELIMITED 
    [FIELDS TERMINATED BY char [ESCAPED BY char]]       [COLLECTION ITEMS TERMINATED BY char]
    [MAP KEYS TERMINATED BY char] 
    [LINES TERMINATED BY char]
    [NULL DEFINED AS char]   
    -- (Note: Available in Hive 0.13 and later)
      | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
    
    • 先看看官网给我们的解释:用户在建表的时候可以自定义 SerDe 或者使用自带的 SerDe。如果没有指定 ROW FORMAT 或者 ROW FORMAT DELIMITED,将会使用自带的 SerDe。在建表的时候,用户还需要为表指定列,用户在指定表的列的同时也会指定自定义的 SerDe,Hive 通过 SerDe 确定表的具体的列的数据。

    • 那么问题又来了上面这句话又是什么意思呢?
      让我们来一起看看到底是神马东东:
      DELIMITED:分隔符(可以自定义分隔符);
      FIELDS TERMINATED BY char:每个字段之间使用的分割;
      例:-FIELDS TERMINATED BY '\n' 字段之间的分隔符为\n;
      COLLECTION ITEMS TERMINATED BY char:集合中元素与元素(array)之间使用的分隔符(collection单例集合的跟接口);
      MAP KEYS TERMINATED BY char:字段是K-V形式指定的分隔符;
      LINES TERMINATED BY char:每条数据之间由换行符分割(默认[ \n ])。

    • 一般情况下LINES TERMINATED BY char我们就使用默认的换行符\n,只需要指定FIELDS TERMINATED BY char。

    hive> CREATE TABLE hive_test
        > (id int comment 'this is id', name string comment 'this is name' )
        > comment 'this is hive_test'
        > ROW FORMAT DELIMITED 
        > FIELDS TERMINATED BY '\t' ;
    OK
    Time taken: 0.174 seconds
    
    #为了后面的测试我们创建一张emp表 并导入一些数据
    hive> create table emp 
        > (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double, deptno int)
        > ROW FORMAT DELIMITED 
        > FIELDS TERMINATED BY '\t' ;
    OK
    Time taken: 0.651 seconds
    hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp; 
    Loading data to table default.emp
    Table default.emp stats: [numFiles=1, numRows=0, totalSize=886, rawDataSize=0]
    OK
    Time taken: 1.848 seconds
    
    1.4.Create Table As Select (CTAS)
    • 创建表(拷贝表结构及数据,并且会运行MapReduce作业)
    # 复制整张表
    hive> create table emp2 as select * from emp;
    Query ID = hadoop_20180616171313_fbc318e8-bc70-4b63-84fa-3acd94e4ec3e
    Total jobs = 3
    ...
    OK
    Time taken: 23.279 seconds
    hive> select * from emp2;
    OK
    7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
    7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
    7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
    7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
    7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
    7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
    7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
    7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
    7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
    7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
    7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
    7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
    7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
    7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
    Time taken: 0.138 seconds, Fetched: 14 row(s)
    
    #复制表中的一些字段
    hive> create table emp3 as select empno,ename from emp;
    Query ID = hadoop_20180616171313_fbc318e8-bc70-4b63-84fa-3acd94e4ec3e
    Total jobs = 3
    ...
    OK
    Time taken: 16.143 seconds
    hive> select * from emp3;
    OK
    7369    SMITH
    7499    ALLEN
    7521    WARD
    7566    JONES
    7654    MARTIN
    7698    BLAKE
    7782    CLARK
    7788    SCOTT
    7839    KING
    7844    TURNER
    7876    ADAMS
    7900    JAMES
    7902    FORD
    7934    MILLER
    Time taken: 0.159 seconds, Fetched: 14 row(s)
    
    1.5.Create Table Like
    # Create Table Like 只拷贝表结构
    hive> create table emp_like like emp;
    OK
    Time taken: 0.195 seconds
    hive> select * from emp_like;
    OK
    Time taken: 0.131 seconds
    
    2.展示表 (Show Table与Show Create Table)
    • 下面是官网上为我们列出的语法:
    SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
    SHOW CREATE TABLE ([db_name.]table_name|view_name);
    
    hive> show tables;
    OK
    emp
    emp2
    emp3
    emp_like
    hive_test
    Time taken: 0.042 seconds, Fetched: 5 row(s)
    hive> show tables 'emp*';
    OK
    emp
    emp2
    emp3
    emp_like
    Time taken: 0.053 seconds, Fetched: 4 row(s)
    hive> show create table emp;
    OK
    CREATE TABLE `emp`(
      `empno` int, 
      `ename` string, 
      `job` string, 
      `mgr` int, 
      `hiredate` string, 
      `salary` double, 
      `comm` double, 
      `deptno` int)
    ROW FORMAT DELIMITED 
      FIELDS TERMINATED BY '\t' 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      'hdfs://192.168.6.217:9000/user/hive/warehouse/emp'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true', 
      'numFiles'='1', 
      'numRows'='0', 
      'rawDataSize'='0', 
      'totalSize'='657', 
      'transient_lastDdlTime'='1529140756')
    Time taken: 0.245 seconds, Fetched: 24 row(s)
    
    3.查询表信息(Describe Table)
    • 下面是官网上为我们列出的语法:
    DESCRIBE [EXTENDED|FORMATTED] 
      table_name[.col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];
                                            -- (Note: Hive 1.x.x and 0.x.x only. See "Hive 2.0+: New Syntax" below)
    

    desc formatted table_name; 比较常用

    hive> desc emp;
    OK
    empno                   int                                         
    ename                   string                                      
    job                     string                                      
    mgr                     int                                         
    hiredate                string                                      
    salary                  double                                      
    comm                    double                                      
    deptno                  int                                         
    Time taken: 0.213 seconds, Fetched: 8 row(s)
    hive> desc formatted emp;
    OK
    # col_name              data_type               comment             
                     
    empno                   int                                         
    ename                   string                                      
    job                     string                                      
    mgr                     int                                         
    hiredate                string                                      
    salary                  double                                      
    comm                    double                                      
    deptno                  int                                         
                     
    # Detailed Table Information             
    Database:               default                  
    Owner:                  hadoop                   
    CreateTime:             Sat Jun 16 17:13:05 CST 2018     
    LastAccessTime:         UNKNOWN                  
    Protect Mode:           None                     
    Retention:              0                        
    Location:               hdfs://192.168.6.217:9000/user/hive/warehouse/emp        
    Table Type:             MANAGED_TABLE            
    Table Parameters:                
            COLUMN_STATS_ACCURATE   true                
            numFiles                1                   
            numRows                 0                   
            rawDataSize             0                   
            totalSize               657                 
            transient_lastDdlTime   1529140756          
                     
    # Storage Information            
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:             
            field.delim             \t                  
            serialization.format    \t                  
    Time taken: 0.214 seconds, Fetched: 39 row(s)
    hive> desc EXTENDED emp;
    OK
    empno                   int                                         
    ename                   string                                      
    job                     string                                      
    mgr                     int                                         
    hiredate                string                                      
    salary                  double                                      
    comm                    double                                      
    deptno                  int                                         
                     
    Detailed Table Information      Table(tableName:emp, dbName:default, owner:hadoop, createTime:1529140385, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:empno, type:int, comment:null), FieldSchema(name:ename, type:string, comment:null), FieldSchema(name:job, type:string, comment:null), FieldSchema(name:mgr, type:int, comment:null), FieldSchema(name:hiredate, type:string, comment:null), FieldSchema(name:salary, type:double, comment:null), FieldSchema(name:comm, type:double, comment:null), FieldSchema(name:deptno, type:int, comment:null)], location:hdfs://192.168.6.217:9000/user/hive/warehouse/emp, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=       , field.delim=
    Time taken: 0.21 seconds, Fetched: 10 row(s)
    
    4.修改表(Alter Table)
    • 下面是官网上为我们列出的语法:
    ALTER TABLE table_name RENAME TO new_table_name;
    ALTER TABLE table_name SET TBLPROPERTIES table_properties;
    table_properties:
      : (property_name = property_value, property_name = property_value, ... )
    ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
    ...
    
    hive> alter table hive_test rename to new_hive_test;
    OK
    Time taken: 0.262 seconds
    hive> ALTER TABLE table_name SET TBLPROPERTIES ("creator"="ruoze", "date"="2018-06-16");
    FAILED: SemanticException [Error 10001]: Table not found default.table_name
    hive> ALTER TABLE new_hive_test SET TBLPROPERTIES ("creator"="ruoze", "date"="2018-06-16");
    OK
    Time taken: 0.246 seconds
    hive> ALTER TABLE new_hive_test SET TBLPROPERTIES ('comment' = 'This is new_hive_test Table');
    # 再次查看表
    hive> desc formatted new_hive_test;
    OK
    # col_name              data_type               comment             
                     
    id                      int                     this is id          
    name                    string                  this is name        
                     
    # Detailed Table Information             
    Database:               default                  
    Owner:                  hadoop                   
    CreateTime:             Sat Jun 16 17:09:19 CST 2018     
    LastAccessTime:         UNKNOWN                  
    Protect Mode:           None                     
    Retention:              0                        
    Location:               hdfs://192.168.6.217:9000/user/hive/warehouse/new_hive_test      
    Table Type:             MANAGED_TABLE            
    Table Parameters:                
            COLUMN_STATS_ACCURATE   false               
            comment                 This is new_hive_test Table
            creator                 ruoze               
            date                    2018-06-16          
            last_modified_by        hadoop              
            last_modified_time      1529143021          
            numFiles                0                   
            numRows                 -1                  
            rawDataSize             -1                  
            totalSize               0                   
            transient_lastDdlTime   1529143021          
                     
    # Storage Information            
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:             
            field.delim             \t                  
            serialization.format    \t                  
    Time taken: 0.188 seconds, Fetched: 38 row(s)
    
    5.截断表(Truncate Table)
    • 下面是官网上为我们列出的语法:
    TRUNCATE TABLE table_name [PARTITION partition_spec];
    partition_spec:
      : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
    

    Truncate Table用处不多

    hive> select * from emp3;
    OK
    7369    SMITH
    7499    ALLEN
    7521    WARD
    7566    JONES
    7654    MARTIN
    7698    BLAKE
    7782    CLARK
    7788    SCOTT
    7839    KING
    7844    TURNER
    7876    ADAMS
    7900    JAMES
    7902    FORD
    7934    MILLER
    Time taken: 0.148 seconds, Fetched: 14 row(s)
    hive> truncate table emp3;
    OK
    Time taken: 0.241 seconds
    hive> select * from emp3;
    OK
    Time taken: 0.12 seconds
    
    6.删除表(Drop Table)
    • 下面是官网上为我们列出的语法:
    DROP TABLE [IF EXISTS] table_name [PURGE];     -- (Note: PURGE available in Hive 0.14.0 and later)
    

    1.指定PURGE后,数据不会放到回收箱,会直接删除。
    2.DROP TABLE删除此表的元数据和数据。如果配置了垃圾箱(并且未指定PURGE),则实际将数据移至.Trash / Current目录。元数据完全丢失。
    3.删除EXTERNAL表时,表中的数据不会从文件系统中删除。

    hive> drop table emp3;
    OK
    Time taken: 0.866 seconds
    hive> show tables;
    OK
    emp
    emp2
    emp_like
    new_hive_test
    Time taken: 0.036 seconds, Fetched: 4 row(s)
    

    参考:https://blog.csdn.net/yu0_zhang0/article/details/78976021
    关于表的DDL操作还有很多,有关分区表的操作还没详解 后面会单独写一篇分区表

    相关文章

      网友评论

          本文标题:Hive基础sql语法(DDL)

          本文链接:https://www.haomeiwen.com/subject/apgjsftx.html