美文网首页python数据分析人工智能机器学习软件测试Python专家之路程序员
HIVE快速入门教程4Hive数据类型和创建,删除数据库

HIVE快速入门教程4Hive数据类型和创建,删除数据库

作者: python测试开发 | 来源:发表于2019-06-11 14:19 被阅读15次

    Hive数据类型和创建,删除数据库

    Hive中的数据类型

    数据类型是Hive查询语言和数据建模中非常重要的元素。 要定义表列类型,我们必须了解数据类型及其用法。

    以下简要概述了Hive中的一些数据类型:

    这些是

    • 数值类型:Numeric
    • 字符串类型:String
    • 日期/时间类型:Date/Time
    • 复杂类型:Complex
    Type Memory allocation 例子
    TINY INT 它的1字节有符号整数(-128到127) 10Y
    SMALL INT 2字节有符号整数(-32768到32767) 10S
    INT 4字节有符号整数(-2,147,484,648到2,147,484,647) 10
    BIG INT 8字节有符号整数 100L
    FLOAT 4字节单精度浮点数 1.2345679
    DOUBLE 8字节双精度浮点数 1.2345678901234567
    DECIMAL We can define precision and scale in this Type

    String Types:

    Type Length 例子
    CHAR 255 'US' or "US"
    VARCHAR 1 to 65355
    STRING 我们可以在这里定义长度(无限制) "Books" or 'Books'
    BINARY 只能和STRING互相CAST 1011
    BOOLEAN TRUE or FALSE TRUE

    Date/Time Types:

    Type Usage 例子
    Timestamp 支持具有可选纳秒精度的传统Unix时间戳 2019-01-01
    Date 它采用YYYY-MM-DD格式。Date类型支持的值范围是0000-01-01到9999-12-31,具体取决于原始Java Date类型的支持 2019-01-01 12:00:01.345

    Complex Types:

    Type Usage 例子
    Arrays ARRAY<data_type>不允许使用负值和非常量表达式 [ "apple","orange","mango" ]
    Maps MAP<primitive_type, data_type> 不允许使用负值和非常量表达式 {1: "apple",2: "orange"}
    Struct STRUCT<col_name :datat_type, ….. > {1, "apple"}
    NAMED Struct STRUCT<col_name :datat_type, ….. > {"apple":"gala","weight

    kg":1}
    | Union | UNIONTYPE<data_type, datat_type, ……> 不常用 | {2:["apple","orange"]}

    参考资料

    在Hive中创建和删除数据库:

    创建数据库:

    要在Hive shell中创建数据库,我们必须使用如下语法所示的命令: -

    句法:

    Create database <DatabaseName>

    示例: create database guru99

    类似的有drop,此处的语法和mysql及其相似。

    hive> show databases;
    OK
    default
    Time taken: 2.723 seconds, Fetched: 1 row(s)
    hive> create database guru99
        > ;
    OK
    Time taken: 0.656 seconds
    hive> show databases;
    OK
    default
    guru99
    Time taken: 0.068 seconds, Fetched: 2 row(s)
    hive> drop database guru99;
    OK
    Time taken: 0.848 seconds
    hive> show databases;
    OK
    default
    Time taken: 0.063 seconds, Fetched: 1 row(s)
    hive> 
    

    数据类型演示实例

    employee.txt

    Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer�Lead
    Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
    Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
    Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
    

    Row Delimiter: This can be used with Ctrl + A or ^ A (use \001 when creating the table)
    Collection Item Delimiter: This can be used with Ctrl + B or ^ B (\002)
    Map Key Delimiter: This can be used with Ctrl + C or ^ C (\003)

    If the delimiter is overridden during the table creation, it only works when used in the flat structure. This is still a limitation in Hive described in Apache Jira Hive-365 ( https://issues.apach e.org/jira/browse/HIVE-365 ). For nested types, the level of nesting determines the delimiter. Using ARRAY of ARRAY as an example, the delimiters for the outer ARRAY , as expected, are Ctrl + B characters, but the inner ARRAY delimiter becomes Ctrl + C characters, which is the next delimiter in the list. In the preceding example, the depart_title column, which is a MAP of ARRAY , the MAP key delimiter is Ctrl + C, and the ARRAY delimiter is Ctrl + D.

    执行

        > CREATE TABLE employee (name STRING, work_place ARRAY<STRING>, gender_age STRUCT<gender:STRING,age:INT>, skills_score MAP<STRING,INT>, depart_title MAP<STRING,ARRAY<STRING>>)
        > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':' STORED AS TEXTFILE;
    OK
    > LOAD DATA INPATH '/user/hduser/employee.txt' OVERWRITE INTO TABLE employee;
    Loading data to table default.employee
    OK
    Time taken: 2.602 seconds
    hive> SELECT work_place FROM employee;
    OK
    ["Montreal","Toronto"]
    ["Montreal"]
    ["New York"]
    ["Vancouver"]
    Time taken: 5.808 seconds, Fetched: 4 row(s)
    hive> SELECT work_place[0] as col_1, work_place[1] as col_2, work_place[2] as col_3 FROM employee;
    OK
    Montreal    Toronto NULL
    Montreal    NULL    NULL
    New York    NULL    NULL
    Vancouver   NULL    NULL
    Time taken: 1.376 seconds, Fetched: 4 row(s)
    hive> 
        > SELECT gender_age FROM employee;
    OK
    {"gender":"Male","age":30}
    {"gender":"Male","age":35}
    {"gender":"Female","age":27}
    {"gender":"Female","age":57}
    Time taken: 0.399 seconds, Fetched: 4 row(s)
    hive> 
        > SELECT gender_age.gender, gender_age.age FROM employee;
    OK
    Male    30
    Male    35
    Female  27
    Female  57
    Time taken: 0.369 seconds, Fetched: 4 row(s)
    hive> 
        > SELECT skills_score FROM employee;
    OK
    {"DB":80}
    {"Perl":85}
    {"Python":80}
    {"Sales":89,"HR":94}
    Time taken: 0.347 seconds, Fetched: 4 row(s)
    hive> 
        > SELECT skills_score FROM employee;
    OK
    {"DB":80}
    {"Perl":85}
    {"Python":80}
    {"Sales":89,"HR":94}
    Time taken: 0.382 seconds, Fetched: 4 row(s)
    hive> 
        > SELECT name, skills_score['DB'] as DB, skills_score['Perl'] as Perl, skills_score['Python'] as Python, skills_score['Sales'] as Sales, skills_score['HR'] as HR FROM employee;
    OK
    Michael 80  NULL    NULL    NULL    NULL
    Will    NULL    85  NULL    NULL    NULL
    Shelley NULL    NULL    80  NULL    NULL
    Lucy    NULL    NULL    NULL    89  94
    Time taken: 0.447 seconds, Fetched: 4 row(s)
    hive> 
        > 
        > SELECT depart_title FROM employee;
    OK
    {"Product":["Developer","Lead"]}
    {"Product":["Lead"],"Test":["Lead"]}
    {"Test":["Lead"],"COE":["Architect"]}
    {"Sales":["Lead"]}
    Time taken: 0.329 seconds, Fetched: 4 row(s)
    hive> 
        > 
        > SELECT name, depart_title['Product'] as Product, depart_title['Test'] as Test, depart_title['COE'] as COE, depart_title['Sales'] as Sales FROM employee;
    OK
    Michael ["Developer","Lead"]    NULL    NULL    NULL
    Will    ["Lead"]    ["Lead"]    NULL    NULL
    Shelley NULL    ["Lead"]    ["Architect"]   NULL
    Lucy    NULL    NULL    NULL    ["Lead"]
    Time taken: 0.322 seconds, Fetched: 4 row(s)
    hive> SELECT name, depart_title['Product'][0] as product_col0, depart_title['Test'][0] as test_col0 FROM employee;
    OK
    Michael Developer   NULL
    Will    Lead    Lead
    Shelley NULL    Lead
    Lucy    NULL    NULL
    Time taken: 0.335 seconds, Fetched: 4 row(s)
    

    类型转换

    从窄类型到更宽类型的原始类型转换称为隐式转换。但是,不允许进行反向转换。所有整数数字类型FLOAT和STRING都可以隐式转换DOUBLE,TINYINT,SMALLINT和INT都可以转换为FLOAT。 BOOLEAN类型无法转换为任何其他类型。更多参考:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types

    显式类型转换使用CAST函数和CAST(值为TYPE)语法。例如,CAST('100' as INT)将100字符串转换为100整数值。如果强制转换失败,例如CAST('INT' as
    INT),则该函数返回NULL。

    此外,BINARY类型只能首先转换为STRING,然后根据需要从STRING转换为其他类型。

    更多数据库操作

    hive> 
        > 
        > CREATE DATABASE myhivebook;
    OK
    Time taken: 0.323 seconds
    hive> CREATE SCHEMA IF NOT EXISTS myhivebook;
    OK
    Time taken: 0.07 seconds
    hive> CREATE DATABASE IF NOT EXISTS myhivebook COMMENT 'hive database demo' LOCATION '/hdfs/directory' WITH DBPROPERTIES ('creator'='dayongd','date'='2018-05-01');
    OK
    Time taken: 0.05 seconds
    hive> SHOW CREATE DATABASE default;
    OK
    CREATE DATABASE `default`
    COMMENT
      'Default Hive database'
    LOCATION
      'hdfs://localhost:54310/user/hive/warehouse'
    Time taken: 0.07 seconds, Fetched: 5 row(s)
    hive> SHOW DATABASES;
    OK
    default
    myhivebook
    Time taken: 0.055 seconds, Fetched: 2 row(s)
    hive> SHOW DATABASES LIKE 'my.*';
    OK
    myhivebook
    Time taken: 0.165 seconds, Fetched: 1 row(s)
    hive> USE myhivebook;
    OK
    Time taken: 0.059 seconds
    hive> SELECT current_database();
    OK
    myhivebook
    Time taken: 0.244 seconds, Fetched: 1 row(s)
    hive> DROP DATABASE IF EXISTS myhivebook;
    OK
    Time taken: 0.492 seconds
    hive> DROP DATABASE IF EXISTS myhivebook CASCADE;
    OK
    Time taken: 0.042 seconds
    hive> ALTER DATABASE default SET DBPROPERTIES ('edited-by'='Dayong');
    
    hive> ALTER DATABASE default SET OWNER user dayongd;
    
    hive> ALTER DATABASE default SET LOCATION '/tmp/data/default';
    

    从Hive v2.2.1开始,ALTER DATABASE ... SET LOCATION语句可用于修改数据库的位置,但它不会将当前数据库目录中的所有现有表/分区移动到新指定的位置。它只会在更改数据库后更改新添加的表的位置。此行为类似于更改表目录不会将现有分区移动到其他位置的方式。
    Hive中的SHOW和DESC(或DESCRIBE)语句用于显示大多数对象的定义,例如表和分区。 SHOW语句支持各种Hiveobject,例如表,表的属性,表DDL,索引,分区,列,函数,锁,角色,配置,事务和压缩。 DESC语句支持少量Hive对象,例如数据库,表,视图,列和分区。
    但是,DESC声明能够提供与EXTENDED或FORMATTED关键字相结合的更详细信息。

    相关文章

      网友评论

        本文标题:HIVE快速入门教程4Hive数据类型和创建,删除数据库

        本文链接:https://www.haomeiwen.com/subject/mocqfctx.html