Hive - Tutorial

Author: 左心Chris | Published 2019-08-14 22:16

    Hive official documentation
    https://cwiki.apache.org/confluence/display/Hive/Home
    Hive Tutorial
    https://cwiki.apache.org/confluence/display/Hive/Tutorial

    1 Concepts

    1.1 What Hive is, setup, and books

    Hive is intended for ad-hoc querying rather than online transaction processing.
    Setup: see the GettingStarted guide.
    Books: the "Books about Hive" page lists some books that may also be helpful for getting started with Hive.

    1.2 Data Units

    Databases
    Tables
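    Partitions: partitioned tables, e.g. db/dt=20190814 and db/dt=20190813
    Buckets: bucketing, e.g. db/dt=20190814/part-...
    Detailed concepts:
    https://www.jianshu.com/p/dd97e0b2d2cf
    In short, an external table stores its data at a user-specified HDFS location, and dropping the table does not delete the HDFS data.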
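To make the partition and bucket layout above concrete, here is a minimal Python sketch. It is not Hive code. The helper names `partition_path` and `bucket_index` are hypothetical, and the bucket rule assumes the classic hash-modulo scheme for an integer column (the hash of an integer taken as the integer itself); other column types and newer bucketing versions hash differently.

```python
def partition_path(table_dir, partition_spec):
    """Build the directory of one partition, e.g. db/dt=20190814."""
    # Each partition column becomes one "key=value" directory level.
    parts = [f"{key}={value}" for key, value in partition_spec.items()]
    return "/".join([table_dir.rstrip("/")] + parts)


def bucket_index(int_value, num_buckets):
    """Pick a bucket for an integer column (assumed hash-modulo scheme)."""
    # Assumption: mask to a non-negative 31-bit value, then take the modulus.
    return (int_value & 0x7FFFFFFF) % num_buckets


print(partition_path("db", {"dt": "20190814"}))
print(bucket_index(42, 4))
```

Rows with the same partition value land in the same directory, and within that directory rows with the same bucket index land in the same file, which is what makes partition pruning and bucket sampling possible.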

    1.3 Type System

    Primitive type
    Complex type
    Timestamp

    1.4 Built In Operators and Functions

    Operators
    Functions
    Language Capabilities

    2 Usage and Examples

    Creating, Showing, Altering, and Dropping Tables

    Creating Tables

    Browsing Tables and Partitions

    Dynamic partition documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-DynamicPartitionInserts

    A dynamic partition insert can be a resource hog because it can generate a large number of partitions in a short time. To guard against this, Hive defines three parameters:

    • hive.exec.max.dynamic.partitions.pernode (default 100) is the maximum number of dynamic partitions that each mapper or reducer can create. If one mapper or reducer creates more than this threshold, it raises a fatal error (through a counter) and the whole job is killed.
    • hive.exec.max.dynamic.partitions (default 1000) is the total number of dynamic partitions that one DML statement can create. If no mapper/reducer exceeds its own limit but the total number of dynamic partitions does, an exception is raised at the end of the job, before the intermediate data are moved to the final destination.
    • hive.exec.max.created.files (default 100000) is the maximum total number of files created by all mappers and reducers. It is implemented by having each mapper/reducer update a Hadoop counter whenever it creates a new file. If the total exceeds hive.exec.max.created.files, a fatal error is thrown and the job is killed.
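The interaction of the three limits can be illustrated with a small Python model. This is only a sketch of the rules described above, not Hive's implementation: `check_limits` and `DynamicPartitionLimitError` are hypothetical names, and the checks run in one place here, whereas Hive enforces them at different points during the job.

```python
class DynamicPartitionLimitError(Exception):
    """Hypothetical error type used only in this sketch."""


# Default values as listed above.
DEFAULT_LIMITS = {
    "hive.exec.max.dynamic.partitions.pernode": 100,
    "hive.exec.max.dynamic.partitions": 1000,
    "hive.exec.max.created.files": 100000,
}


def check_limits(partitions_by_task, files_created, limits=DEFAULT_LIMITS):
    """Return the total number of distinct partitions, or raise on a breach.

    partitions_by_task maps a mapper/reducer id to the set of partitions
    that task created; files_created is the job-wide file count.
    """
    per_node = limits["hive.exec.max.dynamic.partitions.pernode"]
    for task_id, partitions in partitions_by_task.items():
        if len(partitions) > per_node:
            raise DynamicPartitionLimitError(
                f"{task_id} created {len(partitions)} partitions (> {per_node})")

    # Partitions written by several tasks count once toward the total.
    all_partitions = set().union(*partitions_by_task.values())
    total_limit = limits["hive.exec.max.dynamic.partitions"]
    if len(all_partitions) > total_limit:
        raise DynamicPartitionLimitError(
            f"{len(all_partitions)} partitions in total (> {total_limit})")

    file_limit = limits["hive.exec.max.created.files"]
    if files_created > file_limit:
        raise DynamicPartitionLimitError(
            f"{files_created} files created (> {file_limit})")
    return len(all_partitions)


print(check_limits({"m1": {"dt=1", "dt=2"}, "m2": {"dt=2", "dt=3"}}, 10))
```

Note that a job can stay under the per-task limit in every mapper and reducer and still fail on the total, for example 11 tasks that each create 100 different partitions.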

    Altering Tables

    Dropping Tables and Partitions

    Hive syntax: https://gist.github.com/kzhangkzhang/258d18858889fa97194011a249b74c43

    Hive course: https://www.shiyanlou.com/courses/38/learning/?id=772
