HIVE(1)

Author: 山猪打不过家猪 | Published 2023-02-17 05:20

    1. The concept of a DATA WAREHOUSE

    1.1 ETL & ELT
    • ETL: 1. extract 2. transform 3. load
    • ELT: 1. extract 2. load 3. transform
      Which order to use depends on the context: ETL cleans and reshapes data before it lands in the warehouse, while ELT loads the raw data first and transforms it inside the warehouse.
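    The ELT order is the one Hive itself encourages; a minimal sketch in HiveQL (table and column names here are illustrative, not from the original post):

```sql
-- ELT in Hive: Extract + Load the raw file as-is, then Transform with SQL.

-- 1. Extract + Load: map the raw CSV onto a table without changing it
create table raw_orders (id int, amount string, ts string)
row format delimited fields terminated by ',';
load data inpath '/staging/orders.csv' into table raw_orders;

-- 2. Transform: clean types and restructure inside the warehouse
create table orders as
select id,
       cast(amount as double) as amount,
       to_date(ts) as order_date
from raw_orders;
```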
    1.2 MT data warehouse

    2. Apache Hive

    2.1 What is Hive
    1. Hive is an open-source data warehouse built on top of Hadoop.
    2. It can store structured and semi-structured data files and map them onto tables.
    3. Hive translates HQL (Hive Query Language) into MapReduce jobs, then submits them to Hadoop.
    4. Hive stores data in HDFS and analyzes it with MapReduce.
    • the structure of Hive


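    Point 3 above can be observed directly: Hive's EXPLAIN command prints the execution plan a query compiles to (exact output varies by Hive version):

```sql
-- show how Hive plans to execute the query instead of running it
explain select count(*) from t_order;
-- the plan typically shows a Map stage (TableScan) feeding a Reduce stage (the aggregation)
```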
    2.2 Tables in Hive
    2.2.1 The storage path
    ① If a table is in the default database, its storage path in HDFS is /user/hive/warehouse/t_order; ② if the table is in a custom database, the path is /user/hive/warehouse/itcast.db/address.
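    A table's actual storage path can be checked from within Hive:

```sql
-- the "Location:" row of the output shows the HDFS path,
-- e.g. hdfs://.../user/hive/warehouse/t_order
describe formatted t_order;
```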
    2.2.2 What are partitions
    • Partitions are a means of optimizing tables in Hive: the data is split into HDFS subdirectories by a partition field, so a query that filters on that field scans only the matching subdirectories instead of the whole table.
    2.2.3 What are buckets
    • Buckets are a means of optimizing joins in Hive: rows are hashed on a bucket column into a fixed number of files, so two tables bucketed on the join key can be joined bucket by bucket.
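    The post does not show bucket DDL; a minimal sketch, assuming a hypothetical t_user_bucketed table bucketed on its join key:

```sql
-- hash rows into 4 bucket files by id; tables bucketed the same way on the
-- join key can be joined bucket-by-bucket instead of shuffling everything
create table t_user_bucketed (
    id int,
    name string
)
clustered by (id) into 4 buckets;
```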
    2.2.4 Hive metadata
    1. The metadata records databases, tables, table locations, properties, column types, and field order.
    2. The metadata must be stored in a relational database such as Derby (the built-in default) or MySQL.
    2.2.5 Hive metastore
    • The metastore service runs in one of three modes: embedded, local, or remote.


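    As a config fragment: in remote mode, Hive clients are pointed at a separately started metastore service via hive-site.xml (the host and port below are placeholders):

```xml
<configuration>
  <!-- remote metastore: clients connect to a standalone metastore service -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```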
    2.3 Hive install

    pass

    2. Hive SQL

    2.1 DDL: creating tables (IMPORTANT)
    2.2 Data types
    • primitive data types: numeric, string, date/time, miscellaneous
    • complex data types: arrays, maps, structs, unions
    2.3 Example: creating a table with primitive types
    • upload archer.txt to build the mapping; the default warehouse path is /user/hive/warehouse
    hadoop fs -put archer.txt /user/hive/warehouse/ithemia.db/t_archer
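    The DDL itself was only in the screenshot; a plausible reconstruction, assuming archer.txt carries the same tab-separated columns as the t_all_hero example later in this post:

```sql
create table t_archer(
    id int,
    name string,
    hp_max int,
    mp_max int,
    attack_max int,
    defense_max int,
    attack_range string,
    role_main string,
    role_assist string
)
row format delimited
fields terminated by "\t"; -- tab-separated, by analogy with the t_all_hero section
```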
    
    2.4 Example: creating a table with complex types
    create table t_hot_hero_skin_price(
          id int,
          name string,
          win_rate int,
          skin_price map<string,int>
    )
    row format delimited
    fields terminated by ',' --delimiter between fields
    collection items terminated by '-' --delimiter between collection items
    map keys terminated by ':'; --delimiter between each key and value in the map
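    Once loaded, individual map entries can be read by key (the skin name 'AA' below is a placeholder, not from the original data):

```sql
select name,
       skin_price,                    -- the full map
       map_keys(skin_price),          -- all skin names
       skin_price['AA'] as one_price  -- one entry by key; NULL if the key is absent
from t_hot_hero_skin_price;
```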
    
    
    2.5 Create a table with the default delimiter
    create table t_team_ace_player(
       id int,
       team_name string,
       ace_player_name string
    ); --no row format clause is specified, so the default field delimiter \001 is used
    
    2.6 Create a table at a specified path
      1. hadoop fs -mkdir /data
      2. create the table
    create table t_team_ace_player_location(
     id int,
     team_name string,
     ace_player_name string)
     location '/data'; --the location keyword sets this table's storage path in HDFS
      3. hadoop fs -put t_team_ace_player_location.txt /data
    
    
    2.7 Create an internal or external table
    2.7.1 Internal tables
    • Internal (managed) tables are managed by Hive for their whole lifetime, so dropping the table deletes both its metadata in Hive and its data files in HDFS. Tables are internal by default.
    create table student(
         num int,
         name string,
         sex string,
         age int,
         dept string)
    row format delimited
    fields terminated by ',';
    
    2.7.2 External tables
    • Creating an external table is a way to protect the data: dropping the table removes only the metadata in Hive, while the data files remain in HDFS.
    create external table student_ext(
       num int,
       name string,
       sex string,
       age int,
       dept string)
    row format delimited
    fields terminated by ','
    location '/stu';
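    The difference shows up on drop; a sketch of what to expect with the two tables above:

```sql
-- internal table: removes the metadata AND the files under /user/hive/warehouse/...
drop table student;

-- external table: removes only the metadata; the files under /stu survive,
-- so recreating the table with the same location brings the data straight back
drop table student_ext;
```

After the second drop, `hadoop fs -ls /stu` should still list the data files.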
    
    2.8 Hive partitioned tables
    2.8.1 Before optimization
      1. upload the 6 files to Linux
      2. upload the 6 files to HDFS
    hadoop fs -put archer.txt assassin.txt mage.txt support.txt tank.txt warrior.txt /user/hive/warehouse/ithemia.db/t_all_hero
    
      3. create a single table mapping all 6 files
    create table t_all_hero(
       id int,
       name string,
       hp_max int,
       mp_max int,
       attack_max int,
       defense_max int,
       attack_range string,
       role_main string,
       role_assist string
    )
    row format delimited
    fields terminated by "\t";
    
    • execute Hive SQL
    --non-partitioned table: full table scan
    select count(*) from t_all_hero where role_main="archer" and hp_max > 6000;
    
    
    • it took 8 seconds
    2.8.2 After optimization
    • create a partitioned table
    --note the syntax rules for creating a partitioned table
    --create the partitioned table
    create table t_all_hero_part(
       id int,
       name string,
       hp_max int,
       mp_max int,
       attack_max int,
       defense_max int,
       attack_range string,
       role_main string,
       role_assist string
    ) partitioned by (role string) --note: the partition field must not duplicate any existing column
    row format delimited
    fields terminated by "\t";
    
    • load partitions
    load data local inpath '/root/hivedata/archer.txt' into table t_all_hero_part partition(role='sheshou');
    load data local inpath '/root/hivedata/assassin.txt' into table t_all_hero_part partition(role='cike');
    load data local inpath '/root/hivedata/mage.txt' into table t_all_hero_part partition(role='fashi');
    load data local inpath '/root/hivedata/support.txt' into table t_all_hero_part partition(role='fuzhu');
    load data local inpath '/root/hivedata/tank.txt' into table t_all_hero_part partition(role='tanke');
    load data local inpath '/root/hivedata/warrior.txt' into table t_all_hero_part partition(role='zhanshi');
    
    • execute Hive SQL
    --partitioned table: only the role="sheshou" partition is scanned
    select count(*) from t_all_hero_part where role="sheshou" and hp_max > 6000;
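    The speedup comes from partition pruning: each role value is a separate HDFS subdirectory, and the filter on role means only one of them is read. The partitions created by the loads above can be listed with:

```sql
show partitions t_all_hero_part;
-- expected: one partition per role loaded above
-- (role=cike, role=fashi, role=fuzhu, role=sheshou, role=tanke, role=zhanshi)
```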
    
    2.8.3 Multi-level partitioned tables

    We normally use two partition fields to create a multi-level partitioned table.

    --there is a progressive (hierarchical) relationship between the two partition fields
    create table t_user_province_city (id int, name string, age int) partitioned by (province string, city string);
    
    • load
    --static partition loading for a double-partitioned table
    load data local inpath '/root/hivedata/user.txt' into table t_user_province_city
        partition(province='zhejiang',city='hangzhou');
    load data local inpath '/root/hivedata/user.txt' into table t_user_province_city
        partition(province='zhejiang',city='ningbo');
    load data local inpath '/root/hivedata/user.txt' into table t_user_province_city
        partition(province='shanghai',city='pudong');
    
    • execute a query
    select * from t_user_province_city where province = "zhejiang" and city = "hangzhou";
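    On disk, the two partition fields nest as directories (a sketch; `<db>` is a placeholder, and the path follows the storage-path rule from section 2.2.1):

```sql
-- /user/hive/warehouse/<db>.db/t_user_province_city/province=zhejiang/city=hangzhou/user.txt
-- /user/hive/warehouse/<db>.db/t_user_province_city/province=zhejiang/city=ningbo/user.txt
-- /user/hive/warehouse/<db>.db/t_user_province_city/province=shanghai/city=pudong/user.txt
show partitions t_user_province_city;
```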
    


Source: https://www.haomeiwen.com/subject/obsmkdtx.html