1. The concept of a data warehouse
1.1 ETL & ELT
- ETL: 1. extract 2. transform 3. load
- ELT: 1. extract 2. load 3. transform
Which method to use depends on the context.
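A minimal sketch of the ELT idea in Hive-style SQL (the table and column names are made up for illustration): the raw extract is loaded as-is first, and the transformation happens afterwards inside the warehouse.
-- hypothetical staging table that receives the raw extract (extract + load)
create table stg_orders (id int, amount string)
row format delimited fields terminated by ',';
-- the transform step runs later, inside the warehouse (the T of ELT)
create table dw_orders as
select id, cast(amount as double) as amount
from stg_orders
where amount is not null;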
1.2 MT data warehouse
2. Apache Hive
2.1 What is Hive
- Hive is an open-source data warehouse built on top of Hadoop.
- It can store structured and semi-structured data files and map them into tables.
- Hive translates HQL into MapReduce jobs and submits them to Hadoop.
- Hive stores data in HDFS and analyses it with MapReduce.
The architecture of Hive
2.2 Tables in Hive
2.2.1 The storage path
- If a table is in the default database, its storage path in HDFS is /user/hive/warehouse/t_order
- If a table is in a custom database, the path in HDFS is /user/hive/warehouse/itcast.db/address
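One way to confirm a table's actual storage path (not in the original notes, but standard Hive) is the LOCATION shown by show create table:
-- the output contains a LOCATION clause, e.g.
--   hdfs://.../user/hive/warehouse/t_order           (default database)
--   hdfs://.../user/hive/warehouse/itcast.db/address (custom database)
show create table t_order;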
2.2.2 What are partitions
- Partitions are a means of optimizing table queries in Hive.
2.2.3 What are buckets
- Buckets are a means of optimizing joins in Hive.
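The notes do not show bucketing DDL; as a hedged sketch with illustrative names, a bucketed table is declared with clustered by ... into N buckets:
-- rows are hashed on id into 4 bucket files; joining two tables bucketed
-- the same way on the join key enables bucket map joins
create table t_user_bucketed (id int, name string)
clustered by (id) into 4 buckets
row format delimited fields terminated by ',';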
2.2.4 Hive metadata
- The metadata records databases, tables, table locations, properties, column types, and field order.
- The metadata must be stored in a relational database such as Derby.
2.2.5 Hive metastore
- The metastore service can be deployed in three modes: embedded, local, and remote.
2.3 Hive installation
pass
3. Hive SQL
3.1 DDL: create table (IMPORTANT)
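The full CREATE TABLE syntax was only in the screenshot; a simplified sketch of the skeleton (optional clauses in brackets), pieces of which all the examples below use:
create [external] table [if not exists] table_name (
col_name data_type [comment 'col_comment'], ...
)
[partitioned by (col_name data_type, ...)]
[clustered by (col_name, ...) into N buckets]
[row format delimited
fields terminated by '...'
collection items terminated by '...'
map keys terminated by '...']
[location 'hdfs_path'];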
3.2 Data types
- primitive data types: numeric, string, date/time, miscellaneous
- complex data types: arrays, maps, structs, union
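A hedged sketch that puts both families into one DDL (the table and columns are illustrative, not from the notes):
create table t_type_demo (
id int, -- numeric
name string, -- string
birthday date, -- date/time
tags array<string>, -- complex: array
scores map<string,int>, -- complex: map
addr struct<city:string,zip:string> -- complex: struct
);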
3.3 Example: creating a table with primitive types
- upload archer.txt and build the mapping; the default warehouse path is /user/hive/warehouse
hadoop fs -put archer.txt /user/hive/warehouse/ithemia.db/t_archer
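The CREATE TABLE statement for t_archer is only in the screenshot; a sketch of what it likely looks like, assuming archer.txt is tab-separated and has the same columns as t_all_hero later in these notes:
create table t_archer(
id int,
name string,
hp_max int,
mp_max int,
attack_max int,
defense_max int,
attack_range string,
role_main string,
role_assist string
)
row format delimited
fields terminated by '\t'; -- assumed: tab-separated, same layout as t_all_hero below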
3.4 Example: creating a table with complex types
create table t_hot_hero_skin_price(
id int,
name string,
win_rate int,
skin_price map<string,int>
)
row format delimited
fields terminated by ',' -- delimiter between fields
collection items terminated by '-' -- delimiter between collection items
map keys terminated by ':'; -- delimiter between a map key and its value
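As a hedged illustration of what these delimiters imply (the sample values are invented, not from the original data), a matching input line and a lookup into the map column:
-- a matching data line would look like: 1,hero_a,55,skin_x:288-skin_y:888
-- ',' separates fields, '-' separates map entries, ':' separates key and value
select name, skin_price['skin_x'] as price
from t_hot_hero_skin_price;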
3.5 Creating a table with the default delimiter
create table t_team_ace_player(
id int,
team_name string,
ace_player_name string
); -- no row format clause is given, so the default field delimiter \001 is used
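For comparison (an added note, not from the original), the default could also be spelled out explicitly; this hypothetical DDL is equivalent to relying on the default:
create table t_team_ace_player_explicit( -- hypothetical table name for illustration
id int,
team_name string,
ace_player_name string
)
row format delimited
fields terminated by '\001'; -- \001 is Hive's default field delimiter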
3.6 Creating a table at a specified path
- 1. hadoop fs -mkdir /data
- 2. create the table
create table t_team_ace_player_location(
id int,
team_name string,
ace_player_name string)
location '/data'; -- the location keyword specifies where this table's data lives on HDFS
- 3. hadoop fs -put t_team_ace_player_location.txt /data
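As a quick check (my own workflow assumption, not shown in the original), querying the table should return the rows from the file just uploaded to /data:
-- verifies that the file under location '/data' is mapped to the table
select * from t_team_ace_player_location;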
3.7 Creating internal and external tables
3.7.1 Internal tables
- Internal (managed) tables are fully managed by Hive: dropping the table deletes both the metadata in Hive and the data in HDFS. Tables are created as internal by default.
create table student(
num int,
name string,
sex string,
age int,
dept string)
row format delimited
fields terminated by ',';
3.7.2 External tables
- Creating an external table is a way to protect the data: dropping an external table removes only the metadata in Hive, while the data files remain in HDFS.
create external table student_ext(
num int,
name string,
sex string,
age int,
dept string)
row format delimited
fields terminated by ','
location '/stu';
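A standard way to tell the two kinds of table apart (not shown in the original notes) is the Table Type field of describe formatted:
-- the output shows "Table Type: EXTERNAL_TABLE" here,
-- versus "Table Type: MANAGED_TABLE" for the internal student table above
describe formatted student_ext;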
3.8 Hive partitioned tables
3.8.1 Before partitioning
- upload the 6 files to Linux
- upload the 6 files to HDFS
hadoop fs -put archer.txt assassin.txt mage.txt support.txt tank.txt warrior.txt /user/hive/warehouse/ithemia.db/t_all_hero
- create a table that maps to the 6 files
create table t_all_hero(
id int,
name string,
hp_max int,
mp_max int,
attack_max int,
defense_max int,
attack_range string,
role_main string,
role_assist string
)
row format delimited
fields terminated by "\t";
- execute Hive SQL
--non-partitioned table, full table scan
select count(*) from t_all_hero where role_main="archer" and hp_max >6000;
- it took 8 seconds
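To confirm the full scan (my own verification suggestion, not part of the original), the query plan can be inspected with explain; without a partition column there is nothing to prune:
-- the plan's TableScan covers the entire t_all_hero data set
explain select count(*) from t_all_hero where role_main="archer" and hp_max >6000;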
3.8.2 After partitioning
- create a partitioned table
-- note the syntax rules for partitioned tables
-- create the partitioned table
create table t_all_hero_part(
id int,
name string,
hp_max int,
mp_max int,
attack_max int,
defense_max int,
attack_range string,
role_main string,
role_assist string
) partitioned by (role string) -- note: the partition column must not duplicate any column defined in the table
row format delimited
fields terminated by "\t";
- load data into the partitions
load data local inpath '/root/hivedata/archer.txt' into table t_all_hero_part partition(role='sheshou');
load data local inpath '/root/hivedata/assassin.txt' into table t_all_hero_part partition(role='cike');
load data local inpath '/root/hivedata/mage.txt' into table t_all_hero_part partition(role='fashi');
load data local inpath '/root/hivedata/support.txt' into table t_all_hero_part partition(role='fuzhu');
load data local inpath '/root/hivedata/tank.txt' into table t_all_hero_part partition(role='tanke');
load data local inpath '/root/hivedata/warrior.txt' into table t_all_hero_part partition(role='zhanshi');
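After these loads, each partition value becomes its own subdirectory under the table directory (the exact paths below are assumed from the warehouse layout used earlier in these notes), and the partitions can be listed from Hive:
-- e.g. /user/hive/warehouse/ithemia.db/t_all_hero_part/role=sheshou/archer.txt
--      /user/hive/warehouse/ithemia.db/t_all_hero_part/role=cike/assassin.txt
show partitions t_all_hero_part;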
- execute Hive SQL
-- partitioned table: only the role='sheshou' partition directory is scanned
select count(*) from t_all_hero_part where role="sheshou" and hp_max >6000;
3.8.3 Multi-level partitioned tables
A multi-level partitioned table is normally created with two partition fields.
-- there is a hierarchical (progressive) relationship between the two partition fields
create table t_user_province_city (id int, name string,age int) partitioned by (province string, city string);
- load data (static partition loading)
-- loading a two-level partitioned table: static partition loading
load data local inpath '/root/hivedata/user.txt' into table t_user_province_city
partition(province='zhejiang',city='hangzhou');
load data local inpath '/root/hivedata/user.txt' into table t_user_province_city
partition(province='zhejiang',city='ningbo');
load data local inpath '/root/hivedata/user.txt' into table t_user_province_city
partition(province='shanghai',city='pudong');
- execute a query
select * from t_user_province_city where province= "zhejiang" and city ="hangzhou";
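With two partition fields the directories nest one level deeper, and filtering on just the leading field still prunes at the province level (a hedged note; the query below is illustrative):
-- layout: .../t_user_province_city/province=zhejiang/city=hangzhou/user.txt
-- filtering on province alone scans only the province=zhejiang subtree
select * from t_user_province_city where province = 'zhejiang';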