Building a Kylin Multidimensional Analysis Cube from Start to Finish

Author: 万州客 | Published 2022-05-12 07:51

    This is my third time through this exercise, so here is a fairly complete record.

    1. Start the Kylin test Docker container and make sure the hive command works

     docker run -d --name kylin -p 7070:7070 -p 8088:8088 -p 50070:50070 -p 8032:8032 -p 8042:8042 -p 16010:16010 apachekylin/apache-kylin-standalone:3.1.0
    

    This Kylin standalone image ships with Hive, which is convenient. (Of the published ports, 7070 is the Kylin web UI, 8088 is YARN, 50070 is the HDFS NameNode UI, and 16010 is HBase.)

    docker exec -it kylin bash
    [root@ab0f5be3a766 admin]# hive
    ls: cannot access /home/admin/spark-2.3.1-bin-hadoop2.6/lib/spark-assembly-*.jar: No such file or directory
    
    Logging initialized using configuration in jar:file:/home/admin/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
    hive> 
    

    2. Create the Hive tables and load the test data

    The commands and data are as follows.

    -- create the database and tables
    create database kylin_flat_db;
    
    
    create table kylin_flat_db.web_access_fact_tbl
         (
             day           date,
             cookieid      string,
             regionid      string,
             cityid        string,
             siteid        string,
             os            string,
             pv            bigint
         ) row format delimited
         fields terminated by '|' stored as textfile;
         
    create table kylin_flat_db.region_tbl
         (
         regionid       string,
         regionname     string
         ) row format delimited
         fields terminated by '|' stored as textfile;
         
    create table kylin_flat_db.city_tbl
         (
         regionid     string,
         cityid       string,
         cityname     string
         ) row format delimited
         fields terminated by '|' stored as textfile;
    

    Load the data:

    use kylin_flat_db;
    
    load data local inpath '/opt/fact_data.txt' into table web_access_fact_tbl;
    
    load data local inpath '/opt/city.txt' into table city_tbl;
    
    load data local inpath '/opt/region.txt' into table region_tbl;
    

    fact_data.txt contains:

    2016-07-19|GBSYO1IMQ7GHQXOVTP|G03|G0302|810|Mac OS|2
    2016-07-03|NTJ95UHFUD3UECNS0U|G05|G0502|3171|Mac OS|4
    2016-07-20|ZR27L7C79CCJGTN1F7|G04|G0402|8793|Mac OS|2
    2016-07-01|C17QEB0560LUZHD26P|G04|G0402|9793|Android 5.0|5
    2016-07-01|N9LRCVTU6PGSUDJ9RB|G03|G0301|1292|Mac OS|1
    

    city.txt

    G01|G0101|朝阳
    G01|G0102|海淀
    G02|G0201|南京
    G02|G0202|宿迁
    G03|G0301|杭州
    G03|G0302|嘉兴
    G04|G0401|徐汇
    G04|G0402|虹口
    G05|G0501|广州
    G05|G0502|海珠
    

    region.txt

    G01|北京
    G02|江苏
    G03|浙江
    G04|上海
    G05|广州
    

    Before running the load data commands, remember to docker cp the three data files into the container at the expected paths:

    [root@127 hive]# docker cp fact_data.txt kylin:/opt/
    [root@127 hive]# docker cp city.txt kylin:/opt/
    [root@127 hive]# docker cp region.txt kylin:/opt/
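
With the files in place, it's worth a quick sanity check that every fact row's join keys actually exist in the two lookup files before building anything on top of them. A small Python sketch (the sample rows from above are inlined so it runs standalone, without Hive):

```python
# Inline copies of the three data files shown above.
fact_data = """\
2016-07-19|GBSYO1IMQ7GHQXOVTP|G03|G0302|810|Mac OS|2
2016-07-03|NTJ95UHFUD3UECNS0U|G05|G0502|3171|Mac OS|4
2016-07-20|ZR27L7C79CCJGTN1F7|G04|G0402|8793|Mac OS|2
2016-07-01|C17QEB0560LUZHD26P|G04|G0402|9793|Android 5.0|5
2016-07-01|N9LRCVTU6PGSUDJ9RB|G03|G0301|1292|Mac OS|1"""

city = """\
G01|G0101|朝阳
G01|G0102|海淀
G02|G0201|南京
G02|G0202|宿迁
G03|G0301|杭州
G03|G0302|嘉兴
G04|G0401|徐汇
G04|G0402|虹口
G05|G0501|广州
G05|G0502|海珠"""

region = """\
G01|北京
G02|江苏
G03|浙江
G04|上海
G05|广州"""

# city.txt: regionid|cityid|cityname; region.txt: regionid|regionname
city_ids = {line.split("|")[1] for line in city.splitlines()}
region_ids = {line.split("|")[0] for line in region.splitlines()}

missing = []
for line in fact_data.splitlines():
    day, cookieid, regionid, cityid, siteid, os_, pv = line.split("|")
    if regionid not in region_ids:
        missing.append(("region", regionid))
    if cityid not in city_ids:
        missing.append(("city", cityid))

print("missing keys:", missing)  # → missing keys: []
```

Every regionid and cityid in the fact rows resolves against the lookup files, so the star-schema joins below will not drop rows.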
    

    Run a test query (note: double-quoted "DAY" is a string literal in Hive, not the day column; use backticks to reference the column itself):

    select "DAY",regionname,cityname,sum(pv),count(distinct cookieid) 
          from WEB_ACCESS_FACT_TBL a
          left join CITY_TBL b
          on a.cityid = b.cityid
          left join REGION_TBL c
          on c.regionid = a.regionid
          group by "DAY", regionname, cityname;

    Output:
    No Stats for kylin_flat_db@web_access_fact_tbl, Columns: regionid, pv, cookieid, cityid
    No Stats for kylin_flat_db@city_tbl, Columns: cityname, cityid
    No Stats for kylin_flat_db@region_tbl, Columns: regionid, regionname
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20220511000253_47dcbc5e-c3d9-4ea2-b1a2-85f0ff42ba20
    Total jobs = 1
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Execution log at: /tmp/root/root_20220511000253_47dcbc5e-c3d9-4ea2-b1a2-85f0ff42ba20.log
    2022-05-11 00:03:00     Starting to launch local task to process map join;      maximum memory = 477626368
    2022-05-11 00:03:01     Dump the side-table for tag: 1 with group count: 6 into file: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
    2022-05-11 00:03:01     Uploaded 1 File to: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (424 bytes)
    2022-05-11 00:03:01     Dump the side-table for tag: 1 with group count: 11 into file: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
    2022-05-11 00:03:01     Uploaded 1 File to: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (591 bytes)
    2022-05-11 00:03:01     End of local task; Time Taken: 1.031 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Job running in-process (local Hadoop)
    2022-05-11 00:03:04,496 Stage-3 map = 100%,  reduce = 100%
    Ended Job = job_local1516279064_0001
    MapReduce Jobs Launched: 
    Stage-Stage-3:  HDFS Read: 550 HDFS Write: 1004 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec
    OK
    DAY     NULL    NULL    NULL    0
    DAY     浙江    嘉兴    2       1
    DAY     浙江    杭州    1       1
    DAY     广州    海珠    4       1
    DAY     上海    虹口    7       2
    Time taken: 11.095 seconds, Fetched: 5 row(s)
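
For reference, the aggregation can be replayed in plain Python over the five sample rows (a simplified sketch of the left joins and group-by, not full Hive semantics). Because "DAY" is a string literal, every group shares the literal text DAY and the query effectively groups by region and city alone; the all-NULL row in the Hive output most likely comes from a stray blank or malformed line in the loaded file:

```python
from collections import defaultdict

# Sample rows from fact_data.txt: (day, cookieid, regionid, cityid, siteid, os, pv)
fact = [
    ("2016-07-19", "GBSYO1IMQ7GHQXOVTP", "G03", "G0302", "810", "Mac OS", 2),
    ("2016-07-03", "NTJ95UHFUD3UECNS0U", "G05", "G0502", "3171", "Mac OS", 4),
    ("2016-07-20", "ZR27L7C79CCJGTN1F7", "G04", "G0402", "8793", "Mac OS", 2),
    ("2016-07-01", "C17QEB0560LUZHD26P", "G04", "G0402", "9793", "Android 5.0", 5),
    ("2016-07-01", "N9LRCVTU6PGSUDJ9RB", "G03", "G0301", "1292", "Mac OS", 1),
]
# Only the lookup entries the fact rows actually reference.
city = {"G0301": "杭州", "G0302": "嘉兴", "G0402": "虹口", "G0502": "海珠"}
region = {"G03": "浙江", "G04": "上海", "G05": "广州"}

# "DAY" is a constant, so the effective grouping key is (regionname, cityname).
pv_sum = defaultdict(int)
cookies = defaultdict(set)
for day, cookieid, regionid, cityid, _siteid, _os, pv in fact:
    key = (region.get(regionid), city.get(cityid))  # None mimics a left-join miss
    pv_sum[key] += pv
    cookies[key].add(cookieid)

for key in sorted(pv_sum, key=str):
    print(key, pv_sum[key], len(cookies[key]))
```

This prints the same four groups Hive returned: 上海/虹口 with pv 7 from 2 cookies, 广州/海珠 4/1, 浙江/嘉兴 2/1, and 浙江/杭州 1/1.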
    

    3. Create a project in the Kylin web UI and load the Hive tables as a data source

    The container from step 1 is still running, so open the Kylin web UI at http://localhost:7070/kylin and log in with the default ADMIN / KYLIN account.

    4. Create the data model

    Name the model.



    Define the fact table, the lookup (dimension) tables, and their joins.


    Define the dimension columns and the measure columns.

    After confirming the date format, save the model (the filter on the pv field is optional).


    Once saved, you can review the model as a quick final check.

    5. Build the cube

    Select the model and name the cube.



    Confirm the cube's dimensions (note there are two dimension types: normal and derived).


    Define the cube's measures.

    The remaining settings can be left at their defaults.



    6. Submit the build

    Once everything looks good, go all in and submit the build job.



    7. Test query speed

    After the cube build completes, run the same aggregation query in Kylin's Insight tab and compare the response time with the ~11-second Hive run above.
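
Besides the Insight tab, Kylin also answers SQL over its REST API, which makes timing comparisons easy to script. Below is a minimal sketch: ADMIN/KYLIN is the standalone image's default account, while the project name `my_kylin_project` is a placeholder for whatever you created in step 3 (the request is only built here, not sent):

```python
import base64
import json
from urllib import request

KYLIN_URL = "http://localhost:7070/kylin/api/query"
SQL = """select regionname, cityname, sum(pv), count(distinct cookieid)
from WEB_ACCESS_FACT_TBL a
left join CITY_TBL b on a.cityid = b.cityid
left join REGION_TBL c on c.regionid = a.regionid
group by regionname, cityname"""

# Body of the POST /kylin/api/query request.
payload = {
    "sql": SQL,
    "project": "my_kylin_project",  # placeholder: use your project from step 3
    "limit": 100,
    "acceptPartial": False,
}

def query_kylin(url=KYLIN_URL, user="ADMIN", password="KYLIN"):
    """POST the query with HTTP basic auth and return the parsed JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": "Basic " + token,
                 "Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Inspect the request body; call query_kylin() against a running container.
print(json.dumps(payload, ensure_ascii=False)[:80], "...")
```

The response JSON includes the result rows plus a duration field, so you can log Kylin's latency and set Hive's 11 seconds against it directly.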

    Original post: https://www.haomeiwen.com/subject/hafdurtx.html