This is my third time going through this setup, so here is a fairly complete write-up.
1. Start the Kylin test Docker container and verify that the Hive CLI works
docker run -d --name kylin -p 7070:7070 -p 8088:8088 -p 50070:50070 -p 8032:8032 -p 8042:8042 -p 16010:16010 apachekylin/apache-kylin-standalone:3.1.0
This Kylin standalone image ships with Hive, which keeps things simple.
docker exec -it kylin bash
[root@ab0f5be3a766 admin]# hive
ls: cannot access /home/admin/spark-2.3.1-bin-hadoop2.6/lib/spark-assembly-*.jar: No such file or directory
Logging initialized using configuration in jar:file:/home/admin/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>
2. Create the Hive tables and load the test data
The commands and test data are as follows.
# Create the database and tables
create database kylin_flat_db;
create table kylin_flat_db.web_access_fact_tbl
(
day date,
cookieid string,
regionid string,
cityid string,
siteid string,
os string,
pv bigint
) row format delimited
fields terminated by '|' stored as textfile;
create table kylin_flat_db.region_tbl
(
regionid string,
regionname string
) row format delimited
fields terminated by '|' stored as textfile;
create table kylin_flat_db.city_tbl
(
regionid string,
cityid string,
cityname string
) row format delimited
fields terminated by '|' stored as textfile;
# Load the data
use kylin_flat_db;
load data local inpath '/opt/fact_data.txt' into table web_access_fact_tbl;
load data local inpath '/opt/city.txt' into table city_tbl;
load data local inpath '/opt/region.txt' into table region_tbl;
Contents of fact_data.txt:
2016-07-19|GBSYO1IMQ7GHQXOVTP|G03|G0302|810|Mac OS|2
2016-07-03|NTJ95UHFUD3UECNS0U|G05|G0502|3171|Mac OS|4
2016-07-20|ZR27L7C79CCJGTN1F7|G04|G0402|8793|Mac OS|2
2016-07-01|C17QEB0560LUZHD26P|G04|G0402|9793|Android 5.0|5
2016-07-01|N9LRCVTU6PGSUDJ9RB|G03|G0301|1292|Mac OS|1
city.txt:
G01|G0101|朝阳
G01|G0102|海淀
G02|G0201|南京
G02|G0202|宿迁
G03|G0301|杭州
G03|G0302|嘉兴
G04|G0401|徐汇
G04|G0402|虹口
G05|G0501|广州
G05|G0502|海珠
region.txt:
G01|北京
G02|江苏
G03|浙江
G04|上海
G05|广州
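All three files are plain '|'-delimited text whose column order must match the DDL above. As a quick sanity check before loading, a fact row can be parsed like this (a minimal Python sketch; the function and column-name list are my own, mirroring the web_access_fact_tbl columns):

```python
from datetime import date

# Column order matches the web_access_fact_tbl DDL above.
FACT_COLUMNS = ["day", "cookieid", "regionid", "cityid", "siteid", "os", "pv"]

def parse_fact_line(line):
    """Split one '|'-delimited fact row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("|")
    row = dict(zip(FACT_COLUMNS, fields))
    # Hive will read these columns as date / bigint; mimic that here.
    row["day"] = date.fromisoformat(row["day"])
    row["pv"] = int(row["pv"])
    return row

sample = "2016-07-19|GBSYO1IMQ7GHQXOVTP|G03|G0302|810|Mac OS|2"
print(parse_fact_line(sample))
```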
Before running the load data commands, remember to docker cp the three data files into the container at the paths used above.
[root@127 hive]# docker cp fact_data.txt kylin:/opt/
[root@127 hive]# docker cp city.txt kylin:/opt/
[root@127 hive]# docker cp region.txt kylin:/opt/
Run a test query. (Note: day is a reserved word in Hive, and double-quoted "DAY" is parsed as a string literal rather than as the column, which is why every result row below shows the literal DAY in the first column; to group by the actual column, quote it with backticks instead.)
select "DAY",regionname,cityname,sum(pv),count(distinct cookieid)
from WEB_ACCESS_FACT_TBL a
left join CITY_TBL b
on a.cityid = b.cityid
left join REGION_TBL c
on c.regionid = a.regionid
group by "DAY", regionname, cityname;
======================
No Stats for kylin_flat_db@web_access_fact_tbl, Columns: regionid, pv, cookieid, cityid
No Stats for kylin_flat_db@city_tbl, Columns: cityname, cityid
No Stats for kylin_flat_db@region_tbl, Columns: regionid, regionname
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220511000253_47dcbc5e-c3d9-4ea2-b1a2-85f0ff42ba20
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: /tmp/root/root_20220511000253_47dcbc5e-c3d9-4ea2-b1a2-85f0ff42ba20.log
2022-05-11 00:03:00 Starting to launch local task to process map join; maximum memory = 477626368
2022-05-11 00:03:01 Dump the side-table for tag: 1 with group count: 6 into file: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2022-05-11 00:03:01 Uploaded 1 File to: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (424 bytes)
2022-05-11 00:03:01 Dump the side-table for tag: 1 with group count: 11 into file: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
2022-05-11 00:03:01 Uploaded 1 File to: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (591 bytes)
2022-05-11 00:03:01 End of local task; Time Taken: 1.031 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2022-05-11 00:03:04,496 Stage-3 map = 100%, reduce = 100%
Ended Job = job_local1516279064_0001
MapReduce Jobs Launched:
Stage-Stage-3: HDFS Read: 550 HDFS Write: 1004 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
DAY NULL NULL NULL 0
DAY 浙江 嘉兴 2 1
DAY 浙江 杭州 1 1
DAY 广州 海珠 4 1
DAY 上海 虹口 7 2
Time taken: 11.095 seconds, Fetched: 5 row(s)
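Setting the quoted "DAY" literal aside, the per-city aggregates in the output above can be reproduced over the five sample fact rows with a small Python sketch (for verification only; the lookup dicts are hand-copied from city.txt and region.txt):

```python
from collections import defaultdict

# Sample rows from fact_data.txt: (cookieid, regionid, cityid, pv)
facts = [
    ("GBSYO1IMQ7GHQXOVTP", "G03", "G0302", 2),
    ("NTJ95UHFUD3UECNS0U", "G05", "G0502", 4),
    ("ZR27L7C79CCJGTN1F7", "G04", "G0402", 2),
    ("C17QEB0560LUZHD26P", "G04", "G0402", 5),
    ("N9LRCVTU6PGSUDJ9RB", "G03", "G0301", 1),
]
city = {"G0301": "杭州", "G0302": "嘉兴", "G0402": "虹口", "G0502": "海珠"}
region = {"G03": "浙江", "G04": "上海", "G05": "广州"}

# group by (regionname, cityname): sum(pv), count(distinct cookieid)
pv_sum = defaultdict(int)
cookies = defaultdict(set)
for cookieid, regionid, cityid, pv in facts:
    key = (region[regionid], city[cityid])
    pv_sum[key] += pv
    cookies[key].add(cookieid)

for key in sorted(pv_sum):
    print(key, pv_sum[key], len(cookies[key]))
```

This matches the non-NULL rows of the Hive output (e.g. 虹口 gets pv 2 + 5 = 7 from two distinct cookies). The all-NULL first row likely comes from a blank line in the loaded file and is not reproduced here.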
3. In the Kylin Web UI, create a project and load the Hive tables created above as the data source.



4. Build the data model
Model name

Define the fact table, the lookup tables, and their join relationships


Define the dimension columns and the measure columns


After confirming the date format, save the model (the filter on the pv column is optional).


Once done, review the summary page as a final check.


5. Build the CUBE
Select the model and name the CUBE

Confirm the dimension settings for the CUBE build (note there are two dimension types: normal and derived)

Define the CUBE's measures


For the remaining settings, keep the defaults and save.



6. Submit the build
If everything looks good, go ahead and submit the build job.
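Besides the Web UI, a build can also be submitted through Kylin's REST API, which is handy for automation. A hedged sketch, assuming Kylin is reachable on localhost:7070 with the default ADMIN/KYLIN credentials, and using a hypothetical cube name web_access_cube (substitute your own):

```python
import base64
import json
import urllib.request

# Assumption: Kylin standalone container mapped to localhost:7070,
# default ADMIN/KYLIN login; "web_access_cube" is a placeholder name.
KYLIN = "http://localhost:7070/kylin/api"

def build_request(cube_name, start_ms, end_ms):
    """Prepare the PUT request that triggers a cube build over [start_ms, end_ms)."""
    url = "{}/cubes/{}/build".format(KYLIN, cube_name)
    payload = {"startTime": start_ms, "endTime": end_ms, "buildType": "BUILD"}
    auth = base64.b64encode(b"ADMIN:KYLIN").decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + auth},
    )

# Time range is given in epoch milliseconds.
req = build_request("web_access_cube", 0, 1469923200000)
print(req.full_url)
# To actually submit the job: urllib.request.urlopen(req)
```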


