Building a Kylin Multidimensional Analysis Cube from Start to Finish

Author: 万州客 | Published 2022-05-12 07:51

    This is my third time through this exercise, so here is a fairly complete record.

    1. Start the Kylin test Docker container and make sure the hive command works

     docker run -d --name kylin -p 7070:7070 -p 8088:8088 -p 50070:50070 -p 8032:8032 -p 8042:8042 -p 16010:16010 apachekylin/apache-kylin-standalone:3.1.0
    

    This Kylin standalone image ships with Hive, which is convenient. (Of the published ports, 7070 is the Kylin web UI, 8088 is YARN, 50070 is the HDFS NameNode UI, and 16010 is HBase.)

    docker exec -it kylin bash
    [root@ab0f5be3a766 admin]# hive
    ls: cannot access /home/admin/spark-2.3.1-bin-hadoop2.6/lib/spark-assembly-*.jar: No such file or directory
    
    Logging initialized using configuration in jar:file:/home/admin/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
    hive> 
    

    2. Create the Hive tables and load the test data

    The commands and data are as follows.

    -- create the database and tables
    create database kylin_flat_db;
    
    
    create table kylin_flat_db.web_access_fact_tbl
         (
             day           date,
             cookieid      string,
             regionid      string,
             cityid        string,
             siteid        string,
             os            string,
             pv            bigint
         ) row format delimited
         fields terminated by '|' stored as textfile;
         
    create table kylin_flat_db.region_tbl
         (
         regionid       string,
         regionname     string
         ) row format delimited
         fields terminated by '|' stored as textfile;
         
    create table kylin_flat_db.city_tbl
         (
         regionid     string,
         cityid       string,
         cityname     string
         ) row format delimited
         fields terminated by '|' stored as textfile;
    

    Load the data:

    use kylin_flat_db;
    
    load data local inpath '/opt/fact_data.txt' into table web_access_fact_tbl;
    
    load data local inpath '/opt/city.txt' into table city_tbl;
    
    load data local inpath '/opt/region.txt' into table region_tbl;
    

    fact_data.txt contains:

    2016-07-19|GBSYO1IMQ7GHQXOVTP|G03|G0302|810|Mac OS|2
    2016-07-03|NTJ95UHFUD3UECNS0U|G05|G0502|3171|Mac OS|4
    2016-07-20|ZR27L7C79CCJGTN1F7|G04|G0402|8793|Mac OS|2
    2016-07-01|C17QEB0560LUZHD26P|G04|G0402|9793|Android 5.0|5
    2016-07-01|N9LRCVTU6PGSUDJ9RB|G03|G0301|1292|Mac OS|1
    

    city.txt

    G01|G0101|朝阳
    G01|G0102|海淀
    G02|G0201|南京
    G02|G0202|宿迁
    G03|G0301|杭州
    G03|G0302|嘉兴
    G04|G0401|徐汇
    G04|G0402|虹口
    G05|G0501|广州
    G05|G0502|海珠
    

    region.txt

    G01|北京
    G02|江苏
    G03|浙江
    G04|上海
    G05|广州
    

    Before running the load data commands, remember to docker cp the three data files into the container at the expected paths:

    [root@127 hive]# docker cp fact_data.txt kylin:/opt/
    [root@127 hive]# docker cp city.txt kylin:/opt/
    [root@127 hive]# docker cp region.txt kylin:/opt/
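
With the files in place, it's worth a quick sanity check that every fact row's join keys actually exist in the two lookup files before building anything on top of them. A small Python sketch (the sample rows from above are inlined so it runs standalone, without Hive):

```python
# Inline copies of the three data files shown above.
fact_data = """\
2016-07-19|GBSYO1IMQ7GHQXOVTP|G03|G0302|810|Mac OS|2
2016-07-03|NTJ95UHFUD3UECNS0U|G05|G0502|3171|Mac OS|4
2016-07-20|ZR27L7C79CCJGTN1F7|G04|G0402|8793|Mac OS|2
2016-07-01|C17QEB0560LUZHD26P|G04|G0402|9793|Android 5.0|5
2016-07-01|N9LRCVTU6PGSUDJ9RB|G03|G0301|1292|Mac OS|1"""

city = """\
G01|G0101|朝阳
G01|G0102|海淀
G02|G0201|南京
G02|G0202|宿迁
G03|G0301|杭州
G03|G0302|嘉兴
G04|G0401|徐汇
G04|G0402|虹口
G05|G0501|广州
G05|G0502|海珠"""

region = """\
G01|北京
G02|江苏
G03|浙江
G04|上海
G05|广州"""

# city.txt: regionid|cityid|cityname; region.txt: regionid|regionname
city_ids = {line.split("|")[1] for line in city.splitlines()}
region_ids = {line.split("|")[0] for line in region.splitlines()}

missing = []
for line in fact_data.splitlines():
    day, cookieid, regionid, cityid, siteid, os_, pv = line.split("|")
    if regionid not in region_ids:
        missing.append(("region", regionid))
    if cityid not in city_ids:
        missing.append(("city", cityid))

print("missing keys:", missing)  # → missing keys: []
```

Every regionid and cityid in the fact rows resolves against the lookup files, so the star-schema joins below will not drop rows.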
    

    Run a test query (note: double-quoted "DAY" is a string literal in Hive, not the day column; use backticks to reference the column itself):

    select "DAY",regionname,cityname,sum(pv),count(distinct cookieid) 
          from WEB_ACCESS_FACT_TBL a
          left join CITY_TBL b
          on a.cityid = b.cityid
          left join REGION_TBL c
          on c.regionid = a.regionid
          group by "DAY", regionname, cityname;

    Output:
    No Stats for kylin_flat_db@web_access_fact_tbl, Columns: regionid, pv, cookieid, cityid
    No Stats for kylin_flat_db@city_tbl, Columns: cityname, cityid
    No Stats for kylin_flat_db@region_tbl, Columns: regionid, regionname
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20220511000253_47dcbc5e-c3d9-4ea2-b1a2-85f0ff42ba20
    Total jobs = 1
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Execution log at: /tmp/root/root_20220511000253_47dcbc5e-c3d9-4ea2-b1a2-85f0ff42ba20.log
    2022-05-11 00:03:00     Starting to launch local task to process map join;      maximum memory = 477626368
    2022-05-11 00:03:01     Dump the side-table for tag: 1 with group count: 6 into file: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
    2022-05-11 00:03:01     Uploaded 1 File to: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (424 bytes)
    2022-05-11 00:03:01     Dump the side-table for tag: 1 with group count: 11 into file: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
    2022-05-11 00:03:01     Uploaded 1 File to: file:/tmp/root/c5ea48ae-04d4-49fb-a33a-6e4903c94ce0/hive_2022-05-11_00-02-53_412_3500028136458508988-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (591 bytes)
    2022-05-11 00:03:01     End of local task; Time Taken: 1.031 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Job running in-process (local Hadoop)
    2022-05-11 00:03:04,496 Stage-3 map = 100%,  reduce = 100%
    Ended Job = job_local1516279064_0001
    MapReduce Jobs Launched: 
    Stage-Stage-3:  HDFS Read: 550 HDFS Write: 1004 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec
    OK
    DAY     NULL    NULL    NULL    0
    DAY     浙江    嘉兴    2       1
    DAY     浙江    杭州    1       1
    DAY     广州    海珠    4       1
    DAY     上海    虹口    7       2
    Time taken: 11.095 seconds, Fetched: 5 row(s)
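
For reference, the aggregation can be replayed in plain Python over the five sample rows (a simplified sketch of the left joins and group-by, not full Hive semantics). Because "DAY" is a string literal, every group shares the literal text DAY and the query effectively groups by region and city alone; the all-NULL row in the Hive output most likely comes from a stray blank or malformed line in the loaded file:

```python
from collections import defaultdict

# Sample rows from fact_data.txt: (day, cookieid, regionid, cityid, siteid, os, pv)
fact = [
    ("2016-07-19", "GBSYO1IMQ7GHQXOVTP", "G03", "G0302", "810", "Mac OS", 2),
    ("2016-07-03", "NTJ95UHFUD3UECNS0U", "G05", "G0502", "3171", "Mac OS", 4),
    ("2016-07-20", "ZR27L7C79CCJGTN1F7", "G04", "G0402", "8793", "Mac OS", 2),
    ("2016-07-01", "C17QEB0560LUZHD26P", "G04", "G0402", "9793", "Android 5.0", 5),
    ("2016-07-01", "N9LRCVTU6PGSUDJ9RB", "G03", "G0301", "1292", "Mac OS", 1),
]
# Only the lookup entries the fact rows actually reference.
city = {"G0301": "杭州", "G0302": "嘉兴", "G0402": "虹口", "G0502": "海珠"}
region = {"G03": "浙江", "G04": "上海", "G05": "广州"}

# "DAY" is a constant, so the effective grouping key is (regionname, cityname).
pv_sum = defaultdict(int)
cookies = defaultdict(set)
for day, cookieid, regionid, cityid, _siteid, _os, pv in fact:
    key = (region.get(regionid), city.get(cityid))  # None mimics a left-join miss
    pv_sum[key] += pv
    cookies[key].add(cookieid)

for key in sorted(pv_sum, key=str):
    print(key, pv_sum[key], len(cookies[key]))
```

This prints the same four groups Hive returned: 上海/虹口 with pv 7 from 2 cookies, 广州/海珠 4/1, 浙江/嘉兴 2/1, and 浙江/杭州 1/1.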
    

    3. Create a project in the Kylin web UI and load the Hive tables as a data source

    The container from step 1 is still running, so open the Kylin web UI at http://localhost:7070/kylin and log in with the default ADMIN / KYLIN account.

    4. Create the data model

    Name the model.



    Define the fact table, the lookup (dimension) tables, and their joins.


    Define the dimension columns and the measure columns.

    After confirming the date format, save the model (the filter on the pv field is optional).


    Once saved, you can review the model as a quick final check.

    5. Build the cube

    Select the model and name the cube.



    Confirm the cube's dimensions (note there are two dimension types: normal and derived).


    Define the cube's measures.

    The remaining settings can be left at their defaults.



    6. Submit the build

    Once everything looks good, go all in and submit the build job.



    7. Test query speed

    After the cube build completes, run the same aggregation query in Kylin's Insight tab and compare the response time with the ~11-second Hive run above.
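
Besides the Insight tab, Kylin also answers SQL over its REST API, which makes timing comparisons easy to script. Below is a minimal sketch: ADMIN/KYLIN is the standalone image's default account, while the project name `my_kylin_project` is a placeholder for whatever you created in step 3 (the request is only built here, not sent):

```python
import base64
import json
from urllib import request

KYLIN_URL = "http://localhost:7070/kylin/api/query"
SQL = """select regionname, cityname, sum(pv), count(distinct cookieid)
from WEB_ACCESS_FACT_TBL a
left join CITY_TBL b on a.cityid = b.cityid
left join REGION_TBL c on c.regionid = a.regionid
group by regionname, cityname"""

# Body of the POST /kylin/api/query request.
payload = {
    "sql": SQL,
    "project": "my_kylin_project",  # placeholder: use your project from step 3
    "limit": 100,
    "acceptPartial": False,
}

def query_kylin(url=KYLIN_URL, user="ADMIN", password="KYLIN"):
    """POST the query with HTTP basic auth and return the parsed JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": "Basic " + token,
                 "Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Inspect the request body; call query_kylin() against a running container.
print(json.dumps(payload, ensure_ascii=False)[:80], "...")
```

The response JSON includes the result rows plus a duration field, so you can log Kylin's latency and set Hive's 11 seconds against it directly.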

    Original post: https://www.haomeiwen.com/subject/hafdurtx.html