美文网首页
kylin-learn_kylin表结构解读

kylin-learn_kylin表结构解读

作者: 李小李的路 | 来源:发表于2020-02-19 15:12 被阅读0次
    • 基于kylin-3.0

    背景

    • 学习kylin,当搭建好kylin平台后,无论是单节点的还是集群模式的,接下来就是需要进行kylin官方提供的样例cube进行学习;
    • 脚本位置:${KYLIN_HOME}/bin/sample.sh,直接执行即可;
    • 执行结束后,默认是在default库下,会生成5张表;暂命名为销售模型,具体如下:
    • 一定要耐着性子、仔细的理解一下这几张表,对后面的model、cube的创建的理解有很大的帮助;
    # 用户账户表
    kylin_account
    # 日期维度
    kylin_cal_dt
    # 商品类别表
    kylin_category_groupings
    # 地理位置国家表
    kylin_country
    # Sales order table, fact table
    kylin_sales
    

    数仓模型

    • 典型的星形模型(start model)。其中,事实表为:kylin_sales(此表为非分区表,这一点需要注意);维度表(或查找表Lookup table):kylin_account , kylin_cal_dtkylin_category_groupingskylin_country
    • 下面具体学习一下各个表的结构;

    事实表 kylin_sales

    • 表结构如下:
    CREATE TABLE `kylin_sales`(
      `trans_id` bigint, 
      `part_dt` date COMMENT 'Order Date', 
      `lstg_format_name` string COMMENT 'Order Transaction Type', 
      `leaf_categ_id` bigint COMMENT 'Category ID', 
      `lstg_site_id` int COMMENT 'Site ID', 
      `slr_segment_cd` smallint, 
      `price` decimal(19,4) COMMENT 'Order Price', 
      `item_count` bigint COMMENT 'Number of Purchased Goods', 
      `seller_id` bigint COMMENT 'Seller ID', 
      `buyer_id` bigint COMMENT 'Buyer ID', 
      `ops_user_id` string COMMENT 'System User ID', 
      `ops_region` string COMMENT 'System User Region')
    COMMENT 'Sales order table, fact table'
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    WITH SERDEPROPERTIES ( 
      'field.delim'=',', 
      'serialization.format'=',') 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      'hdfs://bigdata/user/hive/warehouse/kylin_sales'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1580892437')
    
    • 在cube的设计中,采用字段part_dt作为时间字段;
    • 个人觉得事实表应该设计成以dt为分区字段的分区表;

    维度表 kylin_account

    • 表结构如下:
    CREATE TABLE `kylin_account`(
      `account_id` bigint, 
      `account_buyer_level` int COMMENT 'Account Buyer Level', 
      `account_seller_level` int COMMENT 'Account Seller Level', 
      `account_country` string COMMENT 'Account Country', 
      `account_contact` string COMMENT 'Account Contact Info')
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    WITH SERDEPROPERTIES ( 
      'field.delim'=',', 
      'serialization.format'=',') 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      'hdfs://bigdata/user/hive/warehouse/kylin_account'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1580892437')
    
    • 维度表为非分区表;
    • KYLIN_SALES.BUYER_ID = BUYER_ACCOUNT.ACCOUNT_ID
    • KYLIN_SALES.SELLER_ID = SELLER_ACCOUNT.ACCOUNT_ID

    维度表 kylin_cal_dt

    • 表结构如下:
    CREATE TABLE `kylin_cal_dt`(
      `cal_dt` date COMMENT 'Date, PK', 
      `year_beg_dt` date COMMENT 'YEAR Begin Date', 
      `qtr_beg_dt` date COMMENT 'Quarter Begin Date', 
      `month_beg_dt` date COMMENT 'Month Begin Date', 
      `week_beg_dt` date COMMENT 'Week Begin Date', 
      `age_for_year_id` smallint, 
      `age_for_qtr_id` smallint, 
      `age_for_month_id` smallint, 
      `age_for_week_id` smallint, 
      `age_for_dt_id` smallint, 
      `age_for_rtl_year_id` smallint, 
      `age_for_rtl_qtr_id` smallint, 
      `age_for_rtl_month_id` smallint, 
      `age_for_rtl_week_id` smallint, 
      `age_for_cs_week_id` smallint, 
      `day_of_cal_id` int, 
      `day_of_year_id` smallint, 
      `day_of_qtr_id` smallint, 
      `day_of_month_id` smallint, 
      `day_of_week_id` int, 
      `week_of_year_id` tinyint, 
      `week_of_cal_id` int, 
      `month_of_qtr_id` tinyint, 
      `month_of_year_id` tinyint, 
      `month_of_cal_id` smallint, 
      `qtr_of_year_id` tinyint, 
      `qtr_of_cal_id` smallint, 
      `year_of_cal_id` smallint, 
      `year_end_dt` string, 
      `qtr_end_dt` string, 
      `month_end_dt` string, 
      `week_end_dt` string, 
      `cal_dt_name` string, 
      `cal_dt_desc` string, 
      `cal_dt_short_name` string, 
      `ytd_yn_id` tinyint, 
      `qtd_yn_id` tinyint, 
      `mtd_yn_id` tinyint, 
      `wtd_yn_id` tinyint, 
      `season_beg_dt` string, 
      `day_in_year_count` smallint, 
      `day_in_qtr_count` tinyint, 
      `day_in_month_count` tinyint, 
      `day_in_week_count` tinyint, 
      `rtl_year_beg_dt` string, 
      `rtl_qtr_beg_dt` string, 
      `rtl_month_beg_dt` string, 
      `rtl_week_beg_dt` string, 
      `cs_week_beg_dt` string, 
      `cal_date` string, 
      `day_of_week` string, 
      `month_id` string, 
      `prd_desc` string, 
      `prd_flag` string, 
      `prd_id` string, 
      `prd_ind` string, 
      `qtr_desc` string, 
      `qtr_id` string, 
      `qtr_ind` string, 
      `retail_week` string, 
      `retail_year` string, 
      `retail_start_date` string, 
      `retail_wk_end_date` string, 
      `week_ind` string, 
      `week_num_desc` string, 
      `week_beg_date` string, 
      `week_end_date` string, 
      `week_in_year_id` string, 
      `week_id` string, 
      `week_beg_end_desc_mdy` string, 
      `week_beg_end_desc_md` string, 
      `year_id` string, 
      `year_ind` string, 
      `cal_dt_mns_1year_dt` string, 
      `cal_dt_mns_2year_dt` string, 
      `cal_dt_mns_1qtr_dt` string, 
      `cal_dt_mns_2qtr_dt` string, 
      `cal_dt_mns_1month_dt` string, 
      `cal_dt_mns_2month_dt` string, 
      `cal_dt_mns_1week_dt` string, 
      `cal_dt_mns_2week_dt` string, 
      `curr_cal_dt_mns_1year_yn_id` tinyint, 
      `curr_cal_dt_mns_2year_yn_id` tinyint, 
      `curr_cal_dt_mns_1qtr_yn_id` tinyint, 
      `curr_cal_dt_mns_2qtr_yn_id` tinyint, 
      `curr_cal_dt_mns_1month_yn_id` tinyint, 
      `curr_cal_dt_mns_2month_yn_id` tinyint, 
      `curr_cal_dt_mns_1week_yn_ind` tinyint, 
      `curr_cal_dt_mns_2week_yn_ind` tinyint, 
      `rtl_month_of_rtl_year_id` string, 
      `rtl_qtr_of_rtl_year_id` tinyint, 
      `rtl_week_of_rtl_year_id` tinyint, 
      `season_of_year_id` tinyint, 
      `ytm_yn_id` tinyint, 
      `ytq_yn_id` tinyint, 
      `ytw_yn_id` tinyint, 
      `kylin_cal_dt_cre_date` string, 
      `kylin_cal_dt_cre_user` string, 
      `kylin_cal_dt_upd_date` string, 
      `kylin_cal_dt_upd_user` string)
    COMMENT 'Date Dimension Table'
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    WITH SERDEPROPERTIES ( 
      'field.delim'=',', 
      'serialization.format'=',') 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      'hdfs://bigdata/user/hive/warehouse/kylin_cal_dt'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1580892438')
    
    • 这个日期维度的表结构,可以适配绝大多数的日期维度;
    • KYLIN_SALES.PART_DT = KYLIN_CAL_DT.CAL_DT

    维度表 kylin_category_groupings

    • 表结构
    CREATE TABLE `kylin_category_groupings`(
      `leaf_categ_id` bigint COMMENT 'Category ID, PK', 
      `leaf_categ_name` string, 
      `site_id` int COMMENT 'Site ID, PK', 
      `categ_busn_mgr` string, 
      `categ_busn_unit` string, 
      `regn_categ` string, 
      `user_defined_field1` string COMMENT 'User Defined Field1', 
      `user_defined_field3` string COMMENT 'User Defined Field3', 
      `kylin_groupings_cre_date` string, 
      `kylin_groupings_upd_date` string COMMENT 'Last Updated Date', 
      `kylin_groupings_cre_user` string, 
      `kylin_groupings_upd_user` string COMMENT 'Last Updated User', 
      `meta_categ_id` decimal(10,0), 
      `meta_categ_name` string COMMENT 'Level1 Category', 
      `categ_lvl2_id` decimal(10,0), 
      `categ_lvl3_id` decimal(10,0), 
      `categ_lvl4_id` decimal(10,0), 
      `categ_lvl5_id` decimal(10,0), 
      `categ_lvl6_id` decimal(10,0), 
      `categ_lvl7_id` decimal(10,0), 
      `categ_lvl2_name` string COMMENT 'Level2 Category', 
      `categ_lvl3_name` string COMMENT 'Level3 Category', 
      `categ_lvl4_name` string, 
      `categ_lvl5_name` string, 
      `categ_lvl6_name` string, 
      `categ_lvl7_name` string, 
      `categ_flags` decimal(10,0), 
      `adult_categ_yn` string, 
      `domain_id` decimal(10,0), 
      `user_defined_field5` string, 
      `vcs_id` decimal(10,0), 
      `gcs_id` decimal(10,0), 
      `move_to` decimal(10,0), 
      `sap_category_id` decimal(10,0), 
      `src_id` tinyint, 
      `bsns_vrtcl_name` string)
    COMMENT 'Detail category inforamtion, Dimension Table'
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    WITH SERDEPROPERTIES ( 
      'field.delim'=',', 
      'serialization.format'=',') 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      'hdfs://bigdata/user/hive/warehouse/kylin_category_groupings'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1580892439')
    
    • 表结构 也不短;
    • KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID
    • KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID

    维度表 kylin_country

    • 表结构
    CREATE TABLE `kylin_country`(
      `country` string, 
      `latitude` double, 
      `longitude` double, 
      `name` string)
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    WITH SERDEPROPERTIES ( 
      'field.delim'=',', 
      'serialization.format'=',') 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      'hdfs://bigdata/user/hive/warehouse/kylin_country'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1580892438')
    
    • 源码中设计这个国家级别,有点牵强。
    • 在实际中,我们需要考虑到地区、城市、城市分区等信息;
    • BUYER_ACCOUNT.ACCOUNT_COUNTRY = BUYER_COUNTRY.COUNTRY
    • SELLER_ACCOUNT.ACCOUNT_COUNTRY = SELLER_COUNTRY.COUNTRY

    模型关系图

    • 在官网给定的learn_kylin中,model:kylin_sales_model中表结构关系如下:


      kylin_sales_model

    相关文章

      网友评论

          本文标题:kylin-learn_kylin表结构解读

          本文链接:https://www.haomeiwen.com/subject/litafhtx.html