
Data Warehouse in Practice 03: Data Warehouse Setup - ODS Layer

Author: 勇于自信 | Published 2020-08-03 16:43

    1. Create the Database

    1) Start Hive
    [atguigu@hadoop102 hive]$ nohup bin/hive --service metastore &
    [atguigu@hadoop102 hive]$ nohup bin/hive --service hiveserver2 &
    [atguigu@hadoop102 hive]$ bin/hive
    2) Show databases
    hive (default)> show databases;
    3) Create the database
    hive (default)> create database gmall;
    4) Switch to the database
    hive (default)> use gmall;

    2. ODS Layer (User Behavior Data)

    2.1 Create the Startup Log Table ods_start_log

    1) Create a partitioned table whose input format is LZO and whose output format is plain text, storing each JSON record as a single string column so it can be parsed downstream

    hive (gmall)> drop table if exists ods_start_log;
     CREATE EXTERNAL TABLE ods_start_log (`line` string) 
    PARTITIONED BY (`dt` string) 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    LOCATION '/warehouse/gmall/ods/ods_start_log';
    

    Note on Hive's LZO support: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
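    If queries against this table fail, the usual cause is a missing hadoop-lzo setup. A minimal sanity check, assuming hadoop-lzo was installed as part of the earlier environment preparation (the jar path below is the same one used for indexing later in this section):

    [atguigu@hadoop102 hive]$ hdfs getconf -confKey io.compression.codecs
    # the returned list should include com.hadoop.compression.lzo.LzopCodec
    [atguigu@hadoop102 hive]$ ls /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar
    # the hadoop-lzo jar must exist on every node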
    2) Load the data
    hive (gmall)> load data inpath '/origin_data/gmall/log/topic_start/2020-03-10' into table gmall.ods_start_log partition(dt='2020-03-10');
    3) Check that the load succeeded
    hive (gmall)> select * from ods_start_log where dt='2020-03-10' limit 2;
    4) Build an index for the LZO-compressed files

    [atguigu@hadoop102 hadoop-2.7.2]$ hadoop jar \
    /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer \
    /warehouse/gmall/ods/ods_start_log/dt=2020-03-10
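    DistributedLzoIndexer writes an .index file next to each .lzo file so that MapReduce can split the compressed data. A quick way to confirm the index was built (the exact data file names depend on how the logs were written to HDFS):

    [atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -ls /warehouse/gmall/ods/ods_start_log/dt=2020-03-10
    # every xxx.lzo data file should now have a matching xxx.lzo.index file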
    
    2.2 Create the Event Log Table ods_event_log

    1) Create a partitioned table whose input format is LZO and whose output format is plain text, storing each JSON record as a single string column so it can be parsed downstream

    hive (gmall)> drop table if exists ods_event_log;
    CREATE EXTERNAL TABLE ods_event_log(`line` string) 
    PARTITIONED BY (`dt` string) 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    LOCATION '/warehouse/gmall/ods/ods_event_log';
    

    2) Load the data

    hive (gmall)> load data inpath '/origin_data/gmall/log/topic_event/2020-03-10' into table gmall.ods_event_log partition(dt='2020-03-10');
    

    Note: keep all dates in yyyy-MM-dd format (e.g. 2020-03-10), which is the date format Hive supports by default
    3) Check that the load succeeded
    hive (gmall)> select * from ods_event_log where dt="2020-03-10" limit 2;
    4) Build an index for the LZO-compressed files

    [atguigu@hadoop102 hadoop-2.7.2]$ hadoop jar \
    /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer \
    /warehouse/gmall/ods/ods_event_log/dt=2020-03-10
    
    2.3 Single Quotes vs. Double Quotes in the Shell

    1) Create a test.sh file in /home/atguigu/bin
    [atguigu@hadoop102 bin]$ vim test.sh
    Add the following content to the file

    #!/bin/bash 
    do_date=$1 
    echo '$do_date' 
    echo "$do_date" 
    echo "'$do_date'" 
    echo '"$do_date"' 
    echo `date`
    

    2) Check the output

    [atguigu@hadoop102 bin]$ test.sh 2020-03-10
    $do_date 
    2020-03-10 
    '2020-03-10' 
    "$do_date" 
    2020 年 05 月 02 日 星期四 21:02:08 CST
    

    3) Summary:
    (1) Single quotes do not expand variables
    (2) Double quotes expand variables
    (3) Backticks (`) execute the command they enclose
    (4) Single quotes nested inside double quotes: the variable is expanded
    (5) Double quotes nested inside single quotes: the variable is not expanded
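    This is exactly the pattern the ODS load scripts below rely on: the SQL text is wrapped in double quotes so that $do_date expands, while the single quotes inside are kept literally for Hive. A minimal sketch:

    #!/bin/bash
    do_date=2020-03-10
    # double quotes outside: $do_date is expanded; single quotes inside: passed through to Hive
    sql="load data inpath '/origin_data/gmall/log/topic_start/$do_date' into table gmall.ods_start_log partition(dt='$do_date');"
    echo "$sql"
    # prints: load data inpath '/origin_data/gmall/log/topic_start/2020-03-10' into table gmall.ods_start_log partition(dt='2020-03-10');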

    2.4 ODS-Layer Data Load Script (user behavior logs)

    1) Create the script in /home/atguigu/bin on hadoop102
    [atguigu@hadoop102 bin]$ vim hdfs_to_ods_log.sh
    Write the following content into the script

    #!/bin/bash
    # Define variables for easy modification
    APP=gmall
    hive=/opt/module/hive/bin/hive
    # If a date is passed as the first argument, use it; otherwise default to the previous day
    if [ -n "$1" ] ;then
        do_date=$1
    else
        do_date=`date -d "-1 day" +%F`
    fi
    echo "=== log date: $do_date ==="
    sql="
    load data inpath '/origin_data/gmall/log/topic_start/$do_date' overwrite into table ${APP}.ods_start_log partition(dt='$do_date');
    load data inpath '/origin_data/gmall/log/topic_event/$do_date' overwrite into table ${APP}.ods_event_log partition(dt='$do_date');
    "
    $hive -e "$sql"
    hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar \
        com.hadoop.compression.lzo.DistributedLzoIndexer \
        /warehouse/gmall/ods/ods_start_log/dt=$do_date
    hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar \
        com.hadoop.compression.lzo.DistributedLzoIndexer \
        /warehouse/gmall/ods/ods_event_log/dt=$do_date
    

    Note 1:
    [ -n STRING ] tests whether the string is non-empty:
    -- non-empty: returns true    -- empty: returns false
    Note 2:
    For usage of the date command, see: [atguigu@hadoop102 ~]$ date --help
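    A quick way to see Note 1 in action, run directly in the shell:

    [atguigu@hadoop102 bin]$ str=""
    [atguigu@hadoop102 bin]$ if [ -n "$str" ]; then echo "non-empty"; else echo "empty"; fi
    empty
    [atguigu@hadoop102 bin]$ str=2020-03-10
    [atguigu@hadoop102 bin]$ if [ -n "$str" ]; then echo "non-empty"; else echo "empty"; fi
    non-empty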
    2) Make the script executable
    [atguigu@hadoop102 bin]$ chmod 777 hdfs_to_ods_log.sh
    3) Run the script
    [atguigu@hadoop102 module]$ hdfs_to_ods_log.sh 2020-03-11
    4) Check the imported data
    hive (gmall)> select * from ods_start_log where dt='2020-03-11' limit 2;
    select * from ods_event_log where dt='2020-03-11' limit 2;
    5) When the script runs
    In production it is typically scheduled between 00:30 and 01:00 every morning
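    In practice a scheduler (cron, Azkaban, etc.) triggers the script; as a purely illustrative example, a crontab entry matching the timing above could look like this (the log path is hypothetical):

    # run at 00:30 every day and append output to a log file
    30 0 * * * /home/atguigu/bin/hdfs_to_ods_log.sh >> /home/atguigu/logs/hdfs_to_ods_log.log 2>&1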

    3. ODS Layer (Business Data)

    3.1 Order Table (incremental and updated)

    hive (gmall)> 
    drop table if exists ods_order_info;
    create external table ods_order_info ( `id` string COMMENT '订单号',
     `final_total_amount` decimal(10,2) COMMENT '订单金额', 
    `order_status` string COMMENT '订单状态', 
    `user_id` string COMMENT '用户 id', 
    `out_trade_no` string COMMENT '支付流水号', 
    `create_time` string COMMENT '创建时间', 
    `operate_time` string COMMENT '操作时间', 
    `province_id` string COMMENT '省份 ID', 
    `benefit_reduce_amount` decimal(10,2) COMMENT '优惠金额', 
    `original_total_amount` decimal(10,2) COMMENT '原价金额', 
    `feight_fee` decimal(10,2) COMMENT '运费' ) 
    COMMENT '订单表' 
    PARTITIONED BY (`dt` string) 
    row format delimited fields terminated by '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_order_info/';
    

    3.2 Order Detail Table (incremental)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_order_detail;
    
    CREATE external TABLE ods_order_detail (
        `id` string COMMENT '订单编号',
        `order_id` string COMMENT '订单号',
        `user_id` string COMMENT '用户 id',
        `sku_id` string COMMENT '商品 id',
        `sku_name` string COMMENT '商品名称',
        `order_price` DECIMAL (10, 2) COMMENT '商品价格',
        `sku_num` BIGINT COMMENT '商品数量',
        `create_time` string COMMENT '创建时间'
    ) COMMENT '订单详情表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_order_detail/';
    

    3.3 SKU Product Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_sku_info;
    
    CREATE external TABLE ods_sku_info (
        `id` string COMMENT 'skuId',
        `spu_id` string COMMENT 'spuid',
        `price` DECIMAL (10, 2) COMMENT '价格',
        `sku_name` string COMMENT '商品名称',
        `sku_desc` string COMMENT '商品描述',
        `weight` string COMMENT '重量',
        `tm_id` string COMMENT '品牌 id',
        `category3_id` string COMMENT '品类 id',
        `create_time` string COMMENT '创建时间'
    ) COMMENT 'SKU 商品表' PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_sku_info/';
    

    3.4 User Table (incremental and updated)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_user_info;
    
    CREATE external TABLE ods_user_info (
        `id` string COMMENT '用户 id',
        `name` string COMMENT '姓名',
        `birthday` string COMMENT '生日',
        `gender` string COMMENT '性别',
        `email` string COMMENT '邮箱',
        `user_level` string COMMENT '用户等级',
        `create_time` string COMMENT '创建时间',
        `operate_time` string COMMENT '操作时间'
    ) COMMENT '用户表' PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_user_info/';
    

    3.5 Product Category Level-1 Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_base_category1;
    
    CREATE external TABLE ods_base_category1 (
        `id` string COMMENT 'id',
        `name` string COMMENT '名称'
    ) COMMENT '商品一级分类表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_base_category1/';
    

    3.6 Product Category Level-2 Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_base_category2;
    
    CREATE external TABLE ods_base_category2 (
        `id` string COMMENT 'id',
        `name` string COMMENT '名称',
        `category1_id` string COMMENT '一级品类 id'
    ) COMMENT '商品二级分类表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_base_category2/';
    

    3.7 Product Category Level-3 Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_base_category3;
    
    CREATE external TABLE ods_base_category3 (
        `id` string COMMENT 'id',
        `name` string COMMENT '名称',
        `category2_id` string COMMENT '二级品类 id'
    ) COMMENT '商品三级分类表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_base_category3/';
    

    3.8 Payment Transaction Table (incremental)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_payment_info;
    
    CREATE external TABLE ods_payment_info (
        `id` BIGINT COMMENT '编号',
        `out_trade_no` string COMMENT '对外业务编号',
        `order_id` string COMMENT '订单编号',
        `user_id` string COMMENT '用户编号',
        `alipay_trade_no` string COMMENT '支付宝交易流水编号',
        `total_amount` DECIMAL (16, 2) COMMENT '支付金额',
        `subject` string COMMENT '交易内容',
        `payment_type` string COMMENT '支付类型',
        `payment_time` string COMMENT '支付时间'
    ) COMMENT '支付流水表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_payment_info/';
    

    3.9 Province Table (special, no dt partition)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_base_province;
    
    CREATE external TABLE ods_base_province (
        `id` BIGINT COMMENT '编号',
        `name` string COMMENT '省份名称',
        `region_id` string COMMENT '地区 ID',
        `area_code` string COMMENT '地区编码',
        `iso_code` string COMMENT 'iso 编码'
    ) COMMENT '省份表' 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_base_province/';
    

    3.10 Region Table (special, no dt partition)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_base_region;
    
    CREATE external TABLE ods_base_region (
        `id` BIGINT COMMENT '编号',
        `region_name` string COMMENT '地区名称'
    ) COMMENT '地区表' 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_base_region/';
    

    3.11 Brand Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_base_trademark;
    
    CREATE external TABLE ods_base_trademark (
        `tm_id` BIGINT COMMENT '编号',
        `tm_name` string COMMENT '品牌名称'
    ) COMMENT '品牌表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_base_trademark/';
    

    3.12 Order Status Table (incremental)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_order_status_log;
    
    CREATE external TABLE ods_order_status_log (
        `id` BIGINT COMMENT '编号',
        `order_id` string COMMENT '订单 ID',
        `order_status` string COMMENT '订单状态',
        `operate_time` string COMMENT '修改时间'
    ) COMMENT '订单状态表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_order_status_log/';
    

    3.13 SPU Product Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_spu_info;
    
    CREATE external TABLE ods_spu_info (
        `id` string COMMENT 'spuid',
        `spu_name` string COMMENT 'spu 名称',
        `category3_id` string COMMENT '品类 id',
        `tm_id` string COMMENT '品牌 id'
    ) COMMENT 'SPU 商品表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_spu_info/';
    

    3.14 Product Review Table (incremental)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_comment_info;
    
    CREATE external TABLE ods_comment_info (
        `id` string COMMENT '编号',
        `user_id` string COMMENT '用户 ID',
        `sku_id` string COMMENT '商品 sku',
        `spu_id` string COMMENT '商品 spu',
        `order_id` string COMMENT '订单 ID',
        `appraise` string COMMENT '评价',
        `create_time` string COMMENT '评价时间'
    ) COMMENT '商品评论表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_comment_info/';
    

    3.15 Order Refund Table (incremental)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_order_refund_info;
    
    CREATE external TABLE ods_order_refund_info (
        `id` string COMMENT '编号',
        `user_id` string COMMENT '用户 ID',
        `order_id` string COMMENT '订单 ID',
        `sku_id` string COMMENT '商品 ID',
        `refund_type` string COMMENT '退款类型',
        `refund_num` BIGINT COMMENT '退款件数',
        `refund_amount` DECIMAL (16, 2) COMMENT '退款金额',
        `refund_reason_type` string COMMENT '退款原因类型',
        `create_time` string COMMENT '退款时间'
    ) COMMENT '退单表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_order_refund_info/';
    

    3.16 Shopping Cart Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_cart_info;
    
    CREATE external TABLE ods_cart_info (
        `id` string COMMENT '编号',
        `user_id` string COMMENT '用户 id',
        `sku_id` string COMMENT 'skuid',
        `cart_price` string COMMENT '放入购物车时价格',
        `sku_num` string COMMENT '数量',
        `sku_name` string COMMENT 'sku 名称 (冗余)',
        `create_time` string COMMENT '创建时间',
        `operate_time` string COMMENT '修改时间',
        `is_ordered` string COMMENT '是否已经下单',
        `order_time` string COMMENT '下单时间'
    ) COMMENT '加购表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_cart_info/';
    

    3.17 Product Favorites Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_favor_info;
    
    CREATE external TABLE ods_favor_info (
        `id` string COMMENT '编号',
        `user_id` string COMMENT '用户 id',
        `sku_id` string COMMENT 'skuid',
        `spu_id` string COMMENT 'spuid',
        `is_cancel` string COMMENT '是否取消',
        `create_time` string COMMENT '收藏时间',
        `cancel_time` string COMMENT '取消时间'
    ) COMMENT '商品收藏表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_favor_info/';
    

    3.18 Coupon Usage Table (new and changed)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_coupon_use;
    
    CREATE external TABLE ods_coupon_use (
        `id` string COMMENT '编号',
        `coupon_id` string COMMENT '优惠券 ID',
        `user_id` string COMMENT '用户 id',
        `order_id` string COMMENT '订单 id',
        `coupon_status` string COMMENT '优惠券状态',
        `get_time` string COMMENT '领取时间',
        `using_time` string COMMENT '使用时间(下单)',
        `used_time` string COMMENT '使用时间(支付)'
    ) COMMENT '优惠券领用表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_coupon_use/';
    

    3.19 Coupon Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_coupon_info;
    
    CREATE external TABLE ods_coupon_info (
        `id` string COMMENT '购物券编号',
        `coupon_name` string COMMENT '购物券名称',
        `coupon_type` string COMMENT '购物券类型 1 现金券 2 折扣券 3 满减券 4 满件打折券',
        `condition_amount` string COMMENT '满额数',
        `condition_num` string COMMENT '满件数',
        `activity_id` string COMMENT '活动编号',
        `benefit_amount` string COMMENT '减金额',
        `benefit_discount` string COMMENT '折扣',
        `create_time` string COMMENT '创建时间',
        `range_type` string COMMENT '范围类型 1、商品 2、品类 3、品牌',
        `spu_id` string COMMENT '商品 id',
        `tm_id` string COMMENT '品牌 id',
        `category3_id` string COMMENT '品类 id',
        `limit_num` string COMMENT '最多领用次数',
        `operate_time` string COMMENT '修改时间',
        `expire_time` string COMMENT '过期时间'
    ) COMMENT '优惠券表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_coupon_info/';
    

    3.20 Activity Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_activity_info;
    
    CREATE external TABLE ods_activity_info (
        `id` string COMMENT '编号',
        `activity_name` string COMMENT '活动名称',
        `activity_type` string COMMENT '活动类型',
        `start_time` string COMMENT '开始时间',
        `end_time` string COMMENT '结束时间',
        `create_time` string COMMENT '创建时间'
    ) COMMENT '活动表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_activity_info/';
    

    3.21 Activity-Order Link Table (incremental)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_activity_order;
    
    CREATE external TABLE ods_activity_order (
        `id` string COMMENT '编号',
        `activity_id` string COMMENT '活动 ID',
        `order_id` string COMMENT '订单 ID',
        `create_time` string COMMENT '领取时间'
    ) COMMENT '活动订单关联表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_activity_order/';
    

    3.22 Activity Rule Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_activity_rule;
    
    CREATE external TABLE ods_activity_rule (
        `id` string COMMENT '编号',
        `activity_id` string COMMENT '活动 ID',
        `condition_amount` string COMMENT '满减金额',
        `condition_num` string COMMENT '满减件数',
        `benefit_amount` string COMMENT '优惠金额',
        `benefit_discount` string COMMENT '优惠折扣',
        `benefit_level` string COMMENT '优惠级别'
    ) COMMENT '优惠规则表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_activity_rule/';
    

    3.23 Code Dictionary Table (full snapshot)

    hive (gmall) > DROP TABLE
    IF EXISTS ods_base_dic;
    
    CREATE external TABLE ods_base_dic (
        `dic_code` string COMMENT '编号',
        `dic_name` string COMMENT '编码名称',
        `parent_code` string COMMENT '父编码',
        `create_time` string COMMENT '创建日期',
        `operate_time` string COMMENT '操作日期'
    ) COMMENT '编码字典表' 
    PARTITIONED BY (`dt` string) 
    ROW format delimited FIELDS TERMINATED BY '\t' 
    STORED AS 
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
    location '/warehouse/gmall/ods/ods_base_dic/';
    

    3.24 ODS-Layer Data Load Script (business data)
    1) Create the script hdfs_to_ods_db.sh in /home/atguigu/bin
    [atguigu@hadoop102 bin]$ vim hdfs_to_ods_db.sh
    Write the following content into the script

    #!/bin/bash 
    APP=gmall 
    hive=/opt/module/hive/bin/hive 
    # If a date is passed as the second argument, use it; otherwise default to the previous day
    if [ -n "$2" ] ;then
        do_date=$2
    else
        do_date=`date -d "-1 day" +%F`
    fi
    sql1="
    load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date');
    load data inpath '/origin_data/$APP/db/activity_order/$do_date' OVERWRITE into table ${APP}.ods_activity_order partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date'); 
    load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date'); "
    sql2=" load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province; 
    load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region; 
    "case $1 in 
    "first")
    { $hive -e "$sql1" 
    $hive -e "$sql2" 
    };; 
    "all"){ 
        $hive -e "$sql1" 
    };; 
    esac
    

    2) Make the script executable

    [atguigu@hadoop102 bin]$ chmod 777 hdfs_to_ods_db.sh
    

    3) Initial import ("first" also loads the one-off province and region tables via sql2)

    [atguigu@hadoop102 bin]$ hdfs_to_ods_db.sh first 2020-03-10
    

    4) Daily import ("all" loads only the daily partitioned tables)

    [atguigu@hadoop102 bin]$ hdfs_to_ods_db.sh all 2020-03-11
    

    5) Check that the data was imported

    hive (gmall)> select * from ods_order_detail where dt='2020-03-11';
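
    Besides spot-checking rows, counting rows per partition is a quick sanity check (the expected counts depend on the mock data that was generated):

    hive (gmall)> select count(*) from ods_order_info where dt='2020-03-11';
    hive (gmall)> select count(*) from ods_order_detail where dt='2020-03-11';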
    
