美文网首页
数据倾斜排查

数据倾斜排查

作者: Jorvi | 来源:发表于2020-07-24 16:00 被阅读0次

    1. 现象

    任务运行非常慢,reduce 长时间卡在 99%。

    查看日志:

    2020-07-20 09:36:39,239 Stage-5 map = 100%,  reduce = 97%, Cumulative CPU 103721.19 sec
    2020-07-20 09:36:40,271 Stage-5 map = 100%,  reduce = 98%, Cumulative CPU 104116.03 sec
    2020-07-20 09:36:42,349 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 104573.49 sec
    2020-07-20 09:37:43,022 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 104967.8 sec
    2020-07-20 09:38:43,670 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105027.79 sec
    2020-07-20 09:39:44,256 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105087.05 sec
    2020-07-20 09:40:44,806 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105139.86 sec
    2020-07-20 09:41:45,462 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105195.58 sec
    2020-07-20 09:42:46,209 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105253.92 sec
    2020-07-20 09:43:46,823 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105312.29 sec
    2020-07-20 09:44:47,444 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105371.79 sec
    2020-07-20 09:45:47,936 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105431.59 sec
    2020-07-20 09:46:48,522 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105492.56 sec
    2020-07-20 09:47:49,053 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105547.14 sec
    2020-07-20 09:48:49,467 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105603.68 sec
    2020-07-20 09:49:49,974 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105664.08 sec
    2020-07-20 09:50:50,433 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105724.26 sec
    2020-07-20 09:51:50,963 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105782.01 sec
    2020-07-20 09:52:51,493 Stage-5 map = 100%,  reduce = 99%, Cumulative CPU 105841.64 sec
    
    ......
    
    

    2. 排查

    1. 查看 YARN 上运行状况,发现数据倾斜,大量数据集中在一个 key 上。

    2. 运用临时表拆分任务,分别运行子任务以快速定位出现问题的位置。

    3. 优化代码。

    set mapred.job.name=stat_indicators_detail;
    set mapreduce.input.fileinputformat.split.maxsize=400000000;
    set mapreduce.input.fileinputformat.split.minsize=300000000;
    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    set hive.auto.convert.join=true;
    set hive.mapjoin.smalltable.filesize=1000000;
    set hive.auto.convert.join.noconditionaltask=true;
    set hive.auto.convert.join.noconditionaltask.size=1000000;
    set hive.exec.reducers.bytes.per.reducer=50000000;
    set hive.exec.reducers.max=3000;
    
    
    drop table if exists test.stat_indicators_detail;
    create table test.stat_indicators_detail as 
    select  
        tt.aa
        ,tt.bb
        ,nvl(sal.cc,'-') as cc
    from
        table1 tt
    left join
        table2 sal
    on tt.aa = sal.aa
    

    由于 table1 中 aa 字段有大量数据为 "-",导致数据倾斜。

    优化如下:

    set mapred.job.name=stat_indicators_detail;
    set mapreduce.input.fileinputformat.split.maxsize=400000000;
    set mapreduce.input.fileinputformat.split.minsize=300000000;
    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    set hive.auto.convert.join=true;
    set hive.mapjoin.smalltable.filesize=1000000;
    set hive.auto.convert.join.noconditionaltask=true;
    set hive.auto.convert.join.noconditionaltask.size=1000000;
    set hive.exec.reducers.bytes.per.reducer=50000000;
    set hive.exec.reducers.max=3000;
    
    
    drop table if exists test.stat_indicators_detail;
    create table test.stat_indicators_detail as 
    select  
        tt.aa
        ,tt.bb
        ,nvl(sal.cc,'-') as cc
    from
        table1 tt
    left join
        table2 sal
    on case when tt.aa='-' then concat('-',rand()) else tt.aa end = sal.aa
    

    判断 tt.aa 是否为 "-",如果为 "-" 则加随机数打散。

    打散后重新 join。

    相关文章

      网友评论

          本文标题:数据倾斜排查

          本文链接:https://www.haomeiwen.com/subject/fsuwkktx.html