1. Symptom
The job ran extremely slowly, with the reduce phase stuck at 99% for a long time.
The log showed:
2020-07-20 09:36:39,239 Stage-5 map = 100%, reduce = 97%, Cumulative CPU 103721.19 sec
2020-07-20 09:36:40,271 Stage-5 map = 100%, reduce = 98%, Cumulative CPU 104116.03 sec
2020-07-20 09:36:42,349 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 104573.49 sec
2020-07-20 09:37:43,022 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 104967.8 sec
2020-07-20 09:38:43,670 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105027.79 sec
2020-07-20 09:39:44,256 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105087.05 sec
2020-07-20 09:40:44,806 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105139.86 sec
2020-07-20 09:41:45,462 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105195.58 sec
2020-07-20 09:42:46,209 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105253.92 sec
2020-07-20 09:43:46,823 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105312.29 sec
2020-07-20 09:44:47,444 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105371.79 sec
2020-07-20 09:45:47,936 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105431.59 sec
2020-07-20 09:46:48,522 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105492.56 sec
2020-07-20 09:47:49,053 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105547.14 sec
2020-07-20 09:48:49,467 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105603.68 sec
2020-07-20 09:49:49,974 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105664.08 sec
2020-07-20 09:50:50,433 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105724.26 sec
2020-07-20 09:51:50,963 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105782.01 sec
2020-07-20 09:52:51,493 Stage-5 map = 100%, reduce = 99%, Cumulative CPU 105841.64 sec
......
2. Investigation
- Checked the job on the YARN web UI and found data skew: a large share of the data was concentrated on a single key.
- Split the job into sub-tasks with temporary tables and ran them separately to quickly pinpoint where the problem occurred.
- Optimized the code.
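A quick way to confirm which key is skewed is a group-by count over the join key. A minimal sketch against the tables used below (the table and column names are the ones from this article):

```sql
-- Count rows per join key and list the heaviest keys first;
-- one key with an outsized count confirms the skew.
select aa, count(*) as cnt
from table1
group by aa
order by cnt desc
limit 10;
```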
set mapred.job.name=stat_indicators_detail;                   -- job name shown on YARN
set mapreduce.input.fileinputformat.split.maxsize=400000000;  -- cap input split size (~400 MB)
set mapreduce.input.fileinputformat.split.minsize=300000000;  -- floor input split size (~300 MB)
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;  -- combine small files into one split
set hive.auto.convert.join=true;                              -- allow automatic map-join conversion
set hive.mapjoin.smalltable.filesize=1000000;                 -- small-table threshold for map join (~1 MB)
set hive.auto.convert.join.noconditionaltask=true;            -- convert to map join without a conditional task
set hive.auto.convert.join.noconditionaltask.size=1000000;    -- size limit for that conversion (~1 MB)
set hive.exec.reducers.bytes.per.reducer=50000000;            -- ~50 MB of input per reducer
set hive.exec.reducers.max=3000;                              -- upper bound on reducer count
drop table if exists test.stat_indicators_detail;
create table test.stat_indicators_detail as
select
tt.aa
,tt.bb
,nvl(sal.cc,'-') as cc
from
table1 tt
left join
table2 sal
on tt.aa = sal.aa;
Because the aa column in table1 contains a large number of "-" values, all of those rows hash to the same reducer and the join skews.
The optimized version:
set mapred.job.name=stat_indicators_detail;
set mapreduce.input.fileinputformat.split.maxsize=400000000;
set mapreduce.input.fileinputformat.split.minsize=300000000;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=1000000;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=1000000;
set hive.exec.reducers.bytes.per.reducer=50000000;
set hive.exec.reducers.max=3000;
drop table if exists test.stat_indicators_detail;
create table test.stat_indicators_detail as
select
tt.aa
,tt.bb
,nvl(sal.cc,'-') as cc
from
table1 tt
left join
table2 sal
on case when tt.aa='-' then concat('-',rand()) else tt.aa end = sal.aa;
The on clause checks whether tt.aa is "-"; if so, it appends a random number, scattering the skewed rows across reducers instead of piling them onto one before the join runs.
Correctness is preserved (assuming table2 has no meaningful aa = "-" rows to match): a salted key such as "-0.123" never matches sal.aa, so those rows still leave the left join with cc = '-', just as the original "-" keys would.
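Under the same assumption that "-" keys have no meaningful match in table2, an equivalent rewrite (a sketch, not from the original article) avoids putting those rows through the join at all:

```sql
-- Join only the rows whose key can actually match,
-- then append the "-" rows with the default value directly.
select tt.aa, tt.bb, nvl(sal.cc, '-') as cc
from table1 tt
left join table2 sal
  on tt.aa = sal.aa
where tt.aa <> '-'
union all
select aa, bb, '-' as cc
from table1
where aa = '-';
```

Hive also ships a built-in mitigation, `set hive.optimize.skewjoin=true;` (with `hive.skewjoin.key` as the row-count threshold), which splits heavy keys out into a follow-up map-join job.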