美文网首页
Spark Skew 的一种姿势

Spark Skew 的一种姿势

作者: zhangliyun | 来源:发表于2020-10-31 10:21 被阅读0次

    fact_table_optimized = select concat(key,'_', floor(rand(123456)*19)) as salted_key, val from (select * from ${model_tmp_db}.ip_snapshot_filter where dt='')t

    dimension_optimized= select dim_key,val, explode(array(0,1,23,4,6,7,8,9,10,11,12,13,14,15,16,17,18,19)) as salted_key from (select cust_id, from ${model_tmp_db}.ip_increment_filter

    where dt=xxx) t

    fact_table 生成随机后缀(floor(rand(123456)*19)))key_随机后缀

    dimension  table  单独生成随机后缀。扩大N 倍

    将Join key 变成加盐后的Key 这样skew 的key 也不skew 了.

    Problem

    select t.asset_value, t.cnt from (select asset_value,count(asset_value) as cnt from bsl2019prod_tmp.ip_snapshot_filter where dt='2020-10-28' group by asset_value having cnt>1000)t order by t.cnt desc limit 500;

    最大的t.cnt 是最小的30倍· skew 的一塌糊涂

    t.asset_value t.cnt

    xxxxx 31522

    xxxx  1209

    Code

    fact_table_optimized = select concat(t.asset_value,'_', floor(rand(123456)*19)) as salted_key, t.cust_id,t.last_date from (select * from bsl2019prod_tmp.ip_snapshot_filter where dt='2020-10-28')t;

    dimension_optimized= select t.cust_id,t.asset_value as key,t.last_date, explode(array(0,1,23,4,6,7,8,9,10,11,12,13,14,15,16,17,18,19)) as salted_key from (select cust_id, asset_value,last_date  from bsl2019prod_tmp.ip_increment_filter  where dt='2020-10-28') t;

    joinret =select

    a.cust_id as cust_id_origin,

    b.cust_id as cust_id_target,

    split(a.salted_key,'_')[0] as asset_value,

    a.last_date AS last_date_origin,

                    b.last_date AS last_date_target,

                    a.last_date AS link_crtd_ts  from fact_table_optimized a, dimension_optimized b where a.salted_key=concat(b.key,'_',b.salted_key);

    insert joinret into table xxx.joinret location '/tmp/joinret';

    Let's see whether skew or not , 一点也不skew , :) x  N

    y

    相关文章

      网友评论

          本文标题:Spark Skew 的一种姿势

          本文链接:https://www.haomeiwen.com/subject/ydhsvktx.html