美文网首页
Hive分组TopN

Hive分组TopN

作者: 喵星人ZC | 来源:发表于2020-08-05 14:47 被阅读0次

    People表明细如下:


    image.png

    需求:按照性别分组,求分组后年龄最大的两个年龄的人员信息

    SELECT * FROM
    (select id,name,age,sex,
    row_number()  OVER(partition by sex order by age desc) r
     FROM safety.dw_hive_people_df) t
     WHERE r<=2
    
    image.png

    学生技能表如下


    image.png

    需求:按照技能分组,求分组后年龄最大的两个学生信息

    先将skills炸开

    SELECT id,name,age,skill,
    row_number() over(partition by skill order by age desc ) r
     FROM safety.dw_hive_stu_skils_df 
    LATERAL VIEW explode(split(skills,'-')) tb_view as skill
    
    image.png

    如果是SparkSQL可以直接写成

    SELECT id,name,age,explode(split(skills,'-')) as skill
     FROM safety.dw_hive_stu_skils_df 
    

    最终统计SQL为:

    SELECT id,name,age,skill FROM 
    (
    SELECT id,name,age,skill,
    row_number() over(partition by skill order by age desc ) r
     FROM safety.dw_hive_stu_skils_df 
    LATERAL VIEW explode(split(skills,'-')) tb_view as skill
    ) t
    where t.r<=2
    
    image.png

    相关文章

      网友评论

          本文标题:Hive分组TopN

          本文链接:https://www.haomeiwen.com/subject/aqpnrktx.html