美文网首页杂论
【Hive】开窗函数over小结

【Hive】开窗函数over小结

作者: 7ccc099f4608 | 来源:发表于2018-06-07 23:12 被阅读6次

    较之于group by,开窗函数over的好处在于:over返回的是group by 之后再join的结果。也就是说,over返回的大小和原表格应该是一致的,且能够获取到除了group by之外的column。

    create table temp.orders_temp(name VARCHAR(20), cate int, s int, d int);
    
    insert into orders_temp values
    ("a", 1, 1, 13),
    ("b", 2, 2, 112),
    ("c", 3, 3, 12),
    ("d", 2, 4, 1),
    ("e", 1, 5, 3),
    ("f", 3, 6, 4),
    ("g", 2, 7, 76),
    ("h", 3, 8, 5),
    ("i", 2, 9, 1);
    
    select 
    name as nm, 
    cate as ct,
    s,
    d,
    max(s) over() as a,
    max(s) over(partition by cate) b,
    max(d) over(partition by name) c,
    sum(s) over(partition by cate) d,
    sum(d) over(partition by name) e,
    sum(s) over(partition by cate order by s) f,
    sum(s) over(partition by cate order by s rows between unbounded preceding and unbounded following) g,
    sum(s) over(partition by cate order by s rows between unbounded preceding and current row) h,
    sum(s) over(partition by cate order by d) i,
    sum(s) over(partition by cate order by d rows between unbounded preceding and unbounded following) j,
    sum(s) over(partition by cate order by d rows between unbounded preceding and current row) k,
    rank() over(partition by cate order by s) l,
    row_number() over() m,
    row_number() over(partition by cate order by s) n,
    row_number() over(partition by cate order by d) o
    from temp.orders_temp;
    
    nm  ct  s   d   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
    a   1   1   13  9   5   13  6   13  1   6   1   6   6   6   1   1   1   2
    e   1   5   3   9   5   3   6   3   6   6   6   5   6   5   2   5   2   1
    b   2   2   112 9   9   112 22  112 2   22  2   22  22  22  1   2   1   4
    d   2   4   1   9   9   1   22  1   6   22  6   13  22  4   2   4   2   1
    g   2   7   76  9   9   76  22  76  13  22  13  20  22  20  3   7   3   3
    i   2   9   1   9   9   1   22  1   22  22  22  13  22  13  4   9   4   2
    c   3   3   12  9   8   12  17  12  3   17  3   17  17  17  1   3   1   3
    f   3   6   4   9   8   4   17  4   9   17  9   6   17  6   2   6   2   1
    h   3   8   5   9   8   5   17  5   17  17  17  14  17  14  3   8   3   2
    

    over总结:

    1. over() 是对全局进行操作;
    2. over(partition by oo) 是将数据集按oo的值不同切分成若干组,分别对每个组整组(从头至尾)进行操作,相当于省略了rows between unbounded preceding and unbounded following
    3. over(partition by oo order by xx)oo切分好的组,按照xx排序后,对当前组当前“值”(而不是“行” (current row),对比ik):当分组里的xx列都是不重复的值,此时,相当于省略了rows between unbounded preceding and current row,如hf;当分组里的xx列出现重复的值,相同的值 是最小的操作单位,而rows between unbounded preceding and current row严格按照当前行进行操作。

    对比overorder byrows between unbounded preceding and current row

    1. order by是 按值操作,如果省略rows,则是从该组第一行到当前“值”(如果有重复,则是到相同值的最后一行,比如i);
    2. rows 是 按行操作。

    数据切片

    fi
    nm  ct  s   d   f   i
    a   1   1   13  1   6
    e   1   5   3   6   5
    b   2   2   112 2   22
    d   2   4   1   6   13
    g   2   7   76  13  20
    i   2   9   1   22  13
    c   3   3   12  3   17
    f   3   6   4   9   6
    h   3   8   5   17  14
    
    

    ik

    nm  ct  s   d   i   k
    a   1   1   13  6   6
    e   1   5   3   5   5
    b   2   2   112 22  22
    d   2   4   1   13  4
    g   2   7   76  20  20
    i   2   9   1   13  13
    c   3   3   12  17  17
    f   3   6   4   6   6
    h   3   8   5   14  14
    

    d按升序排序后的i

    ct  s   d   i
    1   5   3   5
    1   1   13  6
    2   4   1   13
    2   9   1   13
    2   7   76  20
    2   2   112 22
    3   6   4   6
    3   8   5   14
    3   3   12  17
    

    d按升序排序后的k

    ct  s   d   k
    1   5   3   5
    1   1   13  6
    2   4   1   4
    2   9   1   13
    2   7   76  20
    2   2   112 22
    3   6   4   6
    3   8   5   14
    3   3   12  17
    

    相关文章

      网友评论

        本文标题:【Hive】开窗函数over小结

        本文链接:https://www.haomeiwen.com/subject/admnsftx.html