美文网首页
hive级联累计

hive级联累计

作者: pamperxg | 来源:发表于2017-08-11 21:20 被阅读0次

    根据访问次数统计表,得到累计访问总计

    • 建表,load数据
    t_access_times.dat
    A,2015-01,5
    A,2015-01,15
    B,2015-01,5
    A,2015-01,8
    B,2015-01,25
    A,2015-01,5
    A,2015-02,4
    A,2015-02,6
    B,2015-02,10
    B,2015-02,5
    
    create table t_access_times(username string,month string,salary int)
    row format delimited fields terminated by ',';
    
    load data local inpath '/home/hadoop/t_access_times.dat' into table t_access_times;
    +--------------------------+-----------------------+------------------------+--+
    | t_access_times.username  | t_access_times.month  | t_access_times.salary  |
    +--------------------------+-----------------------+------------------------+--+
    | A                        | 2015-01               | 5                      |
    | A                        | 2015-01               | 15                     |
    | B                        | 2015-01               | 5                      |
    | A                        | 2015-01               | 8                      |
    | B                        | 2015-01               | 25                     |
    | A                        | 2015-01               | 5                      |
    | A                        | 2015-02               | 4                      |
    | A                        | 2015-02               | 6                      |
    | B                        | 2015-02               | 10                     |
    | B                        | 2015-02               | 5                      |
    +--------------------------+-----------------------+------------------------+--
    
    • 求每个用户的月总金额
    select username,month,sum(salary) from t_access_times group by username,month;
    +-----------+----------+------+--+
    | username  |  month   | _c2  |
    +-----------+----------+------+--+
    | A         | 2015-01  | 33   |
    | A         | 2015-02  | 10   |
    | B         | 2015-01  | 30   |
    | B         | 2015-02  | 15   |
    +-----------+----------+------+--+
    
    • 把表自己inner join
    select a.*,b.* from
    (select username,month,sum(salary) as salary from t_access_times group by username,month) A 
    inner join 
    (select username,month,sum(salary) as salary from t_access_times group by username,month) B
    on
    A.username=B.username
    +-------------+----------+-----------+-------------+----------+-----------+--+
    | a.username  | a.month  | a.salary  | b.username  | b.month  | b.salary  |
    +-------------+----------+-----------+-------------+----------+-----------+--+
    | A           | 2015-01  | 33        | A           | 2015-01  | 33        |
    | A           | 2015-01  | 33        | A           | 2015-02  | 10        |
    | A           | 2015-02  | 10        | A           | 2015-01  | 33        |
    | A           | 2015-02  | 10        | A           | 2015-02  | 10        |
    | B           | 2015-01  | 30        | B           | 2015-01  | 30        |
    | B           | 2015-01  | 30        | B           | 2015-02  | 15        |
    | B           | 2015-02  | 15        | B           | 2015-01  | 30        |
    | B           | 2015-02  | 15        | B           | 2015-02  | 15        |
    +-------------+----------+-----------+-------------+----------+-----------+--+
    
    • 生成累计值
    select a.username,a.month,max(a.salary) as salary,sum(b.salary) as accumulate from
    (select username,month,sum(salary) as salary from t_access_times group by username,month) A inner join (select username,month,sum(salary) as salary from t_access_times group by username,month) B on a.username=b.username
    where b.month <= a.month
    group by a.username,a.month
    order by a.username,a.month;
    
    +-------------+----------+---------+-------------+--+
    | a.username  | a.month  | salary  | accumulate  |
    +-------------+----------+---------+-------------+--+
    | A           | 2015-01  | 33      | 33          |
    | A           | 2015-02  | 10      | 43          |
    | B           | 2015-01  | 30      | 30          |
    | B           | 2015-02  | 15      | 45          |
    +-------------+----------+---------+-------------+
    

    分组查询求月累计值。
    为什么要max(salary)?
    salary不是分组字段,只能由聚合函数求得,不然不知道选哪个。sum avg max

    最后order by 使全局有序,原始数据无序,最后有可能无序。

    相关文章

      网友评论

          本文标题:hive级联累计

          本文链接:https://www.haomeiwen.com/subject/ytylrxtx.html