Classic MySQL SQL: Computing Online Duration

Author: user0650 | Published: 2017-01-06 18:22

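    Overview

    Suppose we have a table that records each user's login and logout times. We want to compute each user's total online time from these records and set the user's level accordingly, much like leveling up by staying logged in to QQ.

    The data looks like this: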

    mysql> select * from user_sessions;
    +----+----------+-------+----------+----------+
    | id | platform | usr   | start    | end      |
    +----+----------+-------+----------+----------+
    |  1 | web      | user1 | 08:30:00 | 10:30:00 |
    |  2 | web      | user2 | 08:30:00 | 08:45:00 |
    |  3 | web      | user1 | 09:00:00 | 09:30:00 |
    |  4 | web      | user2 | 09:15:00 | 10:30:00 |
    |  5 | web      | user1 | 09:15:00 | 09:30:00 |
    |  6 | web      | user2 | 10:30:00 | 14:30:00 |
    |  7 | web      | user1 | 10:45:00 | 11:30:00 |
    |  8 | web      | user2 | 11:00:00 | 12:30:00 |
    |  9 | app      | user1 | 08:30:00 | 08:45:00 |
    | 10 | app      | user2 | 09:00:00 | 09:30:00 |
    | 11 | app      | user1 | 11:45:00 | 12:00:00 |
    | 12 | app      | user2 | 12:30:00 | 14:00:00 |
    | 13 | app      | user1 | 12:45:00 | 13:30:00 |
    | 14 | app      | user2 | 13:00:00 | 14:00:00 |
    | 15 | app      | user1 | 14:00:00 | 16:30:00 |
    | 16 | app      | user2 | 15:30:00 | 17:00:00 |
    +----+----------+-------+----------+----------+
    

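    Our site has two entry points: web and app.

    The sessions clearly overlap in time. For example, between 08:30 and 10:30 user1 has three web sessions and one app session, but that period should count as only 2 hours of online time.

    Computing this correctly takes three steps:

    1. For each group of overlapping sessions of a user, find the earliest start time
    2. For each group of overlapping sessions of a user, find the latest end time
    3. Pair the start times from step 1 with the end times from step 2 to get the merged time ranges, then aggregate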
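
    As a cross-check on what the three steps should produce, here is a minimal sketch in Python rather than SQL. It is not part of the original solution: it merges user1's sessions with a simple sort-and-sweep, and the helper names `t` and `merge` are made up for this illustration. Sessions that merely touch (one ends exactly when the next starts) are treated as one span, which is an assumption chosen to match the `<=` and `>=` comparisons in the SQL conditions.

```python
from datetime import datetime, timedelta

def t(s):
    """Parse 'HH:MM' into a datetime on an arbitrary fixed day."""
    return datetime.strptime(s, "%H:%M")

def merge(intervals):
    """Merge overlapping or touching (start, end) pairs into disjoint spans."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Starts inside (or exactly at the end of) the current span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

# user1's eight sessions from the sample table (web and app together).
user1 = [("08:30", "10:30"), ("09:00", "09:30"), ("09:15", "09:30"),
         ("10:45", "11:30"), ("08:30", "08:45"), ("11:45", "12:00"),
         ("12:45", "13:30"), ("14:00", "16:30")]

spans = merge([(t(s), t(e)) for s, e in user1])
total = sum((e - s for s, e in spans), timedelta())
print(len(spans), total)  # 5 6:15:00
```

    The five spans and the 6 hours 15 minutes total are what the SQL for user1 should arrive at as well.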

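    Step 1:

    Select the user name and start time.
    A start time s qualifies only if it does not fall inside another session of the same user, that is, no row satisfies "s > start and s <= end".
    The result may contain duplicates, so we use distinct to remove them.
    Each s found this way is the earliest time of one group of overlapping sessions: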

    mysql> select distinct usr,start from user_sessions as a where not exists(select * from user_sessions as b where a.usr=b.usr and a.start>b.start and a.start<=b.end);
    +-------+----------+
    | usr   | start    |
    +-------+----------+
    | user1 | 08:30:00 |
    | user2 | 08:30:00 |
    | user1 | 10:45:00 |
    | user2 | 09:00:00 |
    | user1 | 11:45:00 |
    | user1 | 12:45:00 |
    | user1 | 14:00:00 |
    | user2 | 15:30:00 |
    +-------+----------+
    

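    Step 2:

    Select the user name and end time.
    The reasoning is the same as in Step 1. An end time e qualifies only if it does not fall inside another session of the same user, that is, no row satisfies "e >= start and e < end".
    The result: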

    mysql> select distinct usr,end from user_sessions as a where not exists(select * from user_sessions as b where a.usr=b.usr and a.end>=b.start and a.end<b.end);
    +-------+----------+
    | usr   | end      |
    +-------+----------+
    | user1 | 10:30:00 |
    | user2 | 08:45:00 |
    | user2 | 14:30:00 |
    | user1 | 11:30:00 |
    | user1 | 12:00:00 |
    | user1 | 13:30:00 |
    | user1 | 16:30:00 |
    | user2 | 17:00:00 |
    +-------+----------+
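
    The two NOT EXISTS conditions from Step 1 and Step 2 can be restated as plain predicates. The following is only a sketch in Python, applied to user2's sessions; the function names `island_starts` and `island_ends` are hypothetical and chosen for this example.

```python
# user2's eight sessions (web and app together). Zero-padded 'HH:MM' strings
# compare correctly as plain text, so no time parsing is needed here.
user2 = [("08:30", "08:45"), ("09:15", "10:30"), ("10:30", "14:30"),
         ("11:00", "12:30"), ("09:00", "09:30"), ("12:30", "14:00"),
         ("13:00", "14:00"), ("15:30", "17:00")]

def island_starts(sessions):
    """Starts s with no session where start < s <= end (the Step 1 condition)."""
    return sorted({s for s, _ in sessions
                   if not any(b_s < s <= b_e for b_s, b_e in sessions)})

def island_ends(sessions):
    """Ends e with no session where start <= e < end (the Step 2 condition)."""
    return sorted({e for _, e in sessions
                   if not any(b_s <= e < b_e for b_s, b_e in sessions)})

print(island_starts(user2))  # ['08:30', '09:00', '15:30']
print(island_ends(user2))    # ['08:45', '14:30', '17:00']
```

    Building a set before sorting plays the role of distinct, and the values agree with the user2 rows in the two query results above.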
    

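    Step 3:

    Merge the results of the first two steps.
    To simplify the queries and the discussion, we create a view for each of the first two steps: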

    mysql> create view v_s as select distinct usr,start from user_sessions as a where not exists(select * from user_sessions as b where a.usr=b.usr and a.start>b.start and a.start<=b.end);
    
    mysql> create view v_e as select distinct usr,end from user_sessions as a where not exists(select * from user_sessions as b where a.usr=b.usr and a.end>=b.start and a.end<b.end);
    

    Look at the two views, v_s and v_e:

    mysql> select * from v_s;
    +-------+----------+
    | usr   | start    |
    +-------+----------+
    | user1 | 08:30:00 |
    | user2 | 08:30:00 |
    | user1 | 10:45:00 |
    | user2 | 09:00:00 |
    | user1 | 11:45:00 |
    | user1 | 12:45:00 |
    | user1 | 14:00:00 |
    | user2 | 15:30:00 |
    +-------+----------+
    
    mysql> select * from v_e;
    +-------+----------+
    | usr   | end      |
    +-------+----------+
    | user1 | 10:30:00 |
    | user2 | 08:45:00 |
    | user2 | 14:30:00 |
    | user1 | 11:30:00 |
    | user1 | 12:00:00 |
    | user1 | 13:30:00 |
    | user1 | 16:30:00 |
    | user2 | 17:00:00 |
    +-------+----------+
    

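    As you can see, the two views have the same number of rows, but the rows do not line up one to one, so we cannot simply place the two results side by side!

    Approach 1:

    For each start time in v_s, we look in v_e for the smallest end with the same usr and end > start.
    For example, for user1's start time 08:30:00 in v_s, we take the rows in v_e with usr = user1 and end > 08:30:00. The minimum of their end values, 10:30:00, is the matching end time.
    The SQL and its result: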

    mysql> select distinct v_s.usr,v_s.start,(select min(end) from v_e where v_e.end>v_s.start and v_e.usr=v_s.usr) as end from v_s,v_e where v_s.usr=v_e.usr;
    +-------+----------+----------+
    | usr   | start    | end      |
    +-------+----------+----------+
    | user1 | 08:30:00 | 10:30:00 |
    | user2 | 08:30:00 | 08:45:00 |
    | user1 | 10:45:00 | 11:30:00 |
    | user2 | 09:00:00 | 14:30:00 |
    | user1 | 11:45:00 | 12:00:00 |
    | user1 | 12:45:00 | 13:30:00 |
    | user1 | 14:00:00 | 16:30:00 |
    | user2 | 15:30:00 | 17:00:00 |
    +-------+----------+----------+
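
    The pairing rule is small enough to restate outside SQL. As a sketch only, using user1's values copied from the v_s and v_e listings, each start is matched with the smallest end that is strictly later:

```python
# Step 1 and Step 2 results for user1, copied from the v_s and v_e listings.
starts = ["08:30", "10:45", "11:45", "12:45", "14:00"]
ends = ["10:30", "11:30", "12:00", "13:30", "16:30"]

# For each start, take the smallest end that is strictly later
# (the role of the correlated min(end) subquery).
pairs = [(s, min(e for e in ends if e > s)) for s in starts]

for pair in pairs:
    print(pair)
```

    This works because the merged spans of one user do not overlap, so the nearest later end always belongs to the same span as the start.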
    

    On top of this, group by usr and sum each user's total duration to get the final result:

    mysql> select usr,sec_to_time(sum(timestampdiff(second,start,end))) as time from (select distinct v_s.usr,v_s.start,(select min(end) from v_e where v_e.end>v_s.start and v_e.usr=v_s.usr) as end from v_s,v_e where v_s.usr=v_e.usr) as c group by usr;
    +-------+----------+
    | usr   | time     |
    +-------+----------+
    | user1 | 06:15:00 |
    | user2 | 07:15:00 |
    +-------+----------+
    

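    Here, the timestampdiff function computes the difference between two times, and its first argument specifies the unit, seconds in this case;
    the sec_to_time function converts the summed seconds back into a time format for display.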
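
    To make the arithmetic behind these two functions concrete, here is a rough Python equivalent for user2's three merged spans. It only mirrors the calculation (difference in seconds, sum, format as HH:MM:SS); the helper name `seconds_between` is invented for this sketch and says nothing about how MySQL implements the functions.

```python
from datetime import datetime

def seconds_between(start, end):
    """Whole seconds from start to end, both 'HH:MM:SS' on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

# user2's merged spans from the Approach 1 result.
spans = [("08:30:00", "08:45:00"), ("09:00:00", "14:30:00"), ("15:30:00", "17:00:00")]
total_seconds = sum(seconds_between(s, e) for s, e in spans)

# Format the sum back as HH:MM:SS, the way the `time` column displays it.
hours, rest = divmod(total_seconds, 3600)
minutes, seconds = divmod(rest, 60)
print(total_seconds, f"{hours:02d}:{minutes:02d}:{seconds:02d}")  # 26100 07:15:00
```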

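    Approach 2:

    Since the rows of v_s and v_e do not line up, we can sort both so that they do, and then merge them row by row.
    To merge by row, we also add a row number to each subquery and match the two on that number: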

    mysql> set @sno=0;
    mysql> set @eno=0;
    mysql> select s.usr,s.start,e.end from (select @sno:=@sno+1 as sno,usr,start from v_s order by usr,start) as s, (select @eno:=@eno+1 as eno,usr,end from v_e order by usr,end) as e where s.sno=e.eno;
    +-------+----------+----------+
    | usr   | start    | end      |
    +-------+----------+----------+
    | user1 | 08:30:00 | 10:30:00 |
    | user1 | 10:45:00 | 11:30:00 |
    | user1 | 11:45:00 | 12:00:00 |
    | user1 | 12:45:00 | 13:30:00 |
    | user1 | 14:00:00 | 16:30:00 |
    | user2 | 08:30:00 | 08:45:00 |
    | user2 | 09:00:00 | 14:30:00 |
    | user2 | 15:30:00 | 17:00:00 |
    +-------+----------+----------+
    

    The grouping and aggregation that follow are the same as in Approach 1.
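
    The sort-then-pair idea can be checked independently of the user-variable counters. This matters because the MySQL manual describes the order of evaluation for expressions involving user variables as undefined, so the row numbers are only as reliable as the order in which the server happens to assign them. A minimal Python sketch of the same idea, with the rows copied in the unsorted order the v_s and v_e listings show:

```python
# Rows of v_s and v_e in the order the listings show them (not row-aligned).
v_s = [("user1", "08:30"), ("user2", "08:30"), ("user1", "10:45"), ("user2", "09:00"),
       ("user1", "11:45"), ("user1", "12:45"), ("user1", "14:00"), ("user2", "15:30")]
v_e = [("user1", "10:30"), ("user2", "08:45"), ("user2", "14:30"), ("user1", "11:30"),
       ("user1", "12:00"), ("user1", "13:30"), ("user1", "16:30"), ("user2", "17:00")]

# Sorting both lists by (usr, time) lines the rows up; zip pairs row i with row i.
paired = list(zip(sorted(v_s), sorted(v_e)))

# Sanity check: every pair belongs to a single user and ends after it starts.
assert all(us == ue and s < e for (us, s), (ue, e) in paired)

merged = [(us, s, e) for (us, s), (_, e) in paired]
for row in merged:
    print(row)
```

    The eight rows printed match the result table above, user1's five spans first and then user2's three.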

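    Extension

    In practice, we often apply a bonus multiplier to the online time from a particular entry point, to encourage users to use it more.
    For example, to get users to log in through the app more, we give app online time a 2x bonus, that is, one hour online in the app counts as two hours online on the web.
    In that case we need to compute the totals for each login method separately. This is easy: just add a condition to the SQL requiring platform to match. For example, Step 1 becomes: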

    mysql> select distinct usr,platform,start from user_sessions as a where not exists(select * from user_sessions as b where a.usr=b.usr and a.platform=b.platform and a.start>b.start and a.start<=b.end);
    +-------+----------+----------+
    | usr   | platform | start    |
    +-------+----------+----------+
    | user1 | web      | 08:30:00 |
    | user2 | web      | 08:30:00 |
    | user2 | web      | 09:15:00 |
    | user1 | web      | 10:45:00 |
    | user1 | app      | 08:30:00 |
    | user2 | app      | 09:00:00 |
    | user1 | app      | 11:45:00 |
    | user2 | app      | 12:30:00 |
    | user1 | app      | 12:45:00 |
    | user1 | app      | 14:00:00 |
    | user2 | app      | 15:30:00 |
    +-------+----------+----------+
    

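    The remaining steps are analogous. In the final aggregation, compute the web and app durations separately, multiply the app duration by 2, and add it to the web duration. The details are omitted here.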
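
    Since the article leaves out the final numbers, the following Python sketch works out what the weighted totals should be for the sample data, so that a finished SQL version can be checked against it. It is an independent calculation and not the omitted SQL. The `WEIGHT` table and the helper names are assumptions made for this example; the 2x factor for app comes from the rule stated above.

```python
from collections import defaultdict

SESSIONS = [  # (platform, usr, start, end), copied from user_sessions
    ("web", "user1", "08:30", "10:30"), ("web", "user2", "08:30", "08:45"),
    ("web", "user1", "09:00", "09:30"), ("web", "user2", "09:15", "10:30"),
    ("web", "user1", "09:15", "09:30"), ("web", "user2", "10:30", "14:30"),
    ("web", "user1", "10:45", "11:30"), ("web", "user2", "11:00", "12:30"),
    ("app", "user1", "08:30", "08:45"), ("app", "user2", "09:00", "09:30"),
    ("app", "user1", "11:45", "12:00"), ("app", "user2", "12:30", "14:00"),
    ("app", "user1", "12:45", "13:30"), ("app", "user2", "13:00", "14:00"),
    ("app", "user1", "14:00", "16:30"), ("app", "user2", "15:30", "17:00"),
]
WEIGHT = {"web": 1, "app": 2}  # assumed multipliers: app time counts double

def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def merged_minutes(intervals):
    """Total minutes covered by the union of (start, end) minute pairs."""
    spans = []
    for start, end in sorted(intervals):
        if spans and start <= spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], end)  # overlaps or touches: extend
        else:
            spans.append([start, end])
    return sum(end - start for start, end in spans)

# Merge separately per (usr, platform), as the extra platform condition does.
groups = defaultdict(list)
for platform, usr, start, end in SESSIONS:
    groups[(usr, platform)].append((to_minutes(start), to_minutes(end)))

score = defaultdict(int)
for (usr, platform), intervals in groups.items():
    score[usr] += WEIGHT[platform] * merged_minutes(intervals)

for usr in sorted(score):
    print(usr, f"{score[usr] // 60:02d}:{score[usr] % 60:02d}")
# user1 10:15
# user2 12:30
```

    For user1 this breaks down as 2:45 on web plus 3:45 on app counted twice. For user2 it is 5:30 on web plus 3:30 on app counted twice.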
