题目
Activity 活动记录表
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id,event_date)是此表的主键
这张表显示了某些游戏的玩家的活动情况
每一行是一个玩家的记录,他在某一天使用某个设备注销之前登录并玩了很多游戏(可能是 0)
我们将玩家的安装日期定义为该玩家的第一个登录日。
我们还将某个日期 X 的第 1 天保留时间定义为安装日期为 X 的玩家的数量,他们在 X 之后的一天重新登录,除以安装日期为 X 的玩家的数量,四舍五入到小数点后两位。
编写一个 SQL 查询,报告每个安装日期、当天安装游戏的玩家数量和第一天的保留时间。
查询结果格式如下所示:
Activity 表:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-01 | 0 |
| 3 | 4 | 2016-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result 表:
+------------+----------+----------------+
| install_dt | installs | Day1_retention |
+------------+----------+----------------+
| 2016-03-01 | 2 | 0.50 |
| 2017-06-25 | 1 | 0.00 |
+------------+----------+----------------+
玩家 1 和 3 在 2016-03-01 安装了游戏,但只有玩家 1 在 2016-03-02 重新登录,所以 2016-03-01 的第一天保留时间是 1/2=0.50
玩家 2 在 2017-06-25 安装了游戏,但在 2017-06-26 没有重新登录,因此 2017-06-25 的第一天保留为 0/1=0.00
生成数据
-- DROP TABLE activity;
CREATE TABLE activity (
player_id INT,
device_id INT,
event_date DATE,
games_played INT,
PRIMARY KEY (player_id, event_date)
) ;
INSERT INTO Activity VALUES(1,2,'2016-03-01',5);
INSERT INTO Activity VALUES(1,2,'2016-03-02',6);
INSERT INTO Activity VALUES(2,3,'2017-06-25',1);
INSERT INTO Activity VALUES(3,1,'2016-03-01',0);
INSERT INTO Activity VALUES(3,4,'2018-07-03',5);
SELECT * FROM activity;
解答
查询每个用户的首次登陆日期
SELECT A.`player_id`, MIN(A.`event_date`)
FROM activity AS A
GROUP BY A.`player_id`;
查询首次登陆的日期 的每个日期的数量
SELECT tmp.first_date, COUNT(tmp.`player_id`) AS cnt
FROM (SELECT A.`player_id`, MIN(A.`event_date`) AS first_date
FROM activity AS A
GROUP BY A.`player_id`) AS tmp
GROUP BY tmp.first_date;
同理,查询每个用户首次登录的后一天
SELECT A.`player_id`, DATE_ADD(MIN(A.`event_date`), INTERVAL 1 DAY)
FROM activity AS A
GROUP BY A.`player_id`;
结合子查询查看首次登陆后一天的情况
SELECT A.`player_id`, A.`event_date`
FROM activity AS A
WHERE (A.`player_id`, A.`event_date`) IN
(SELECT A.`player_id`, DATE_ADD(MIN(A.`event_date`), INTERVAL 1 DAY)
FROM activity AS A
GROUP BY A.`player_id`);
再统计首次登陆后一天的情况 的每个日期数量
SELECT A.`event_date` AS second_date, COUNT(A.`player_id`) AS cnt
FROM activity AS A
WHERE (A.`player_id`, A.`event_date`) IN
(SELECT A.`player_id`, DATE_ADD(MIN(A.`event_date`), INTERVAL 1 DAY)
FROM activity AS A
GROUP BY A.`player_id`)
GROUP BY A.`event_date`;
连接两表求得结果
SELECT first_table.first_date AS install_dt, first_table.cnt AS installs,
ROUND(IFNULL(second_table.cnt,0)/ first_table.cnt,2) AS Day1_retention
FROM (SELECT tmp.first_date, COUNT(tmp.`player_id`) AS cnt
FROM (SELECT A.`player_id`, MIN(A.`event_date`) AS first_date
FROM activity AS A
GROUP BY A.`player_id`) AS tmp
GROUP BY tmp.first_date) AS first_table
LEFT JOIN (SELECT A.`event_date` AS second_date, COUNT(A.`player_id`) AS cnt
FROM activity AS A
WHERE (A.`player_id`, A.`event_date`) IN
(SELECT A.`player_id`, DATE_ADD(MIN(A.`event_date`), INTERVAL 1 DAY)
FROM activity AS A
GROUP BY A.`player_id`)
GROUP BY A.`event_date`) AS second_table
ON DATEDIFF(second_table.second_date, first_table.first_date) = 1
虽然实现了 但感觉代码冗长 不太好 效率可能也不太高
这个思路是先计算数量再连接 不太好
可以尝试先连接再统计数量
先连接再统计
每个用户的首次登陆日期与下一天进行连接
SELECT *
FROM (SELECT A.`player_id`, MIN(A.`event_date`) AS first_date
FROM activity AS A
GROUP BY A.`player_id`) AS tmp
LEFT OUTER JOIN activity AS A2
ON DATEDIFF(A2.`event_date`, tmp.first_date) = 1 AND A2.`player_id` = tmp.`player_id`;
对first_date进行分组统计数量 因为first_date后后右表的时间是一一对应的因此对于这个分组也可以对后一天的数量进行统计
SELECT tmp.first_date, COUNT(tmp.player_id), COUNT(A2.`player_id`)
FROM (SELECT A.`player_id`, MIN(A.`event_date`) AS first_date
FROM activity AS A
GROUP BY A.`player_id`) AS tmp
LEFT OUTER JOIN activity AS A2
ON DATEDIFF(A2.`event_date`, tmp.first_date) = 1 AND A2.`player_id` = tmp.`player_id`
GROUP BY tmp.first_date
比上边的简洁了许多,再求最后结果即可
SELECT tmp.first_date AS install_dt, COUNT(tmp.player_id) AS installs,
ROUND(COUNT(A2.`player_id`)/COUNT(tmp.player_id), 2) AS ay1_retention
FROM (SELECT A.`player_id`, MIN(A.`event_date`) AS first_date
FROM activity AS A
GROUP BY A.`player_id`) AS tmp
LEFT OUTER JOIN activity AS A2
ON DATEDIFF(A2.`event_date`, tmp.first_date) = 1 AND A2.`player_id` = tmp.`player_id`
GROUP BY tmp.first_date
别的解法
用表连接法求每个玩家的最小日期,即为安装日期。
利用A.event_date > B.event_date连接 如果没有比A表时间大的就是最小的日期
不存在比最小的日期更小的日期了。因此用left join。最小日期的B.event_date是不存在的。
SELECT A.player_id,A.event_date
FROM Activity AS A
LEFT JOIN Activity AS B ON (A.player_id = B.player_id AND A.event_date > B.event_date)
WHERE B.event_date IS NULL
再连接Activity表。算出最小日期第二天还登录的玩家。
SELECT A.player_id,A.event_date,C.player_id
FROM Activity AS A
LEFT JOIN Activity AS B ON (A.player_id = B.player_id AND A.event_date > B.event_date)
left JOIN Activity AS C
ON (A.player_id = C.player_id AND C.event_date = DATE_ADD(A.event_date,INTERVAL 1 DAY))
WHERE B.event_date IS NULL
基于此,对最小日期分组,统计安装数和1日后的留存率。
SELECT
A.event_date AS `install_dt`,
COUNT(A.player_id) AS `installs`,
round(COUNT(C.player_id)/COUNT(A.player_id),2) AS `Day1_retention`
FROM Activity AS A
left JOIN Activity AS B
ON (A.player_id = B.player_id AND A.event_date > B.event_date)
left JOIN Activity AS C
ON (A.player_id = C.player_id AND C.event_date = DATE_ADD(A.event_date,INTERVAL 1 DAY))
WHERE B.event_date IS NULL
GROUP BY A.event_date
网友评论