这节课学习用 Mysql 处理业务数据,重难点在于如何把业务需求写成 Mysql 语句,查询并计算出想要的数据。
数据源可以在“学习癌”公众号后台回复“数据分析15”获取,导入Navicat即可操作。
一、Navicat 导入数据:
![](https://img.haomeiwen.com/i8919282/5e3fa38fcecabdbd.png)
![](https://img.haomeiwen.com/i8919282/0816258223f48e6b.png)
![](https://img.haomeiwen.com/i8919282/802fd55ee153e861.png)
二、修改数据类型:
alter table pporder modify `user_id` int unsigned primary key;
alter table pporder modify `first_invest_date` datetime;
alter table pporder modify `last_invest_date` datetime;
alter table pporder modify `first_invest_money` int unsigned;
alter table pporder modify `sum_invest_money` int unsigned;
alter table pporder modify `first_invest_rate` float(5,4);
alter table pporder modify `first_invest_month` tinyint unsigned;
alter table pporder modify `user_name` varchar(255);
alter table pporder modify `chanel_id` varchar(20);
alter table pporder modify `city_id` varchar(20);
alter table pporder modify `system` varchar(20);
alter table pporder modify `age` tinyint unsigned;
alter table pporder modify `phonenum` varchar(11);
alter table pporder modify `sex` varchar(5);
alter table pporder modify `create_time` datetime;
alter table pporder modify `platform` varchar(20);
alter table pporder modify `country` varchar(20);
alter table pporder modify `area` varchar(20);
alter table pporder modify `province` varchar(20);
alter table pporder modify `city` varchar(20);
三、数据处理
需求1:求出总投资人数及总投资金额
SELECT
count( DISTINCT user_id ) AS 注册人数,
sum( sum_invest_money ) AS 总投资金额
FROM
pporder;
需求2:不同年份注册的投资总人数,以及该人群总投资金额
SELECT YEAR
( create_time ) AS 注册年份,
count( DISTINCT user_id ) AS 该年注册人数,
sum( sum_invest_money ) AS 总投资金额
FROM
pporder
GROUP BY
YEAR ( create_time );
需求3:男性用户与女性用户的人数占比
第一步:计算比例:
SELECT
sex AS 性别,
count(*) AS 人数,
count(*) / (select count(*) from pporder) as 人数占比
FROM
pporder
GROUP BY
sex;
第二步:显示百分比:
SELECT
sex AS 性别,
count(*) AS 人数,
concat( count(*) / (select count(*) from pporder) * 100 , '%') as 人数占比
FROM
pporder
GROUP BY
sex;
第三步:保留两位小数:
SELECT
sex AS 性别,
count(*) AS 人数,
concat( format((count(*) / (select count(*) from pporder) * 100), 2 ), '%') as 人数占比
FROM
pporder
GROUP BY
sex;
需求4:贡献投资金额最大的十个年龄,以及相应总投资金额
SELECT
age AS 年龄,
sum( sum_invest_money ) AS 总投资金额
FROM
pporder
GROUP BY
age
ORDER BY
sum( sum_invest_money ) DESC
limit 10;
注:这里也可以直接写成 order by 总投资金额,因为我们已经重命名过了。
需求5:用户首投最喜欢的产品利率与用户首投最喜欢的产品周期
SELECT
first_invest_rate,
count(*)
FROM
pporder
GROUP BY
first_invest_rate
ORDER BY
count(*) DESC
LIMIT 1;
SELECT
first_invest_month,
count(*)
FROM
pporder
GROUP BY
first_invest_month
ORDER BY
count(*) DESC
LIMIT 1;
需求6:每个年龄段用户从注册到投资的平均转化周期
SELECT
age AS 年龄,
avg(
datediff( first_invest_date, create_time )) AS 平均转化周期
FROM
pporder
GROUP BY
age;
注:datediff 是计算两个日期时间间隔的函数,第一个参数是较晚的日期,第二个参数是较早的日期。
扩展:按年龄分组
SELECT
CASE
WHEN
age < 20 THEN '20岁以下' WHEN age >= 20
AND age < 25 THEN '20~25岁' WHEN age >= 25
AND age < 30 THEN '25~30岁' WHEN age >= 30
AND age < 35 THEN '30~35岁' WHEN age >= 35
AND age < 40 THEN '35~40岁' ELSE '40岁以上'
END AS age_type,
avg(
datediff( first_invest_date, create_time )) AS 平均转化周期
FROM
pporder
GROUP BY
age_type;
需求7:每个区域内,投资人数最多的10个城市
先把区域和城市的投资人数算出来然后排序:
SELECT
area AS 区域,
city AS 城市,
count(*) AS 该城市投资总人数,
row_number() over ( PARTITION BY area ORDER BY count(*) DESC ) AS rn
FROM
pporder
GROUP BY
area,
city;
注:这里窗口函数是每个区域一个窗口,PARTITION BY area
再嵌套,从中取出每个区域的前十:
select t.*
from (SELECT
area AS 区域,
city AS 城市,
count(*) AS 该城市投资总人数,
row_number() over ( PARTITION BY area ORDER BY count(*) DESC ) AS rn
FROM
pporder
GROUP BY
area,
city) as t
where t.rn <= 10;
注意:这里最后一句不能用 limit 10,因为limit 是作用于整体的,而我们需要的是对每个区域的窗口结果取前十。
结果有null,可以去掉:
SELECT
t.*
FROM
(
SELECT
area AS 区域,
city AS 城市,
count(*) AS 该城市投资总人数,
row_number() over ( PARTITION BY area ORDER BY count(*) DESC ) AS rn
FROM
pporder
GROUP BY
area,
city
) AS t
WHERE
t.rn <= 10
having t.区域 is not null;
或:
SELECT
t.*
FROM
(
SELECT
area AS 区域,
city AS 城市,
count(*) AS 该城市投资总人数,
row_number() over ( PARTITION BY area ORDER BY count(*) DESC ) AS rn
FROM
pporder
WHERE area is not null
GROUP BY
area,
city
) AS t
WHERE
t.rn <= 10;
好啦,这节课的内容就是这些,建议先自己思考,尝试写写,实在也不出来再看参考代码~
由于SQL语句用Navicat做了格式美化,所以手机上看会有点乱,建议PC端使用哈~
网友评论