美文网首页
数据分析课程笔记 - 15 - MySQL 金融案例分析

数据分析课程笔记 - 15 - MySQL 金融案例分析

作者: 爱学习的ai酱 | 来源:发表于2020-12-22 20:57 被阅读0次

这节课学习用 Mysql 处理业务数据,重难点在于如何把业务需求写成 Mysql 语句,查询并计算出想要的数据。

数据源可以在“学习癌”公众号后台回复“数据分析15”获取,导入Navicat即可操作。

一、Navicat 导入数据:

image.png image.png image.png

二、修改数据类型:

alter table pporder modify `user_id` int unsigned primary key;
alter table pporder modify `first_invest_date` datetime;
alter table pporder modify `last_invest_date` datetime;
alter table pporder modify `first_invest_money` int unsigned;
alter table pporder modify `sum_invest_money` int unsigned;
alter table pporder modify `first_invest_rate` float(5,4);
alter table pporder modify `first_invest_month` tinyint unsigned;
alter table pporder modify `user_name` varchar(255);
alter table pporder modify `chanel_id` varchar(20);
alter table pporder modify `city_id` varchar(20);
alter table pporder modify `system` varchar(20);
alter table pporder modify `age` tinyint unsigned;
alter table pporder modify `phonenum` varchar(11);
alter table pporder modify `sex` varchar(5);
alter table pporder modify `create_time` datetime;
alter table pporder modify `platform` varchar(20);
alter table pporder modify `country` varchar(20);
alter table pporder modify `area` varchar(20);
alter table pporder modify `province` varchar(20);
alter table pporder modify `city` varchar(20);

三、数据处理

需求1:求出总投资人数及总投资金额

SELECT 
    count( DISTINCT user_id ) AS 注册人数,
    sum( sum_invest_money ) AS 总投资金额 
FROM
    pporder;

需求2:不同年份注册的投资总人数,以及该人群总投资金额

SELECT YEAR
    ( create_time ) AS 注册年份,
    count( DISTINCT user_id ) AS 该年注册人数,
    sum( sum_invest_money ) AS 总投资金额 
FROM
    pporder 
GROUP BY
    YEAR ( create_time );

需求3:男性用户与女性用户的人数占比

第一步:计算比例:

SELECT
    sex AS 性别,
    count(*) AS 人数,
    count(*) / (select count(*) from pporder) as 人数占比
FROM
    pporder 
GROUP BY
    sex;

第二步:显示百分比:

SELECT
    sex AS 性别,
    count(*) AS 人数,
    concat( count(*) / (select count(*) from pporder) * 100 , '%') as 人数占比
FROM
    pporder 
GROUP BY
    sex;

第三步:保留两位小数:

SELECT
    sex AS 性别,
    count(*) AS 人数,
    concat( format((count(*) / (select count(*) from pporder) * 100), 2 ), '%') as 人数占比
FROM
    pporder 
GROUP BY
    sex;

需求4:贡献投资金额最大的十个年龄,以及相应总投资金额

SELECT
    age AS 年龄,
    sum( sum_invest_money ) AS 总投资金额 
FROM
    pporder 
GROUP BY
    age 
ORDER BY
    sum( sum_invest_money ) DESC
limit 10;

注:这里也可以直接写成 order by 总投资金额,因为我们已经重命名过了。

需求5:用户首投最喜欢的产品利率与用户首投最喜欢的产品周期

SELECT
    first_invest_rate,
    count(*) 
FROM
    pporder 
GROUP BY
    first_invest_rate
ORDER BY
    count(*) DESC 
    LIMIT 1;
SELECT
    first_invest_month,
    count(*) 
FROM
    pporder 
GROUP BY
    first_invest_month 
ORDER BY
    count(*) DESC 
    LIMIT 1;

需求6:每个年龄段用户从注册到投资的平均转化周期

SELECT
    age AS 年龄,
    avg(
    datediff( first_invest_date, create_time )) AS 平均转化周期 
FROM
    pporder 
GROUP BY
    age;

注:datediff 是计算两个日期时间间隔的函数,第一个参数是较晚的日期,第二个参数是较早的日期。

扩展:按年龄分组

SELECT
CASE
        
    WHEN
        age < 20 THEN '20岁以下' WHEN age >= 20 
            AND age < 25 THEN '20~25岁' WHEN age >= 25 
                AND age < 30 THEN '25~30岁' WHEN age >= 30 
                    AND age < 35 THEN '30~35岁' WHEN age >= 35 
                        AND age < 40 THEN   '35~40岁' ELSE '40岁以上' 
                            END AS age_type,
                            avg(
                        datediff( first_invest_date, create_time )) AS 平均转化周期 
                    FROM
                        pporder 
                GROUP BY
    age_type;

需求7:每个区域内,投资人数最多的10个城市

先把区域和城市的投资人数算出来然后排序:

SELECT
    area AS 区域,
    city AS 城市,
    count(*) AS 该城市投资总人数,
    row_number() over ( PARTITION BY area ORDER BY count(*) DESC ) AS rn 
FROM
    pporder 
GROUP BY
    area,
    city;

注:这里窗口函数是每个区域一个窗口,PARTITION BY area

再嵌套,从中取出每个区域的前十:

select t.*
from (SELECT
    area AS 区域,
    city AS 城市,
    count(*) AS 该城市投资总人数,
    row_number() over ( PARTITION BY area ORDER BY count(*) DESC ) AS rn 
FROM
    pporder 
GROUP BY
    area,
    city) as t
    where t.rn <= 10;

注意:这里最后一句不能用 limit 10,因为limit 是作用于整体的,而我们需要的是对每个区域的窗口结果取前十。

结果有null,可以去掉:

SELECT
    t.* 
FROM
    (
    SELECT
        area AS 区域,
        city AS 城市,
        count(*) AS 该城市投资总人数,
        row_number() over ( PARTITION BY area ORDER BY count(*) DESC ) AS rn 
    FROM
        pporder 
    GROUP BY
        area,
        city 
    ) AS t 
WHERE
    t.rn <= 10
having t.区域 is not null;

或:

SELECT
    t.* 
FROM
    (
    SELECT
        area AS 区域,
        city AS 城市,
        count(*) AS 该城市投资总人数,
        row_number() over ( PARTITION BY area ORDER BY count(*) DESC ) AS rn 
    FROM
        pporder 
    WHERE area is not null
    GROUP BY
        area,
        city 
    ) AS t 
WHERE
    t.rn <= 10;

好啦,这节课的内容就是这些,建议先自己思考,尝试写写,实在也不出来再看参考代码~

由于SQL语句用Navicat做了格式美化,所以手机上看会有点乱,建议PC端使用哈~

相关文章

网友评论

      本文标题:数据分析课程笔记 - 15 - MySQL 金融案例分析

      本文链接:https://www.haomeiwen.com/subject/gglkgktx.html