前言
存在一份用户消费情况数据orderinfo,对其探讨以下问题:
- 统计不同月份的下单人数
- 统计用户三月份的回购率和复购率
- 统计男女用户的消费频次是否有差异
- 统计多次消费的用户,第一次和最后一次消费间隔是多少?
- 统计不同年龄段,用户的消费金额是否有差异?
6.统计消费的二八法则,消费的top20%用户,贡献了多少额度
数据源观察
image.png第一列 orderid 订单号(唯一)
第二列 userid 用户id(不唯一)
第三列 isPaid 是否已支付
第四列 price 价格
第五列 paidtime 下单时间
前期导入步骤
第一步 建表
create table orderinfo(
orderid int primary key not null ,
userid int,
isPaid varchar(10),
price float,
paidTime varchar(30));
第二步 导入数据
load data infile"C:/Users/Administrator/Desktop/order_info_utf.csv"
into table orderinfo fields terminated by ",";
第三步 清洗日期格式
#a、先把时间格式标准化变成 1993-02-27 这样的
update orderinfo set paidtime=replace(paidtime,'/','-') where paidtime is not null;
#b、然后更新字符串为日期格式,然后才能使用日期函数进行操作,
update orderinfo set paidtime=str_to_date(paidtime,'%Y-%m-%d %H:%i') where paidtime is not null;
#如果报 function str_to_datetime_value 错误,可以用 date_format
select * from orderinfo where paidtime='\r' limit 10;
#来看一下是否包含了 \r(回车) 符号,
#如果是包含了,则用下面语句再过滤掉
update orderinfo set paidtime=str_to_date(paidtime,'%Y-%m-%d %H:%i') where paidtime is not null and paidtime <>'\r';
题目
- 统计不同月份的下单人数
select date_format(Paidtime,"%Y-%m"),count(distinct userid) from orderinfo
where isPaid = "已支付" group by month(PaidTime);
image.png
- 统计用户三月份的回购率和复购率
2.1 回购率:3月份买了 4月份也买的userid
select count(distinct userid) from orderinfo
where userid in (select distinct userid from orderinfo where isPaid = "已支付"
and month(PaidTime) = 3) and month(Paidtime) = 4;
注:子查询是列子查询(一列多行),故用 in 判断userid是否在其中。
2.2 延伸 每个月的回购率如何
select t1.m,count(t1.m),count(t2.m),count(t2.m)/count(t1.m) from
(select userid,date_format(paidtime,"%Y-%m-01") m from orderinfo where isPaid = "已支付" group by userid,date_format(paidtime,"%Y-%m-01")) t1
left join
(select userid,date_format(paidtime,"%Y-%m-01") m from orderinfo where isPaid = "已支付" group by userid,date_format(paidtime,"%Y-%m-01")) t2
on t1.userid = t2.userid and t2.m=date_add(t1.m,interval 1 month) group by t1.m;
利用相同表连接,然后进行月份筛选和userid限制。
image.png
2.3 复购率:3月份购买两次以上
select count(a),count(if(a>1,1,null)),count(if(a>1,1,null))/count(a) from
(select count(userid) a from orderinfo
where isPaid = "已支付" and month(Paidtime) = 3 group by userid) t1;
image.png
按userid分组计数,超过2次及以上则为复购。
- 统计男女用户的消费频次是否有差异
select count(o.userid)/count(distinct o.userid),sex from orderinfo o
inner join userinfo u on o.userid = u.userid where sex in ("男","女")
and isPaid = "已支付" group by sex;
image.png
按性别分组,求购买频次
- 统计多次消费的用户,第一次和最后一次消费间隔是多少?
select userid,count(userid),min(Paidtime) as firstpaidtime,max(Paidtime) as lastpaidtime,datediff(max(Paidtime),min(Paidtime)) as timediff
from orderinfo where isPaid = "已支付" group by userid having count(userid)>1;
image.png
select avg(a) from (
select userid,count(userid),max(paidtime),min(paidtime),datediff(max(paidtime),min(paidtime)) a from orderinfo
where ispaid="已支付" group by userid having count(userid)>=2) b;
image.png
- 统计不同年龄段,用户的消费金额是否有差异?
select 年龄段,sum(price) from orderinfo o inner join
(select *,2015-year(birth) as age,ceil((2015-year(birth))/10) as 年龄段 from userinfo
where birth >1900-01-02 having year(now())-year(birth)<100) t1
on o.userid = t1.userid where ispaid = "已支付" group by 年龄段;
image.png
6.统计消费的二八法则,消费的top20%用户,贡献了多少额度
select round(count(distinct userid)*0.2,0) from orderinfo where isPaid = "已支付";
image.png
select sum(a) from
(select *,sum(price) a from orderinfo where isPaid = "已支付"
group by userid order by sum(price) desc
limit 17130) b;
image.png
看下下面两条语句差异:
select count(userid)*0.2,sum(total) from (
select userid,sum(price) total from orderinfo
where ispaid = "已支付"
group by userid
order by sum(price) desc) t;
image.png
上述语句是将所有的消费金额进行统计了。
select count(userid),sum(total) from (
select userid,sum(price) total from orderinfo
where ispaid = "已支付"
group by userid
order by sum(price) desc
limit 17129) t;
image.png
网友评论