数据背景:
现有两张表,一张为用户浏览行为表,其中包含浏览日期(activity_date),浏览酒店(hotel_name),用户id(user_id),这三个字段,表名为activity intl info
activity_intl_info
另一张为订单表,其中包含预定日期(order_date)、订单号(order_no)、入住日期(checkin_date),离店日期(checkout_date),用户id(user_id),酒店名称(hotel_name),订单金额(order_amount)这六个字段,表名为order_intl_info,样例如下:
order_intl_info数据需求:
(1)统计昨天一天(2020-09-13)的浏览次数。
(2)统计过去1周之内(2020-09-07~2020-09-13)每天的订单量、收入、有订单的用户数,并按预定日期降序排列。
(3)找出过去1周之内(2020-09-07~2020-09-13)每天订单量最高的酒店名称。
(4)统计过去一周之内(2020-09-07~2020-09-13)每天下单用户占当天浏览用户的比例
(1)统计昨天一天(2020-09-13)的浏览次数
方法一:
SELECT
activity_date
,count(*)
FROM activity_intl_info
GROUP BY activity_date
HAVING activity_date='2020-09-13'
方法二:
SELECT
activity_date
,count(*)
FROM activity_intl_info
where activity_date='2020-09-13'
GROUP BY activity_date
需要注意group by后面不能使用where
(2)统计过去1周之内(2020-09-07~2020-09-13)每天的订单量、收入、有订单的用户数,并按预定日期降序排列。
方法一:
SELECT
order_date
,count(order_no)
,sum(order_amount)
,COUNT( DISTINCT user_id)
FROM order_intl_info
WHERE order_date BETWEEN '2020-09-07' AND '2020-09-13'
GROUP BY order_date
ORDER BY order_date DESC
方法二:
SELECT
order_date
,count(order_no)
,sum(order_amount)
,COUNT( DISTINCT user_id)
FROM order_intl_info
GROUP BY order_date
HAVING order_date BETWEEN '2020-09-07' AND '2020-09-13'
ORDER BY order_date DESC
需要注意去重是加DISTINCT,SQL里没有countd函数
(3)找出过去1周之内(2020-09-07~2020-09-13)每天订单量最高的酒店名称
SELECT
b.order_date
,b.hotel_name
,b.rn
FROM
(
SELECT
a.order_date
,a.hotel_name
,rank() over(PARTITION BY a.order_date ORDER BY a.CNT) rn
FROM
(
SELECT
order_date
,hotel_name
,count(order_no) AS CNT
FROM order_intl_info
WHERE order_date BETWEEN '2020-09-07' AND '2020-09-13'
GROUP BY order_date,hotel_name
)a
)b
WHERE rn=1
窗口函数order by 后面不能使用count函数,因此必须先进行聚合计算,
然后再进行排序,然后再选取符合条件的数据
(4)统计过去一周之内(2020-09-07~2020-09-13)每天下单用户占当天浏览用户的比例
SELECT
b.order_date
,cn2/cn1
FROM
(SELECT activity_date,count(DISTINCT user_id) cn1
FROM activity_intl_info
GROUP BY activity_date)a
JOIN
(SELECT order_date,count(DISTINCT user_id) cn2
FROM order_intl_info
GROUP BY order_date)b
ON a.activity_date=b.order_date
网友评论