子查询
目的: 有时候,你要回答的问题无法通过直接处理数据库中的现有表格获得答案。但是,如果我们能通过现有的表格创建新的表格,我们就能查询这些新的表格,并回答我们的问题。
子查询简介
子查询和表格表达式都是用来通过查询创建一个表格,然后再编写一个查询来与这个新创建的表格进行互动。每当我们需要使用现有表格创建新的表格,然后需要再次查询时,就表明我们需要使用某种子查询。
某些查询(也成为内层查询或嵌套查询Subqueries)是可用于多个步骤的工具.
以营销经理为例, 哪些渠道平均每天发送给Parch&Posey的流量最大?
- 首先查询基础表, 确保数据可用
- 统计每天每个渠道的所有事件
- 将以上查询构建为子查询表, 要做到这一点, 首先将以上查询放入括号中, 并将其用在编写下个查询的FROM子句, 子查询需要别名, 在括号后面添加别名
- 在创建的事件列中计算平均值, 由于子查询就像FROM 子句的一个表格一样, 我们将在子查询之后放置一个GROUP BY子句, 由于现在是根据这个新的聚合重新查询, 因此不再需要在子查询中使用ORDER BY语句
SELECT channel,
AVG(event_count) AS avg_event_count
FROM
(SELECT DATE_TRUNC('day', occurred_at) AS day,
channel,
COUNT(*) AS event_count
FROM demo.web_events_full
GROUP BY 1, 2
) sub
GROUP BY 1
ORDER BY 2 DESC
以上代码, 首先运行内层查询, 然后针对内层查询创建的结果集来运行外层查询, 查询结果如下
image.png
子查询格式
在使用子查询时,要让读者能够轻松地判断查询的哪个部分将一起执行。大部分人的做法是按照某种方式缩进子查询.
- 格式糟糕的查询
示例1: 无法判断查询的作用
image.png示例2: 不是太糟糕, 但是还有更好的写法.
image.png- 格式清晰的查询
示例3:
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub;
示例4:
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub
GROUP BY channel
ORDER BY 2 DESC;
更多子查询内容
子查询可以在查询中的若干地方使用, 无论是表名称, 列名称还是个别值, 尤其适用于条件逻辑, 可与WHERE
或JOIN
子句协同使用, 或用于CASE语句的WHEN
部分.
例如, 你可能希望仅返回与Parch&Posies首个订单相同月份的订单
要获取首个订单的日期, 你可以编写一个带有MIN
函数的子查询, 以下代码获取首个订单的信息
SELECT MIN(occurred_at) AS min
FROM orders
在以上代码中, 再添加一个DATE_TRUNC函数来获取首个订单的月份
SELECT DATE_TRUNC('month', MIN(occurred_at)) AS min
FROM orders
最后使用上面的子查询, 来使用WHERE
筛选订单表, 并按occurred_at列进行分类,
SELECT *
FROM orders
WHERE DATE_TRUNC('month', occurred_at) =
(SELECT DATE_TRUNC('month', MIN(occurred_at)) AS min
FROM orders)
ORDER BY occurred_at
此查询可行的原因是子查询的结果只有一个单元格
image.png大多数条件逻辑都可以和包含单个单元格结果的子查询一起使用, 但是IN
是唯一一种, 当内层查询包含多个结果时可用的条件逻辑类型.
提示
注意,在条件语句中编写子查询时,不能包含别名。这是因为该子查询会被当做单个值(或者对于 IN 情况是一组值),而不是一个表格。
同时注意,这里的查询对应的是单个值。如果我们返回了整个列,则需要使用 IN 来执行逻辑参数。如果我们要返回整个表格,则必须为该表格使用别名,并对整个表格执行其他逻辑。
更多的子查询练习
相关ERD
1.提供每个区域拥有最高销售额 (total_amt_usd) 的销售代表的姓名。
步骤:
首先,我要算出与每个销售代表相关的总销售额 (total_amt_usd),并且要得出他们所在的区域。以下查询提供了这一信息。
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC;
接着,得出每个区域的最高销售额,然后使用该信息从最终结果中获取这些行。
SELECT region_name, MAX(total_amt) total_amt
FROM(SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1, 2) inner1
GROUP BY 1;
最后, 本质上,这是两个表格的连接,其中区域和销售额相匹配。完整代码如下
SELECT t2.sales_rep_name, t2.region, t2.total_usd
FROM (SELECT region,
MAX(total_usd) AS total_usd
FROM (SELECT r.name region,
s.name sales_rep_name,
SUM(o.total_amt_usd) total_usd
FROM orders o
JOIN accounts a
ON o.account_id = a.id
JOIN sales_reps s
ON a.sales_rep_id = s.id
JOIN region r
ON s.region_id = r.id
GROUP BY 1,2) inner1
GROUP BY 1) t1
JOIN (SELECT r.name region,
s.name sales_rep_name,
SUM(o.total_amt_usd) total_usd
FROM orders o
JOIN accounts a
ON o.account_id = a.id
JOIN sales_reps s
ON a.sales_rep_id = s.id
JOIN region r
ON s.region_id = r.id
GROUP BY 1,2
ORDER BY 3 DESC) t2
ON t1.region = t2.region AND t1.total_usd = t2.total_usd
方案二:使用with设置子表(推荐)
WITH t1 AS (
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC),
t2 AS (
SELECT region_name, MAX(total_amt) total_amt
FROM t1
GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;
2.对于具有最高销售额 (total_amt_usd) 的区域,总共下了多少个订单 (total count orders) ?
SELECT t1.region, t1.total_usd, t2.total_orders
FROM (SELECT region,
MAX(total_usd) AS total_usd
FROM (SELECT r.name region,
SUM(o.total) total_orders,
SUM(o.total_amt_usd) total_usd
FROM orders o
JOIN accounts a
ON o.account_id = a.id
JOIN sales_reps s
ON a.sales_rep_id = s.id
JOIN region r
ON s.region_id = r.id
GROUP BY 1) inner1
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1) t1
备注:以上代码先找出最大销售的区域, 此为一个区域
JOIN (SELECT r.name region,
SUM(o.total) total_orders,
SUM(o.total_amt_usd) total_usd
FROM orders o
JOIN accounts a
ON o.account_id = a.id
JOIN sales_reps s
ON a.sales_rep_id = s.id
JOIN region r
ON s.region_id = r.id
GROUP BY 1) t2
ON t1.region = t2.region
备注: t1只有一个值,因此在和t2进行匹配时,只能匹配一行结果, 也就是我们需要的
方案二
WITH t1 AS (
SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name),
t2 AS (
SELECT MAX(total_amt)
FROM t1)
SELECT r.name, COUNT(o.total) total_orders
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name
HAVING SUM(o.total_amt_usd) = (SELECT * FROM t2)
3.对于购买标准纸张数量 (standard_qty) 最多的客户(在作为客户的整个时期内),有多少客户的购买总数依然更多?
SELECT a.name
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY 1
HAVING SUM(o.total) > (SELECT a.name
FROM (SELECT a.name,
SUM(o.standard_qty) standard,
SUM(o.total) total
FROM orders o
JOIN accounts a
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1) sub)
sub子查询代表的是标准纸张数量 (standard_qty) 最多的客户,
使其和所有客户的所有订单数进行比较, 也就是HAVING语句.
方案二
WITH t1 AS (
SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
FROM accounts a
JOIN orders o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1),
t2 AS (
SELECT a.name
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY 1
HAVING SUM(o.total) > (SELECT total FROM t1))
SELECT COUNT(*)
FROM t2;
4.对于(在作为客户的整个时期内)总消费 (total_amt_usd) 最多的客户,他们在每个渠道上有多少 web_events?
SELECT a.name, w.channel, COUNT(*)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id AND a.id = (SELECT id
FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 1) inner_table)
GROUP BY 1, 2
ORDER BY 3 DESC;
方案二
WITH t1 AS (
SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 1)
SELECT a.name, w.channel, COUNT(*)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id AND a.id = (SELECT id FROM t1)
GROUP BY 1, 2
ORDER BY 3 DESC;
5.对于总消费前十名的客户,他们的平均终身消费 (total_amt_usd) 是多少?
SELECT AVG(tot_spent)
FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 10) temp;
方案二
WITH t1 AS (
SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 10)
SELECT AVG(tot_spent)
FROM t1;
6.比所有客户的平均消费高的企业平均终身消费 (total_amt_usd) 是多少?
SELECT AVG(avg_amt)
FROM (SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
FROM orders o
GROUP BY 1
HAVING AVG(o.total_amt_usd) > (SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id)) temp_table;
方案二
WITH t1 AS (
SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id),
t2 AS (
SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
FROM orders o
GROUP BY 1
HAVING AVG(o.total_amt_usd) > (SELECT * FROM t1))
SELECT AVG(avg_amt)
FROM t2;
网友评论