数据分析这项技能,不管是什么岗位,有总是加分项,一切都是为了数据,能从数据中发现机会那就更有优势了。
以下记录一些有用的SQL片段:
创建临时表
从子查询 => 创建临时表查询
当子查询太多的时候,使用这种方式可以让SQL更清晰
SELECT name
, salary
FROM People
WHERE name in (SELECT DISTINCT name
FROM population
WHERE country = "Canada"
AND city = "Toronto")
AND salary >= (SELECT AVG(salary)
FROM salaries
WHERE gender = "Female")
-- 临时表查询
with toronto_ppl as (
SELECT DISTINCT name
FROM population
WHERE country = "Canada"
AND city = "Toronto"
)
, avg_female_salary as (
SELECT AVG(salary) as avgSalary
FROM salaries
WHERE gender = "Female"
)
SELECT name
, salary
FROM People
WHERE name in (SELECT DISTINCT FROM toronto_ppl)
AND salary >= (SELECT avgSalary FROM avg_female_salary)
临时函数
同写代码一样,一些重复的输入输出,提取为一个临时函数,让代码变得更清晰。
SELECT name
, CASE WHEN tenure < 1 THEN "analyst"
WHEN tenure BETWEEN 1 and 3 THEN "associate"
WHEN tenure BETWEEN 3 and 5 THEN "senior"
WHEN tenure > 5 THEN "vp"
ELSE "n/a"
END AS seniority
FROM employees
-- 临时函数
CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
CASE WHEN tenure < 1 THEN "analyst"
WHEN tenure BETWEEN 1 and 3 THEN "associate"
WHEN tenure BETWEEN 3 and 5 THEN "senior"
WHEN tenure > 5 THEN "vp"
ELSE "n/a"
END
);
SELECT name
, get_seniority(tenure) as seniority
FROM employees
增加列,制作宽表
根据条件增加case when
SELECT name
,case when age < 18 then '未成年' end as desc1
,case when city in ('伤害') then '一线城市' end as desc2
,..
FROM employees
排序 row_number rank dense_rank
SELECT Name
, GPA
, ROW_NUMBER() OVER (ORDER BY GPA desc)
, RANK() OVER (ORDER BY GPA desc)
, DENSE_RANK() OVER (ORDER BY GPA desc)
FROM student_grades
image-20211115081901264
计算增量
在需要比较不同周期的值时需要
# Comparing each month's sales to last month
SELECT month
, sales
, sales - LAG(sales, 1) OVER (ORDER BY month)
FROM monthly_sales
# Comparing each month's sales to the same month last year
SELECT month
, sales
, sales - LAG(sales, 12) OVER (ORDER BY month)
FROM monthly_sales
计算累加曲线
SELECT Month
, Revenue
, SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue
image-20211115081750800
网友评论