美文网首页
SQL-KPI的线上分析处理(OLAP)

SQL-KPI的线上分析处理(OLAP)

作者: adi0229 | 来源:发表于2020-03-25 17:25 被阅读0次

    案例 MovieNow

    一家虚构的在线影视租赁公司 MovieNow,它拥有电影信息(类型和主演)以及(用户信息,用户评分)。它需要决定:投资于哪些新电影?

    BI(商业智能目标)

    • 运营决策的数据
      • 根据演员的流行度,决定投资哪一部电影。
      • 根据上月的收入,辅助决定,短期的投资预算
    • 策略决策的数据
      • 在哪国更受欢迎,用来决定市场扩张
      • 长线的投资及收入

    KPI(关键绩效指标)

    • 租赁总收入
    • 用户满意度:所有电影的平均评分
    • 活跃用户总数:用户参与度

    OLAP(联机分析处理)

    联机分析处理(英语:Online analytical processing),简称OLAP (/ˈoʊlæp/),是计算机技术中快速解决多维分析问题(英语:multi-dimensional analytical)(MDA)的一种方法。[1] OLAP是更广泛的商业智能范畴的一部分,它还包括关系数据库、报告编写和数据挖掘[2] OLAP的典型应用包括销售业务报告(英语:business reporting)、市场营销、管理报告、业务流程管理(BPM)[3]预算预测财务报表以及类似领域,新的应用正在出现,如农业[4] 术语“OLAP”是对传统数据库术语“联机事务处理”(OLTP)稍微修改而成的。[5]

    下面主要是上卷(roll-up)和切片(slicing)的例子。

    上卷:汇集所有的电影租赁数据、用户评分,查询该租赁网站用户偏好的趋势。
    切片:根据不同维度,国家、性别对演员的评分进行筛选查看。

    ROLLUP运算符

    SELECT 
        c.country, 
        m.genre, 
        AVG(r.rating) AS avg_rating, 
        COUNT(*) AS num_rating
    FROM renting AS r
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    LEFT JOIN customers AS c
    ON r.customer_id = c.customer_id
    GROUP BY ROLLUP(c.country, m.genre)
    ORDER BY c.country, m.genre;
    

    按照租赁用户所属国家、电影的类型来查看平均评分及评分数目。

    相比与 GOURP BY , ROLLUP 包含了所有电影的平均评分,不同国家不限类型电影的平均评分,不同类型电影的平均评分。

    CUBE运算符

    SELECT 
        c.country, 
        m.genre, 
        AVG(r.rating) AS avg_rating 
    FROM renting AS r
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    LEFT JOIN customers AS c
    ON r.customer_id = c.customer_id
    GROUP BY CUBE(country, genre);
    

    按照租赁用户所属国家、电影的类型来查看平均评分及评分数目。

    ROLLUP 结果是 88条结果,CUBE 得到 96 条结果,多出来的结果,是因为 CUBE 生成所选列的所有合计组合。而 ROLLUP 是生成所选列的层级聚合结果。

    ROLLUP (YEAR, MONTH, DAY)
    
    With a ROLLUP, it will have the following outputs:
    
    YEAR, MONTH, DAY
    YEAR, MONTH
    YEAR
    ()
    With CUBE, it will have the following:
    
    YEAR, MONTH, DAY
    YEAR, MONTH
    YEAR, DAY
    YEAR
    MONTH, DAY
    MONTH
    DAY
    ()
    

    GROUPING SET运算符

    SELECT 
        nationality, 
        gender, 
        COUNT(*) 
    FROM actors
    GROUP BY GROUPING SETS ((nationality), (gender), ()); 
    

    GROUP SET 最灵活

    可以返回不同的 GROUP BY 检索的集合

    完整BI例子

    问题: 投资哪些新电影?需要考虑,新电影比旧电影更贵。

    • 用户对新旧电影的评分差异?
    • 不同国家用户之间,对新旧电影的评分,有差异吗?

    表联结(Join data)

    联结三张表,分别是 renting/customers/movies

    SELECT *
    FROM renting AS r
    LEFT JOIN customers AS c
    ON c.customer_id = r.customer_id
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id;
    

    选择相关记录(Select relevant records)

    SELECT *
    FROM renting AS r
    LEFT JOIN customers AS c
    ON c.customer_id = r.customer_id
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    WHERE r.movie_id IN (
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >= 4)
    AND r.date_renting >= '2018-04-01';
    

    聚集数据(Aggregation)

    SELECT c.country,
    m.year_of_release,
    COUNT(*) AS n_rentals,
    COUNT(DISTINCT r.movie_id) AS n_movies,
    AVG(rating) AS avg_rating
    FROM renting AS r
    LEFT JOIN customers AS c
    ON c.customer_id = r.customer_id
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    WHERE r.movie_id IN (
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >= 4)
    AND r.date_renting >= '2018-04-01'
    GROUP BY ROLLUP (m.year_of_release, c.country)
    ORDER BY c.country, m.year_of_release;
    

    按照不同的年份、国家来查看租赁次数、电影总数和平均评分

    SELECT genre,
           AVG(rating) AS avg_rating,
           COUNT(rating) AS n_rating,
           COUNT(*) AS n_rentals,     
           COUNT(DISTINCT m.movie_id) AS n_movies 
    FROM renting AS r
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    WHERE r.movie_id IN ( 
        SELECT movie_id
        FROM renting
        GROUP BY movie_id
        HAVING COUNT(rating) >= 3 )
    AND r.date_renting >= '2018-01-01'
    GROUP BY genre
    ORDER BY avg_rating DESC; 
    

    筛选条件是 2018 年以后的租赁记录,评分超过 3 次的电影。
    根据类型 genre 分组,按照平均评分降序排序

    SELECT a.nationality,
           a.gender,
           AVG(r.rating) AS avg_rating,
           COUNT(r.rating) AS n_rating,
           COUNT(*) AS n_rentals,
           COUNT(DISTINCT a.actor_id) AS n_actors
    FROM renting AS r
    LEFT JOIN actsin AS ai
    ON ai.movie_id = r.movie_id
    LEFT JOIN actors AS a
    ON ai.actor_id = a.actor_id
    WHERE r.movie_id IN ( 
        SELECT movie_id
        FROM renting
        GROUP BY movie_id
        HAVING COUNT(rating) >= 4)
    AND r.date_renting >= '2018-04-01'
    GROUP BY CUBE(nationality, gender); 
    

    哪些国家/性别的演员更受欢迎?
    根据国籍、性别,查看演员参演的电影之中,所获得的平均评分、租赁字数、演员数目等等。

    参考资料:

    DataCamp<Data-Driven Decision Making in SQL>
    SQL中ROLLUP、CUBE的用法和区别
    understanding-the-differences-between-cube-and-rollup

    相关文章

      网友评论

          本文标题:SQL-KPI的线上分析处理(OLAP)

          本文链接:https://www.haomeiwen.com/subject/gwhtuhtx.html