美文网首页
Data-Driven Decision Making in S

Data-Driven Decision Making in S

作者: 鲸鱼酱375 | 来源:发表于2019-06-15 03:22 被阅读0次

    前提:本文所有课件来自datacamp,整理不易

    1. Average rating per customer

    SELECT customer_id,
          avg(rating),  
          count(rating), 
          count(renting_id)  
    FROM renting
    GROUP BY customer_id
     having count(renting_id) >7
    ORDER BY avg(rating);
    

    2.Calculate the revenue coming from movie rentals, the number of movie rentals and the number of customers who rented a movie.

    SELECT 
        sum(m.renting_price), 
        count(*), -- Count the number of rentals
        count(distinct(r.customer_id))  
    FROM renting AS r
    LEFT JOIN movies AS m
    ON r.movie_id = m.movie_id;
    

    3.You are asked to give an overview of which actors play in which movie.

    Create a list of actor names and movie titles in which they act. Make sure that each combination of actor and movie appears only once.

    SELECT m.title, 
           a.name
    FROM actsin as ac
    LEFT JOIN movies AS m
    ON m.movie_id = ac.movie_id
    LEFT JOIN actors AS a
    ON a.actor_id = ac.actor_id;
    

    4.Age of actors from the USA

    Now you will explore the age of American actors and actresses. Report the date of birth of the oldest and youngest US actor and actress.
    Create a subsequent SELECT statements in the FROM clause to get all information about actors from the USA.
    Give the subsequent SELECT statement the alias a.
    Report for actors from the USA the year of birth of the oldest and the year of birth of the youngest actor and actress.

    SELECT a.gender, 
           max(a.year_of_birth), 
           min(a.year_of_birth) 
    FROM  (select * 
       from actors where nationality ='USA') as a
    GROUP BY a.gender;
    

    5.Identify favorite movies for a group of customers

    Which is the favorite movie on MovieNow? Answer this question for a specific group of customers: for all customers born in the 70s.

    Remove those movies from the table with only one rental.
    Order the result table such that movies with highest rating come first.

    SELECT m.title, 
    COUNT(*),
    AVG(r.rating)
    FROM renting AS r
    LEFT JOIN customers AS c
    ON c.customer_id = r.customer_id
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'
    GROUP BY m.title
    having COUNT(*) > 1 
    order by AVG(r.rating) desc; 
    

    6.Identify favorite actors for Spain

    You're now going to explore actor popularity in Spain. Use as alias the first letter of the table, except for the table actsin use ai instead.
    report the favorite actors only for customers from Spain.

    SELECT a.name,  c.gender,
           COUNT(*) AS number_views, 
           AVG(r.rating) AS avg_rating
    FROM renting as r
    LEFT JOIN customers AS c
    ON r.customer_id = c.customer_id
    LEFT JOIN actsin as ai
    ON r.movie_id = ai.movie_id
    LEFT JOIN actors as a
    ON ai.actor_id = a.actor_id
    where c.country ='Spain' 
    GROUP BY a.name, c.gender
    HAVING AVG(r.rating) IS NOT NULL 
      AND COUNT(*) > 5 
    ORDER BY avg_rating DESC, number_views DESC;
    

    网站答案有bug:count(*)>5 等同于 number_views>5

    7.KPIs per country

    Your manager is interested in the total number of movie rentals, the average rating of all movies and the total revenue for each country since the beginning of 2019.
    Calculate the number of movie rentals.
    Calculate the average rating.
    Calculate the revenue from movie rentals.
    Report these KPIs for each country.

    SELECT 
        c.country,                   
        count(r.renting_id) AS number_renting, 
        avg(r.rating) AS average_rating,
        sum(m.renting_price) AS revenue      
    FROM renting AS r
    LEFT JOIN customers AS c
    ON c.customer_id = r.customer_id
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    WHERE date_renting >= '2019-01-01'
    group by c.country;
    

    8.Movies with rating above average

    For the advertising campaign your manager also needs a list of popular movies with high ratings. Report a list of movies with rating above average.
    Select movie IDs and calculate the average rating of movies with rating above average.

    SELECT movie_id,  
           AVG(rating)
    FROM renting
    GROUP BY movie_id
    HAVING AVG(rating) >   
        (SELECT AVG(rating)
        FROM renting);
    

    The advertising team only wants a list of movie titles. Report the movie titles of all movies with average rating higher than the total average.

    SELECT title
    FROM movies
    WHERE movie_id in 
        (SELECT movie_id
         FROM renting
         GROUP BY movie_id
         HAVING AVG(rating) > 
            (SELECT AVG(rating)
             FROM renting));
    

    9. Customers with at least one rating

    Having active customers is a key performance indicator for MovieNow. Make a list of customers who gave at least one rating.

    SELECT *
    FROM customers as c -- Select all customers with at least one rating
    WHERE exists
        (SELECT *
        FROM renting AS r
        WHERE rating IS NOT NULL 
        AND r.customer_id = c.customer_id);
    

    10.Actors in comedies

    In order to analyze the diversity of actors in comedies, first, report a list of actors who play in comedies and then, the number of actors for each nationality playing in comedies.

    10.1

    Create a list of all actors who play in a Comedy. Use the first letter of the table as an alias.

    SELECT *
    FROM actors AS a
    WHERE EXISTS
        (SELECT *
         FROM actsin AS ai
         LEFT JOIN movies AS m
         ON m.movie_id = ai.movie_id
         WHERE m.genre = 'Comedy'
         AND ai.actor_id = a.actor_id);
    

    10.2 Report the nationality and the number of actors for each nationality.

    SELECT a.nationality,count(*) -- Report the nationality and the number of actors for each nationality
    FROM actors AS a
    WHERE EXISTS
        (SELECT ai.actor_id
         FROM actsin AS ai
         LEFT JOIN movies AS m
         ON m.movie_id = ai.movie_id
         WHERE m.genre = 'Comedy'
         AND ai.actor_id = a.actor_id)
    group by a.nationality;
    

    11.Dramas with high ratings

    The advertising team has a new focus. They want to draw the attention of the customers to dramas. Make a list of all movies that are in the drama genre and have an average rating higher than 9.
    Select all movies of in the drama genre with an average rating higher than 9.

    SELECT *
    FROM movies
    where movie_id in -- Select all movies of genre drama with average rating higher than 9
       (SELECT movie_id
        FROM movies
        WHERE genre = 'Drama'
        INTERSECT
        SELECT movie_id
        FROM renting
        GROUP BY movie_id
        HAVING AVG(rating)>9);
    

    olap part

    1.Groups of customers

    Use the CUBE operator to extract the content of a pivot table from the database. Create a table with the total number of male and female customers from each country.

    • Create a table with the total number of customers, of all female and male customers, of the number of customers for each country and the number of men and women from each country.
    SELECT gender, -- Extract information of a pivot table of gender and country for the number of customers
           country,
           count(*)
    FROM customers
    GROUP BY CUBE (gender, country)
    ORDER BY country;
    

    如果是Group by ROLLUP(A, B, C)的话,首先会对(A、B、C)进行GROUP BY,然后对(A、B)进行GROUP BY,然后是(A)进行GROUP BY,最后对全表进行GROUP BY操作。
    如果是GROUP BY CUBE(A, B, C),则首先会对(A、B、C)进行GROUP BY,然后依次是(A、B),(A、C),(A),(B、C),(B),(C),最后对全表进行GROUP BY操作。 grouping_id()可以美化效果。除了使用GROUPING函数,还可以使用GROUPING_ID来标识GROUP BY的结果。
    也可以 Group by Rollup(A,(B,C)) ,Group by A Rollup(B,C),…… 这样任意按自己想要的形式结合统计数据,非常方便。

    2.Categories of movies

    Give an overview on the movies available on MovieNow. List the number of movies for different genres and release years.

    List the number of movies for different genres and the year of release on all aggregation levels by using the CUBE operator.

    SELECT count(*),
           genre,
           year_of_release
    FROM movies
    group by cube(genre,year_of_release)
    ORDER BY year_of_release;
    

    3.Analyzing average ratings

    Prepare a table for a report about the national preferences of the customers from MovieNow comparing the average rating of movies across countries and genres.

    Calculate the average rating for each country.

    SELECT 
        avg(r.rating),
        c.country
    FROM renting AS r
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    LEFT JOIN customers AS c
    ON r.customer_id = c.customer_id
    group by c.country;
    

    Calculate the average rating for all aggregation levels of country and genre.

    SELECT 
        c.country, 
        m.genre, 
        AVG(r.rating) AS avg_rating -- Calculate the average rating 
    FROM renting AS r
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    LEFT JOIN customers AS c
    ON r.customer_id = c.customer_id
    group by cube(c.country,m.genre); -- For all aggregation levels of country and genre
    

    rollup

    1.Number of customers

    You have to give an overview of the number of customers for a presentation.

    Generate a table with the total number of customers, the number of customers for each country, and the number of female and male customers for each country.
    Order the result by country and gender.

    SELECT country,
           gender,
           COUNT(*)
    FROM customers
    group by rollup (country, gender)
     order by country,gender; 
    

    2.Analyzing preferences of genres across countries

    You are asked to study the preferences of genres across countries. Are there particular genres which are more popular in specific countries? Evaluate the preferences of customers by averaging their ratings and counting the number of movies rented from each genre.

    Finally, calculate the average ratings and the number of ratings for each country and genre, as well as an aggregation over all genres for each country and the overall average and total number.

    SELECT 
        c.country, 
        m.genre, 
        AVG(r.rating) AS avg_rating, 
        COUNT(*) AS num_rating
    FROM renting AS r
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    LEFT JOIN customers AS c
    ON r.customer_id = c.customer_id
    GROUP BY rollup(c.country, m.genre)
    ORDER BY c.country, m.genre;
    

    grouping set

    1.Exploring nationality and gender of actors

    For each movie in the database, the three most important actors are identified and listed in the table actors. This table includes the nationality and gender of the actors. We are interested in how much diversity there is in the nationalities of the actors and how many actors and actresses are in the list.
    Count the number of actors in the table actors from each country, the number of male and female actors and the total number of actors.

    SELECT 
        nationality, -- Select nationality of the actors
        gender, -- Select gender of the actors
        count(*) -- Count the number of actors
    FROM actors
    GROUP BY GROUPING SETS ((nationality), (gender), ()); 
    

    2.Exploring rating by country and gender

    Now you will investigate the average rating of customers aggregated by country and gender.
    Now, use GROUPING SETS to get the same result, i.e. the average rating over country and gender.

    SELECT 
        c.country, 
        c.gender,
        AVG(r.rating)
    FROM renting AS r
    LEFT JOIN customers AS c
    ON r.customer_id = c.customer_id
    GROUP BY GROUPING SETS ((country, gender));
    

    Report all information that is included in a pivot table for country and gender in one SQL table.

    SELECT 
        c.country, 
        c.gender,
        AVG(r.rating)
    FROM renting AS r
    LEFT JOIN customers AS c
    ON r.customer_id = c.customer_id
    -- Report all info from a Pivot table for country and gender
    GROUP BY GROUPING SETS ((country, gender), (country), (gender), ());
    

    final

    1.Customer preference for genres

    You just saw that customers have no clear preference for more recent movies over older ones. Now the management considers investing money in movies of the best rated genres.
    Select records of movies with at least 4 ratings, starting from 2018-04-01.

    SELECT *
    FROM renting AS r
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    WHERE r.movie_id IN ( 
        SELECT movie_id
        FROM renting
        GROUP BY movie_id
        HAVING COUNT(rating) >= 4)
    AND r.date_renting >= '2018-04-01';
    

    For each genre, calculate the average rating (use the alias avg_rating), the number of ratings (use the alias n_rating), the number of movie rentals (use the alias n_rentals), and the number of distinct movies (use the alias n_movies).

    SELECT m.genre, -- For each genre, calculate:
           avg(r.rating) as avg_rating, 
           count(r.rating) as n_rating,
           count(r.renting_id) as n_rentals,    
           count(distinct(r.movie_id)) as n_movies
    FROM renting AS r
    LEFT JOIN movies AS m
    ON m.movie_id = r.movie_id
    WHERE r.movie_id IN ( 
        SELECT movie_id
        FROM renting
        GROUP BY movie_id
        HAVING COUNT(rating) >= 3)
    AND r.date_renting >= '2018-01-01'
    group by m.genre;
    

    2.Customer preference for actors

    The last aspect you have to analyze are customer preferences for certain actors.

    For each combination of the actors' nationality and gender, calculate the average rating, the number of ratings, the number of movie rentals, and the number of actors.

    SELECT a.nationality,
           a.gender,
           AVG(r.rating) AS avg_rating, -- The average rating
           COUNT(r.rating) AS n_rating, -- The number of ratings
           COUNT(*) AS n_rentals, -- The number of movie rentals
           COUNT(DISTINCT a.actor_id) AS n_actors -- The number of actors
    FROM renting AS r
    LEFT JOIN actsin AS ai
    ON ai.movie_id = r.movie_id
    LEFT JOIN actors AS a
    ON ai.actor_id = a.actor_id
    WHERE r.movie_id IN ( 
        SELECT movie_id
        FROM renting
        GROUP BY movie_id
        HAVING COUNT(rating) >=4 )
    AND r.date_renting >= '2018-04-01'
    GROUP BY a.nationality, a.gender;
    

    Provide results for all aggregation levels represented in a pivot table.

    SELECT a.nationality,
           a.gender,
           AVG(r.rating) AS avg_rating,
           COUNT(r.rating) AS n_rating,
           COUNT(*) AS n_rentals,
           COUNT(DISTINCT a.actor_id) AS n_actors
    FROM renting AS r
    LEFT JOIN actsin AS ai
    ON ai.movie_id = r.movie_id
    LEFT JOIN actors AS a
    ON ai.actor_id = a.actor_id
    WHERE r.movie_id IN ( 
        SELECT movie_id
        FROM renting
        GROUP BY movie_id
        HAVING COUNT(rating) >= 4)
    AND r.date_renting >= '2018-04-01'
    group by cube(a.nationality,a.gender); 
    

    相关文章

      网友评论

          本文标题:Data-Driven Decision Making in S

          本文链接:https://www.haomeiwen.com/subject/ithvfctx.html