前提:本文所有课件来自datacamp,整理不易
1. Average rating per customer
SELECT customer_id,
avg(rating),
count(rating),
count(renting_id)
FROM renting
GROUP BY customer_id
having count(renting_id) >7
ORDER BY avg(rating);
2.Calculate the revenue coming from movie rentals, the number of movie rentals and the number of customers who rented a movie.
SELECT
sum(m.renting_price),
count(*), -- Count the number of rentals
count(distinct(r.customer_id))
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id;
3.You are asked to give an overview of which actors play in which movie.
Create a list of actor names and movie titles in which they act. Make sure that each combination of actor and movie appears only once.
SELECT m.title,
a.name
FROM actsin as ac
LEFT JOIN movies AS m
ON m.movie_id = ac.movie_id
LEFT JOIN actors AS a
ON a.actor_id = ac.actor_id;
4.Age of actors from the USA
Now you will explore the age of American actors and actresses. Report the date of birth of the oldest and youngest US actor and actress.
Create a subsequent SELECT statements in the FROM clause to get all information about actors from the USA.
Give the subsequent SELECT statement the alias a.
Report for actors from the USA the year of birth of the oldest and the year of birth of the youngest actor and actress.
SELECT a.gender,
max(a.year_of_birth),
min(a.year_of_birth)
FROM (select *
from actors where nationality ='USA') as a
GROUP BY a.gender;
5.Identify favorite movies for a group of customers
Which is the favorite movie on MovieNow? Answer this question for a specific group of customers: for all customers born in the 70s.
Remove those movies from the table with only one rental.
Order the result table such that movies with highest rating come first.
SELECT m.title,
COUNT(*),
AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'
GROUP BY m.title
having COUNT(*) > 1
order by AVG(r.rating) desc;
6.Identify favorite actors for Spain
You're now going to explore actor popularity in Spain. Use as alias the first letter of the table, except for the table actsin use ai instead.
report the favorite actors only for customers from Spain.
SELECT a.name, c.gender,
COUNT(*) AS number_views,
AVG(r.rating) AS avg_rating
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id
where c.country ='Spain'
GROUP BY a.name, c.gender
HAVING AVG(r.rating) IS NOT NULL
AND COUNT(*) > 5
ORDER BY avg_rating DESC, number_views DESC;
网站答案有bug:count(*)>5 等同于 number_views>5
7.KPIs per country
Your manager is interested in the total number of movie rentals, the average rating of all movies and the total revenue for each country since the beginning of 2019.
Calculate the number of movie rentals.
Calculate the average rating.
Calculate the revenue from movie rentals.
Report these KPIs for each country.
SELECT
c.country,
count(r.renting_id) AS number_renting,
avg(r.rating) AS average_rating,
sum(m.renting_price) AS revenue
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE date_renting >= '2019-01-01'
group by c.country;
8.Movies with rating above average
For the advertising campaign your manager also needs a list of popular movies with high ratings. Report a list of movies with rating above average.
Select movie IDs and calculate the average rating of movies with rating above average.
SELECT movie_id,
AVG(rating)
FROM renting
GROUP BY movie_id
HAVING AVG(rating) >
(SELECT AVG(rating)
FROM renting);
The advertising team only wants a list of movie titles. Report the movie titles of all movies with average rating higher than the total average.
SELECT title
FROM movies
WHERE movie_id in
(SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING AVG(rating) >
(SELECT AVG(rating)
FROM renting));
9. Customers with at least one rating
Having active customers is a key performance indicator for MovieNow. Make a list of customers who gave at least one rating.
SELECT *
FROM customers as c -- Select all customers with at least one rating
WHERE exists
(SELECT *
FROM renting AS r
WHERE rating IS NOT NULL
AND r.customer_id = c.customer_id);
10.Actors in comedies
In order to analyze the diversity of actors in comedies, first, report a list of actors who play in comedies and then, the number of actors for each nationality playing in comedies.
10.1
Create a list of all actors who play in a Comedy. Use the first letter of the table as an alias.
SELECT *
FROM actors AS a
WHERE EXISTS
(SELECT *
FROM actsin AS ai
LEFT JOIN movies AS m
ON m.movie_id = ai.movie_id
WHERE m.genre = 'Comedy'
AND ai.actor_id = a.actor_id);
10.2 Report the nationality and the number of actors for each nationality.
SELECT a.nationality,count(*) -- Report the nationality and the number of actors for each nationality
FROM actors AS a
WHERE EXISTS
(SELECT ai.actor_id
FROM actsin AS ai
LEFT JOIN movies AS m
ON m.movie_id = ai.movie_id
WHERE m.genre = 'Comedy'
AND ai.actor_id = a.actor_id)
group by a.nationality;
11.Dramas with high ratings
The advertising team has a new focus. They want to draw the attention of the customers to dramas. Make a list of all movies that are in the drama genre and have an average rating higher than 9.
Select all movies of in the drama genre with an average rating higher than 9.
SELECT *
FROM movies
where movie_id in -- Select all movies of genre drama with average rating higher than 9
(SELECT movie_id
FROM movies
WHERE genre = 'Drama'
INTERSECT
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING AVG(rating)>9);
olap part
1.Groups of customers
Use the CUBE operator to extract the content of a pivot table from the database. Create a table with the total number of male and female customers from each country.
- Create a table with the total number of customers, of all female and male customers, of the number of customers for each country and the number of men and women from each country.
SELECT gender, -- Extract information of a pivot table of gender and country for the number of customers
country,
count(*)
FROM customers
GROUP BY CUBE (gender, country)
ORDER BY country;
如果是Group by ROLLUP(A, B, C)的话,首先会对(A、B、C)进行GROUP BY,然后对(A、B)进行GROUP BY,然后是(A)进行GROUP BY,最后对全表进行GROUP BY操作。
如果是GROUP BY CUBE(A, B, C),则首先会对(A、B、C)进行GROUP BY,然后依次是(A、B),(A、C),(A),(B、C),(B),(C),最后对全表进行GROUP BY操作。 grouping_id()可以美化效果。除了使用GROUPING函数,还可以使用GROUPING_ID来标识GROUP BY的结果。
也可以 Group by Rollup(A,(B,C)) ,Group by A Rollup(B,C),…… 这样任意按自己想要的形式结合统计数据,非常方便。
2.Categories of movies
Give an overview on the movies available on MovieNow. List the number of movies for different genres and release years.
List the number of movies for different genres and the year of release on all aggregation levels by using the CUBE operator.
SELECT count(*),
genre,
year_of_release
FROM movies
group by cube(genre,year_of_release)
ORDER BY year_of_release;
3.Analyzing average ratings
Prepare a table for a report about the national preferences of the customers from MovieNow comparing the average rating of movies across countries and genres.
Calculate the average rating for each country.
SELECT
avg(r.rating),
c.country
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
group by c.country;
Calculate the average rating for all aggregation levels of country and genre.
SELECT
c.country,
m.genre,
AVG(r.rating) AS avg_rating -- Calculate the average rating
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
group by cube(c.country,m.genre); -- For all aggregation levels of country and genre
rollup
1.Number of customers
You have to give an overview of the number of customers for a presentation.
Generate a table with the total number of customers, the number of customers for each country, and the number of female and male customers for each country.
Order the result by country and gender.
SELECT country,
gender,
COUNT(*)
FROM customers
group by rollup (country, gender)
order by country,gender;
2.Analyzing preferences of genres across countries
You are asked to study the preferences of genres across countries. Are there particular genres which are more popular in specific countries? Evaluate the preferences of customers by averaging their ratings and counting the number of movies rented from each genre.
Finally, calculate the average ratings and the number of ratings for each country and genre, as well as an aggregation over all genres for each country and the overall average and total number.
SELECT
c.country,
m.genre,
AVG(r.rating) AS avg_rating,
COUNT(*) AS num_rating
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY rollup(c.country, m.genre)
ORDER BY c.country, m.genre;
grouping set
1.Exploring nationality and gender of actors
For each movie in the database, the three most important actors are identified and listed in the table actors. This table includes the nationality and gender of the actors. We are interested in how much diversity there is in the nationalities of the actors and how many actors and actresses are in the list.
Count the number of actors in the table actors from each country, the number of male and female actors and the total number of actors.
SELECT
nationality, -- Select nationality of the actors
gender, -- Select gender of the actors
count(*) -- Count the number of actors
FROM actors
GROUP BY GROUPING SETS ((nationality), (gender), ());
2.Exploring rating by country and gender
Now you will investigate the average rating of customers aggregated by country and gender.
Now, use GROUPING SETS to get the same result, i.e. the average rating over country and gender.
SELECT
c.country,
c.gender,
AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY GROUPING SETS ((country, gender));
Report all information that is included in a pivot table for country and gender in one SQL table.
SELECT
c.country,
c.gender,
AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
-- Report all info from a Pivot table for country and gender
GROUP BY GROUPING SETS ((country, gender), (country), (gender), ());
final
1.Customer preference for genres
You just saw that customers have no clear preference for more recent movies over older ones. Now the management considers investing money in movies of the best rated genres.
Select records of movies with at least 4 ratings, starting from 2018-04-01.
SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN (
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >= 4)
AND r.date_renting >= '2018-04-01';
For each genre, calculate the average rating (use the alias avg_rating), the number of ratings (use the alias n_rating), the number of movie rentals (use the alias n_rentals), and the number of distinct movies (use the alias n_movies).
SELECT m.genre, -- For each genre, calculate:
avg(r.rating) as avg_rating,
count(r.rating) as n_rating,
count(r.renting_id) as n_rentals,
count(distinct(r.movie_id)) as n_movies
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN (
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >= 3)
AND r.date_renting >= '2018-01-01'
group by m.genre;
2.Customer preference for actors
The last aspect you have to analyze are customer preferences for certain actors.
For each combination of the actors' nationality and gender, calculate the average rating, the number of ratings, the number of movie rentals, and the number of actors.
SELECT a.nationality,
a.gender,
AVG(r.rating) AS avg_rating, -- The average rating
COUNT(r.rating) AS n_rating, -- The number of ratings
COUNT(*) AS n_rentals, -- The number of movie rentals
COUNT(DISTINCT a.actor_id) AS n_actors -- The number of actors
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id
WHERE r.movie_id IN (
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >=4 )
AND r.date_renting >= '2018-04-01'
GROUP BY a.nationality, a.gender;
Provide results for all aggregation levels represented in a pivot table.
SELECT a.nationality,
a.gender,
AVG(r.rating) AS avg_rating,
COUNT(r.rating) AS n_rating,
COUNT(*) AS n_rentals,
COUNT(DISTINCT a.actor_id) AS n_actors
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id
WHERE r.movie_id IN (
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >= 4)
AND r.date_renting >= '2018-04-01'
group by cube(a.nationality,a.gender);
网友评论