-
Notifications
You must be signed in to change notification settings - Fork 0
/
DataCamp.Course_027_Data_Driven_Decision_Making_in_SQL.sql
1986 lines (1709 loc) · 55.1 KB
/
DataCamp.Course_027_Data_Driven_Decision_Making_in_SQL.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
/*
######################################################################
######################################################################
######################################################################
# COURSE 027_Data-Driven Decision Making in SQL
Aim of this course
A short review of SQL know-how
Apply your SQL know-how to extract business insights from data
Learn about new SQL statements to summarize data
OLAP extensions were developed specifically for business intelligence
Examples are CUBE, ROLLUP and GROUPING SETS
MovieNow: an online movie rental company
Platform to stream movies
Additional information for each movie: genre, main actors, etc.
Customer information
Customers can give a rating after watching a movie
######################################################################
######################################################################
######################################################################
######## Introduction to business intelligence for a online movie rental database (Module 01-027)
######################################################################
video
Objectives of data driven decision making
Information for operational decisions
Popularity of actors to decide which movies to invest in.
Revenue of the last months to estimate budget for short term investments.
Information for strategic decisions
Success across countries to decide on market extensions.
Longterm development of revenue for long term investments.
KPIs: Key Performance Indicators
Extract information from the data which is relevant to measure the success of MovieNow.
Total number of rentals: revenue
The average rating of all movies: customer satisfaction
Number of active customers: customer engagement
#---------------------------------------------------------------------
Exploring the table renting
The table renting includes all records of movie rentals. Each record has a unique ID renting_id. It also contains information about customers (customer_id) and which movies they watched (movie_id). Furthermore, customers can give a rating after watching the movie, and the day the movie was rented is recorded.
step01
SELECT * -- Select all
FROM renting; -- From table renting
step02
SELECT movie_id, -- Select all columns needed to compute the average rating per movie
rating
FROM renting;
step03
answered
#---------------------------------------------------------------------
video
Filtering and
ordering
***WHERE
Select all customers from Italy:
SELECT *
FROM customers
WHERE country = 'Italy';
***Operators in the WHERE clause
Comparison operators:
Equal =
Not equal <>
Less than <
Less than or equal to <=
Greater than >
Greater than or equal to >=
BETWEEN operator
IN operator
IS NULL and IS NOT NULL operators
***Example comparison operators
Select all columns from movies where the genre is not Drama.
SELECT *
FROM movies
WHERE genre <> 'Drama';
Select all columns from movies where the price for renting is larger equal 2.
SELECT *
FROM movies
WHERE renting_price >= 2;
***Example: BETWEEN operator
Select all columns of customers where the date when the account was created is between 2018-01-
01 and 2018-08-31.
SELECT *
FROM customers
WHERE date_account_start BETWEEN '2018-01-01' AND '2018-09-31';
***Example: IN operator
Select all actors with nationality USA or Australia.
SELECT *
FROM actors
WHERE nationality IN ('USA', 'Australia')
***Example: NULL operator
Select all columns from renting where rating is NULL .
SELECT *
FROM renting
WHERE rating IS NULL
Select all columns from renting where rating is not NULL .
SELECT *
FROM renting
WHERE rating IS NOT NULL
***Boolean operators AND
Select customer name and the date when they created their account for customers who are from Italy
AND who created an account between 2018-01-01 and 2018-08-31.
SELECT name, date_account_start
FROM customers
WHERE country = 'Italy'
AND date_account_start BETWEEN '2018-01-01' AND '2018-08-31';
***Boolean operators OR
Select customer name and the date when they created their account for customers who are from Italy
OR who created an account between 2018-01-01 and 2018-08-31.
SELECT name, date_account_start
FROM customers
WHERE country = 'Italy'
OR date_account_start BETWEEN '2018-01-01' AND '2018-08-31';
***ORDER BY
Order the results of a query by rating.
SELECT *
FROM renting
WHERE rating IS NOT NULL
ORDER BY rating;
***ORDER BY ... DESC
Order the results of a query by rating in descending order.
SELECT *
FROM renting
WHERE rating IS NOT NULL
ORDER BY rating DESC;
#---------------------------------------------------------------------
Working with dates
For the analysis of monthly or annual changes, it is important to select data from specific time periods. You will select records from the table renting of movie rentals. The format of dates is 'YYYY-MM-DD'.
step01
SELECT *
FROM renting
WHERE date_renting = '2018-10-09'; -- Movies rented on October 9th, 2018
step02
SELECT *
FROM renting
WHERE date_renting BETWEEN '2018-04-01' AND '2018-08-31'; -- from beginning April 2018 to end August 2018
step03
SELECT *
FROM renting
WHERE date_renting BETWEEN '2018-04-01' AND '2018-08-31'
ORDER BY date_renting DESC; -- Order by recency in decreasing order
#---------------------------------------------------------------------
Selecting movies
The table movies contains all movies available on the online platform.
step01
SELECT *
FROM movies
WHERE genre <> 'Drama'; -- All genres except drama
step02
SELECT *
FROM movies
WHERE title IN ('Showtime', 'Love Actually', 'The Fighter'); -- Select all movies with the given titles
step03
SELECT *
FROM movies
ORDER BY renting_price ASC ; -- Order the movies by increasing renting price
#---------------------------------------------------------------------
Select from renting
Only some users give a rating after watching a movie. Sometimes it is interesting to explore only those movie rentals where a rating was provided.
SELECT *
FROM renting
WHERE date_renting BETWEEN '2018-01-01' AND '2018-12-31' -- Renting in 2018
AND rating IS NOT NULL; -- Rating exists
#---------------------------------------------------------------------
Aggregations -
summarizing data
***Overview aggregations
SELECT AVG(renting_price)
FROM movies;
Some aggregate functions in SQL
AVG()
SUM()
COUNT()
MIN()
MAX()
***Aggregation with NULL values
SELECT COUNT(*)
FROM actors;
Result: 145
SELECT COUNT(name)
FROM actors;
Result: 145
SELECT COUNT(year_of_birth)
FROM actors;
Result: 143
***DISTINCT
SELECT DISTINCT country
FROM customers;
SELECT COUNT(DISTINCT country)
FROM customers;
***DISTINCT with `NULL` values
SELECT DISTINCT rating
FROM renting
ORDER BY rating;
***Give an alias to column names
SELECT AVG(renting_price) AS average_price,
COUNT(DISTINCT genre) AS number_genres
FROM movies;
#---------------------------------------------------------------------
Summarizing customer information
In most business decisions customers are analyzed in groups, such as customers per country or customers per age group.
step01
SELECT COUNT(*) -- Count the total number of customers
FROM customers
WHERE date_of_birth BETWEEN '1980-01-01' AND '1989-12-31'; -- Select customers born between 1980-01-01 and 1989-12-31
STEP02
SELECT COUNT(*) -- Count the total number of customers
FROM customers
WHERE country = 'Germany'; -- Select all customers from Germany
STEP03
SELECT COUNT(DISTINCT country) -- Count the number of countries
FROM customers;
#---------------------------------------------------------------------
Ratings of movie 25
The movie ratings give us insight into the preferences of our customers. Report summary statistics, such as the minimum, maximum, average, and count, of ratings for the movie with ID 25.
SELECT MIN(rating) AS min_rating, -- Calculate the minimum rating and use alias min_rating
MAX(rating) AS max_rating, -- Calculate the maximum rating and use alias max_rating
AVG(rating) AS avg_rating, -- Calculate the average rating and use alias avg_rating
COUNT(rating) AS number_ratings -- Count the number of ratings and use alias number_ratings
FROM renting
WHERE movie_id = 25; -- Select all records of the movie with ID 25
#---------------------------------------------------------------------
Examining annual rentals
You are asked to provide a report about the development of the company. Specifically, your manager is interested in the total number of movie rentals, the total number of ratings and the average rating of all movies since the beginning of 2019.
STEP01
SELECT * -- Select all records of movie rentals since January 1st 2019
FROM renting
WHERE date_renting BETWEEN '2019-01-01' AND NOW();
STEP02
SELECT
COUNT(*), -- Count the total number of rented movies
AVG(rating) -- Add the average rating
FROM renting
WHERE date_renting >= '2019-01-01';
STEP03
SELECT
COUNT(*) AS number_renting, -- Give it the column name number_renting
AVG(rating) AS average_rating -- Give it the column name average_rating
FROM renting
WHERE date_renting >= '2019-01-01';
STEP04
SELECT
COUNT(*) AS number_renting,
AVG(rating) AS average_rating,
COUNT(rating) AS number_ratings -- Add the total number of ratings here.
FROM renting
WHERE date_renting >= '2019-01-01';
######################################################################
######################################################################
######################################################################
######## Decision Making with simple SQL queries (Module 02-027)
######################################################################
VIDEO
Grouping movies
***GROUP BY Applications
Preferences of customers by country or gender.
The popularity of movies by genre or year of release.
The average price of movies by genre.
***GROUP BY
SELECT genre
FROM movies_selected
GROUP BY genre;
***Average renting price
SELECT genre,
AVG(renting_price) AS avg_price
FROM movies_selected
GROUP BY genre;
***Average rental price and number of movies
SELECT genre,
AVG(renting_price) AS avg_price,
COUNT(*) AS number_movies
FROM movies_selected
GROUP BY genre
***HAVING
SELECT genre,
AVG(renting_price) avg_price,
COUNT(*) number_movies
FROM movies
GROUP BY genre
HAVING COUNT(*) > 2;
#---------------------------------------------------------------------
First account for each country.
Conduct an analysis to see when the first customer accounts were created for each country.
SELECT country, -- For each country report the earliest date when an account was created
MIN(date_account_start) AS first_account
FROM customers
GROUP BY country
ORDER BY country ASC;
#---------------------------------------------------------------------
Average movie ratings
For each movie the average rating, the number of ratings and the number of views has to be reported. Generate a table with meaningful column names.
STEP01
SELECT movie_id,
AVG(rating) -- Calculate average rating per movie
FROM renting
GROUP BY movie_id;
STEP02
SELECT movie_id,
AVG(rating) AS avg_rating, -- Use as alias avg_rating
COUNT(rating) AS number_rating, -- Add column for number of ratings with alias number_rating
COUNT(*) AS number_renting -- Add column for number of movie rentals with alias number_renting
FROM renting
GROUP BY movie_id;
STEP03
SELECT movie_id,
AVG(rating) AS avg_rating,
COUNT(rating) AS number_ratings,
COUNT(*) AS number_renting
FROM renting
GROUP BY movie_id
ORDER BY avg_rating DESC; -- Order by average rating in decreasing order
#---------------------------------------------------------------------
Average rating per customer
Similar to what you just did, you will now look at the average movie ratings, this time for customers. So you will obtain a table with the average rating given by each customer. Further, you will include the number of ratings and the number of movie rentals per customer. You will report these summary statistics only for customers with more than 7 movie rentals and order them in ascending order by the average rating.
SELECT customer_id, -- Report the customer_id
AVG(rating), -- Report the average rating per customer
COUNT(rating), -- Report the number of ratings per customer
COUNT(*) -- Report the number of movie rentals per customer
FROM renting
GROUP BY customer_id
HAVING COUNT(*) > 7 -- Select only customers with more than 7 movie rentals
ORDER BY AVG ASC; -- Order by the average rating in ascending order
#---------------------------------------------------------------------
VIDEO
Joining movie ratings with customer data
***LEFT JOIN
LEFT JOIN is an outer join.
Keep all rows of the left table, match with rows in the right table.
Use identi
***Giving a table a name
SELECT *
FROM customers AS c
WHERE c.customer_id = 1;
***Tables for LEFT JOIN
Left table: renting_selected
Right table: customers_selected
***LEFT JOIN example
SELECT *
FROM renting_selected AS r
LEFT JOIN customers_selected AS c
ON r.customer_id = c.customer_id;
***More than one JOIN
SELECT m.title,
c.name
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id;
#---------------------------------------------------------------------
Join renting and customers
For many analyses it is necessary to add customer information to the data in the table renting.
STEP01
SELECT * -- Join renting with customers
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id;;
STEP02
SELECT *
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
WHERE C.country = 'Belgium'; -- Select only records from customers coming from Belgium
STEP03
SELECT AVG(r.rating) -- Average ratings of customers from Belgium
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
WHERE c.country='Belgium';
#---------------------------------------------------------------------
Aggregating revenue, rentals and active customers
The management of MovieNow wants to report key performance indicators (KPIs) for the performance of the company in 2018. They are interested in measuring the financial successes as well as user engagement. Important KPIs are, therefore, the profit coming from movie rentals, the number of movie rentals and the number of active customers.
step01
SELECT *
FROM renting AS r
LEFT JOIN movies AS m -- Choose the correct join statment
ON r.movie_id = m.movie_id;
step02
SELECT
SUM(m.renting_price), -- Get the revenue from movie rentals
COUNT(*), -- Count the number of rentals
COUNT(DISTINCT r.customer_id) -- Count the number of customers
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id;
step03
SELECT
SUM(m.renting_price),
COUNT(*),
COUNT(DISTINCT r.customer_id)
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id
-- Only look at movie rentals in 2018
WHERE date_renting BETWEEN '2018-01-01' AND '2018-12-31';
#---------------------------------------------------------------------
Movies and actors
You are asked to give an overview of which actors play in which movie.
SELECT m.title, -- Create a list of movie titles and actor names
a.name
FROM actsin AS ai
LEFT JOIN movies AS m
ON m.movie_id = ai.movie_id
LEFT JOIN actors AS a
ON a.actor_id = ai.actor_id;
#---------------------------------------------------------------------
Money spent per customer with sub-queries
***Subsequent SELECT statements - actresses
Query 1:
SELECT *
FROM actors
WHERE gender = 'female';
***SELECT * -- Query 1
FROM actors
WHERE gender = 'female';
Group result table of query 1 by nationality.
Report year of birth for the oldest and youngest actress in each country.
SELECT af.nationality,
MIN(af.year_of_birth),
MAX(af.year_of_birth)
FROM
(SELECT *
FROM actors
WHERE gender = 'female') AS af
GROUP BY af.nationality;
***Result subsequent SELECT statement - actresses
SELECT af.nationality,
MIN(af.year_of_birth),
MAX(af.year_of_birth)
FROM
(SELECT *
FROM actors
WHERE gender = 'female') AS af
GROUP BY af.nationality;
| nationality | min | max |
|-------------|------|------|
| Italy | 1976 | 1976 |
| Iran | 1952 | 1952 |
| USA | 1945 | 1993 |
***How much money did each customer spend?
First step: Add renting_price from movies to table renting .
SELECT r.customer_id,
m.renting_price
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id=m.movie_id;
| customer_id | renting_price |
|-------------|---------------|
| 41 | 2.59 |
| 10 | 2.79 |
| 108 | 2.39 |
| 39 | 1.59 |
| 104 | 1.69 |
***How much money did each customer spend?
Second step:
group result table from first step
take the sum of renting_price
SELECT rm.customer_id,
SUM(rm.renting_price)
FROM
(SELECT r.customer_id,
m.renting_price
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id=m.movie_id) AS rm
GROUP BY rm.customer_id;
***How much money did each customer spend?
| customer_id | sum |
|-------------|-------|
| 116 | 7.47 |
| 87 | 17.53 |
| 71 | 6.87 |
| 68 | 1.59 |
| 51 | 4.87 |
#---------------------------------------------------------------------
Income from movies
How much income did each movie generate? To answer this question subsequent SELECT statements can be used.
step01
SELECT m.title, -- Use a join to get the movie title and price for each movie rental
m.renting_price
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id = m.movie_id;
step02
SELECT rm.title, -- Report the income from movie rentals for each movie
SUM(rm.renting_price) AS income_movie
FROM
(SELECT m.title,
m.renting_price
FROM renting AS r
LEFT JOIN movies AS m
ON r.movie_id=m.movie_id) AS rm
GROUP BY rm.title
ORDER BY income_movie DESC; -- Order the result by decreasing income
#---------------------------------------------------------------------
Age of actors from the USA
Now you will explore the age of American actors and actresses. Report the date of birth of the oldest and youngest US actor and actress.
SELECT a.gender, -- Report for male and female actors from the USA
MIN(a.year_of_birth), -- The year of birth of the oldest actor
MAX(a.year_of_birth) -- The year of birth of the youngest actor
FROM
(SELECT gender, year_of_birth-- Use a subsequen SELECT to get all information about actors from the USA
FROM actors
WHERE nationality = 'USA') AS a -- Give the table the name a
GROUP BY a.gender;
#---------------------------------------------------------------------
video
Identify favorite actors of customer groups
***Combining SQL statements in one query
LEFT JOIN
WHERE
GROUP BY
HAVING
ORDER BY
***From renting records to customer and actor
information
Our question: Who is the favorite actor for a
certain customer group?
Join table renting with tables
customers
actsin
actors
SELECT *
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id;
***Male customers
Actors which play most often in movies watched by male customers.
SELECT a.name,
COUNT(*)
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id
WHERE c.gender = 'male'
GROUP BY a.name;
***Who is the favorite actor?
Actor being watched most often.
Best average rating when being watched.
SELECT a.name,
COUNT(*) AS number_views,
AVG(r.rating) AS avg_rating
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id
***Add HAVING and ORDER BY
SELECT a.name,
COUNT(*) AS number_views,
AVG(r.rating) AS avg_rating
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id
WHERE c.gender = 'male'
GROUP BY a.name
HAVING AVG(r.rating) IS NOT NULL
ORDER BY avg_rating DESC, number_views DESC;
***Add HAVING and ORDER BY
| name | number_views | avg_rating |
|--------------------|--------------|------------|
| Ray Romano | 3 | 10.00 |
| Sean Bean | 2 | 10.00 |
| Leonardo DiCaprio | 3 | 9.33 |
| Christoph Waltz | 3 | 9.33 |
#---------------------------------------------------------------------
Identify favorite movies for a group of customers
Which is the favorite movie on MovieNow? Answer this question for a specific group of customers: for all customers born in the 70s.
step01
SELECT *
FROM renting AS r
LEFT JOIN customers AS c -- Add customer information
ON r.customer_id = c.customer_id
LEFT JOIN movies AS m -- Add movie information
ON r.movie_id = m.movie_id;
step02
SELECT *
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'; -- Select customers born in the 70s
step03
SELECT m.title,
COUNT(*), -- Report number of views per movie
AVG(r.rating) -- Report the average rating per movie
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'
GROUP BY m.title;
step04
SELECT m.title,
COUNT(*),
AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE c.date_of_birth BETWEEN '1970-01-01' AND '1979-12-31'
GROUP BY m.title
HAVING COUNT(*) > 1 -- Remove movies with only one rental
ORDER BY AVG DESC; -- Order with highest rating first
#---------------------------------------------------------------------
Identify favorite actors for Spain
You're now going to explore actor popularity in Spain. Use as alias the first letter of the table, except for the table actsin use ai instead.
step01
SELECT *
FROM renting AS r
LEFT JOIN customers AS c -- Augment table renting with information about customers
ON r.customer_id = c.customer_id
LEFT JOIN actsin AS ai -- Augment the table renting with the table actsin
ON r.movie_id = ai.movie_id
LEFT JOIN actors AS a -- Augment table renting with information about actors
ON ai.actor_id = a.actor_id;
step02
SELECT a.name, c.gender,
COUNT(*) AS number_views,
AVG(r.rating) AS avg_rating
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id
GROUP BY a.name, c.gender -- For each actor, separately for male and female customers
HAVING AVG(r.rating) IS NOT NULL
AND COUNT(*) > 5 -- Report only actors with more than 5 movie rentals
ORDER BY avg_rating DESC, number_views DESC;
step03
SELECT a.name, c.gender,
COUNT(*) AS number_views,
AVG(r.rating) AS avg_rating
FROM renting as r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN actsin as ai
ON r.movie_id = ai.movie_id
LEFT JOIN actors as a
ON ai.actor_id = a.actor_id
WHERE c.country = 'Spain' -- Select only customers from Spain
GROUP BY a.name, c.gender
HAVING AVG(r.rating) IS NOT NULL
AND COUNT(*) > 5
ORDER BY avg_rating DESC, number_views DESC;
#---------------------------------------------------------------------
KPIs per country
In chapter 1 you were asked to provide a report about the development of the company. This time you have to prepare a similar report with KPIs for each country separately. Your manager is interested in the total number of movie rentals, the average rating of all movies and the total revenue for each country since the beginning of 2019.
step01
SELECT *
FROM renting as r -- Augment the table renting with information about customers
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
LEFT JOIN movies AS m -- Augment the table renting with information about movies
ON m.movie_id = r.movie_id
WHERE date_renting >= '2019-01-01'; -- Select only records about rentals since the beginning of 2019
step02
SELECT
c.country, -- For each country report
COUNT(*) AS number_renting, -- The number of movie rentals
AVG(r.rating) AS average_rating, -- The average rating
SUM(renting_price) AS revenue -- The revenue from movie rentals
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE date_renting >= '2019-01-01'
GROUP BY c.country;
######################################################################
######################################################################
######################################################################
######## Data Driven Decision Making with advanced SQL queries (Module 03-027)
######################################################################
video
Nested query
***Nested query
SELECT block in WHERE or HAVING clauses
Inner query returns single or multiple values
Use result from the inner query to select speci
***The inner query
Step 1: The inner query
SELECT DISTINCT customer_id
FROM renting
WHERE rating <= 3CO
***Result in the WHERE clause
SELECT name
FROM customers
WHERE customer_id IN (28, 41, 86, 120);
***The outer query
Step 2: The outer query
SELECT name
FROM customers
WHERE customer_id IN
(SELECT DISTINCT customer_id
FROM renting
WHERE rating <= 3);
***Nested query in the HAVING clause
Step 1: The inner query
SELECT MIN(date_account_start)
FROM customers
WHERE country = 'Austria';
***Nested query in the HAVING clause
Step 2: The outer query
SELECT country, MIN(date_account_start)
FROM customers
GROUP BY country
HAVING MIN(date_account_start) <
(SELECT MIN(date_account_start)
FROM customers
WHERE country = 'Austria');
***Who are the actors in the movie Ray?
SELECT name
FROM actors
WHERE actor_id IN
(SELECT actor_id
FROM actsin
WHERE movie_id =
(SELECT movie_id
FROM movies
WHERE title='Ray'));
#---------------------------------------------------------------------
Often rented movies
Your manager wants you to make a list of movies excluding those which are hardly ever watched. This list of movies will be used for advertising. List all movies with more than 5 views using a nested query which is a powerful tool to implement selection conditions.
step01
SELECT movie_id -- Select movie IDs with more than 5 views
FROM renting
GROUP BY movie_id
HAVING COUNT(*) > 5
step02
SELECT *
FROM movies
WHERE movie_id IN -- Select movie IDs from the inner query
(SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(*) > 5)
#---------------------------------------------------------------------
Frequent customers
Report a list of customers who frequently rent movies on MovieNow.
SELECT *
FROM customers
WHERE customer_id IN -- Select all customers with more than 10 movie rentals
(SELECT customer_id
FROM renting
GROUP BY customer_id
HAVING COUNT(*) > 10);
#---------------------------------------------------------------------
Movies with rating above average
For the advertising campaign your manager also needs a list of popular movies with high ratings. Report a list of movies with rating above average.
STEP01
SELECT AVG(rating) -- Calculate the total average rating
FROM renting
STEP02
SELECT movie_id, -- Select movie IDs and calculate the average rating
AVG(rating)
FROM renting
GROUP BY movie_id
HAVING AVG(rating) > -- Of movies with rating above average
(SELECT AVG(rating)
FROM renting);
STEP03
SELECT title -- Report the movie titles of all movies with average rating higher than the total average
FROM movies
WHERE movie_id IN
(SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING AVG(rating) >
(SELECT AVG(rating)
FROM renting));
#---------------------------------------------------------------------
video
Correlated nested
queries
***Correlated queries
Condition in the WHERE clause of the inner query.
References some column of a table in the outer query.
***Example correlated query
Number of movie rentals more than 5
SELECT *
FROM movies as m
WHERE 5 <
(SELECT COUNT(*)
FROM renting as r
WHERE r.movie_id=m.movie_id);
***Evaluate inner query
SELECT COUNT(*)
FROM renting as r
WHERE r.movie_id = 1;
| count |
|-------|
| 8 |
***Evaluate outer query
Number of movie rentals larger than 5
SELECT *
FROM movies as m
WHERE 5 <
(SELECT COUNT(*)
FROM renting as r
WHERE r.movie_id = m.movie_id);
***Less than 5 movie rentals
Select movies with less than 5 movie rentals.
SELECT *
FROM movies as m
WHERE 5 >
(SELECT COUNT(*)
FROM renting as r
WHERE r.movie_id = m.movie_id);
#---------------------------------------------------------------------
Analyzing customer behavior
A new advertising campaign is going to focus on customers who rented fewer than 5 movies. Use a correlated query to extract all customer information for the customers of interest.
step01
-- Count movie rentals of customer 45
SELECT COUNT(*)
FROM renting AS r
WHERE customer_id=45;
step02
-- Select customers with less than 5 movie rentals
SELECT *
FROM customers as c
WHERE 5 >
(SELECT count(*)
FROM renting as r
WHERE r.customer_id = c.customer_id);
#---------------------------------------------------------------------
Customers who gave low ratings
Identify customers who were not satisfied with movies they watched on MovieNow. Report a list of customers with minimum rating smaller than 4.
step01
-- Calculate the minimum rating of customer with ID 7
SELECT MIN(rating)
FROM renting
WHERE customer_id = 7;
step02
SELECT *
FROM customers AS c
WHERE 4 > -- Select all customers with a minimum rating smaller than 4
(SELECT MIN(rating)