-
Notifications
You must be signed in to change notification settings - Fork 0
/
DataCamp.Course_023_Intermediate SQL.sql
2154 lines (1808 loc) · 68.6 KB
/
DataCamp.Course_023_Intermediate SQL.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
/*
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
-- COURSE 23 Intermediate SQL_023
######################################################################
######################################################################
######################################################################
######## We'll take the CASE (Module 01-023)
######################################################################
CASE statements
Contains a WHEN , THEN , and ELSE statement, nished with END
CASE WHEN x = 1 THEN 'a'
WHEN x = 2 THEN 'b'
ELSE 'c' END AS new_column
CASE WHEN
SELECT
id,
home_goal,
away_goal,
CASE WHEN home_goal > away_goal THEN 'Home Team Win'
WHEN home_goal < away_goal THEN 'Away Team Win'
ELSE 'Tie' END AS outcome
FROM match
WHERE season = '2013/2014';
#---------------------------------------------------------------------
Basic CASE statements
What is your favorite team?
The European Soccer Database contains data about 12,800 matches from 11 countries played between 2011-2015! Throughout this course, you will be shown filtered versions of the tables in this database in order to better explore their contents.
In this exercise, you will identify matches played between FC Schalke 04 and FC Bayern Munich. There are 2 teams identified in each match in the hometeam_id and awayteam_id columns, available to you in the filtered matches_germany table. ID can join to the team_api_id column in the teams_germany table, but you cannot perform a join on both at the same time.
However, you can perform this operation using a CASE statement once you've identified the team_api_id associated with each team!
Please note that you can reference the slides from the video on the TOP RIGHT of your screen!
STEP 01
SELECT
-- Select the team long name and team API id
team_long_name,
team_api_id
FROM teams_germany
-- Only include FC Schalke 04 and FC Bayern Munich
WHERE team_long_name IN ('FC Schalke 04', 'FC Bayern Munich');
STEP 02
-- Identify the home team as Bayern Munich, Schalke 04, or neither
SELECT
CASE WHEN hometeam_id = 10189 THEN 'FC Schalke 04'
WHEN hometeam_id = 9823 THEN 'FC Bayern Munich'
ELSE 'Other' END AS home_team,
COUNT(id) AS total_matches
FROM matches_germany
-- Group by the CASE statement alias
GROUP BY home_team;
#---------------------------------------------------------------------
CASE statements comparing column values
Barcelona is considered one of the strongest teams in Spain's soccer league.
In this exercise, you will be creating a list of matches in the 2011/2012 season where Barcelona was the home team. You will do this using a CASE statement that compares the values of two columns to create a new group -- wins, losses, and ties.
In 3 steps, you will build a query that identifies a match's winner, identifies the identity of the opponent, and finally filters for Barcelona as the home team. Completing a query in this order will allow you to watch your results take shape with each new piece of information.
The matches_spain table currently contains Barcelona's matches from the 2011/2012 season, and has two key columns, hometeam_id and awayteam_id, that can be joined with the teams
STEP 01
SELECT
-- Select the date of the match
date,
-- Identify home wins, losses, or ties
CASE WHEN home_goal > away_goal THEN 'Home win!'
WHEN home_goal < away_goal THEN 'Home loss :('
ELSE 'Tie' END AS outcome
FROM matches_spain;
STEP 02
SELECT
m.date,
--Select the team long name column and call it 'opponent'
t.team_long_name AS opponent,
-- Complete the CASE statement with an alias
CASE WHEN m.home_goal > m.away_goal THEN 'Home win!'
WHEN m.home_goal < m.away_goal THEN 'Home loss :('
ELSE 'Tie' END AS outcome
FROM matches_spain AS m
-- Left join teams_spain onto matches_spain
LEFT JOIN teams_spain AS t
ON m.awayteam_id = t.team_api_id;
STEP 03
SELECT
m.date,
t.team_long_name AS opponent,
-- Complete the CASE statement with an alias
CASE WHEN m.home_goal > m.away_goal THEN 'Barcelona win!'
WHEN m.home_goal < m.away_goal THEN 'Barcelona loss :('
ELSE 'Tie' END AS outcome
FROM matches_spain AS m
LEFT JOIN teams_spain AS t
ON m.awayteam_id = t.team_api_id
-- Filter for Barcelona as the home team
WHERE m.hometeam_id = 8634;
#---------------------------------------------------------------------
CASE statements comparing two column values part 2
Similar to the previous exercise, you will construct a query to determine the outcome of Barcelona's matches where they played as the away team. You will learn how to combine these two queries in chapters 2 and 3.
Did their performance differ from the matches where they were the home team?
-- Select matches where Barcelona was the away team
SELECT
m.date,
t.team_long_name AS opponent,
CASE WHEN m.home_goal < m.away_goal THEN 'Barcelona win!'
WHEN m.home_goal > m.away_goal THEN 'Barcelona loss :('
ELSE 'Tie' END AS outcome
FROM matches_spain AS m
-- Join teams_spain to matches_spain
LEFT JOIN teams_spain AS t
ON m.hometeam_id = t.team_api_id
WHERE m.awayteam_id = 8634;
#---------------------------------------------------------------------
Video:In CASE things get more complex
CASE WHEN ... AND then some (it means adding logical operators)
SELECT date, hometeam_id, awayteam_id,
CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
THEN 'Chelsea home win!'
WHEN awayteam_id = 8455 AND home_goal < away_goal
THEN 'Chelsea away win!'
ELSE 'Loss or tie :(' END AS outcome
FROM match
WHERE hometeam_id = 8455 OR awayteam_id = 8455;
--->
SELECT date, hometeam_id, awayteam_id,
CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
THEN 'Chelsea home win!'
WHEN awayteam_id = 8455 AND home_goal < away_goal
THEN 'Chelsea away win!'
ELSE 'Loss or tie :(' END AS outcome
FROM match;
____
NULL
SELECT date,
CASE WHEN date > '2015-01-01' THEN 'More Recently'
WHEN date < '2012-01-01' THEN 'Older'
END AS date_category
FROM match;
SELECT date,
CASE WHEN date > '2015-01-01' THEN 'More Recently'
WHEN date < '2012-01-01' THEN 'Older'
ELSE NULL END AS date_category
FROM match;
___
What are your NULL values doing?
SELECT date, season,
CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
THEN 'Chelsea home win!'
WHEN awayteam_id = 8455 AND home_goal < away_goal
THEN 'Chelsea away win!'
END AS outcome
FROM match
WHERE hometeam_id = 8455 OR awayteam_id = 8455;
____
Where to place your CASE?
SELECT date, season,
CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
THEN 'Chelsea home win!'
WHEN awayteam_id = 8455 AND home_goal < away_goal
THEN 'Chelsea away win!' END AS outcome
FROM match;
SELECT date, season,
CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
THEN 'Chelsea home win!'
WHEN awayteam_id = 8455 AND home_goal < away_goal
THEN 'Chelsea away win!' END AS outcome
FROM match
WHERE CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
THEN 'Chelsea home win!'
WHEN awayteam_id = 8455 AND home_goal < away_goal
THEN 'Chelsea away win!' END IS NOT NULL;
#---------------------------------------------------------------------
In CASE of rivalry
Barcelona and Real Madrid have been rival teams for more than 80 years. Matches between these two teams are given the name El Clásico (The Classic). In this exercise, you will query a list of matches played between these two rivals.
You will notice in Step 2 that when you have multiple logical conditions in a CASE statement, you may quickly end up with a large number of WHEN clauses to logically test every outcome you are interested in. It's important to make sure you don't accidentally exclude key information in your ELSE clause.
In this exercise, you will retrieve information about matches played between Barcelona (id = 8634) and Real Madrid (id = 8633). Note that the query you are provided with already identifies the Clásico matches using a filter in the WHERE clause.
STEP01
SELECT
date,
-- Identify the home team as Barcelona or Real Madrid
CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END AS home,
-- Identify the away team as Barcelona or Real Madrid
CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END AS away
FROM matches_spain
WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
AND (awayteam_id = 8633 OR hometeam_id = 8633);
STEP02
SELECT
date,
CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END AS home,
CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END AS away,
-- Identify all possible match outcomes
CASE WHEN home_goal > away_goal AND hometeam_id = 8634 THEN 'Barcelona win!'
WHEN home_goal > away_goal AND hometeam_id = 8633 THEN 'Real Madrid win!'
WHEN home_goal < away_goal AND awayteam_id = 8634 THEN 'Barcelona win!'
WHEN home_goal < away_goal AND awayteam_id = 8633 THEN 'Real Madrid win!'
ELSE 'Tie!' END AS outcome
FROM matches_spain
WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
AND (awayteam_id = 8633 OR hometeam_id = 8633);
#---------------------------------------------------------------------
Filtering your CASE statement
Let's generate a list of matches won by Italy's Bologna team! There are quite a few additional teams in the two tables, so a key part of generating a usable query will be using your CASE statement as a filter in the WHERE clause.
CASE statements allow you to categorize data that you're interested in -- and exclude data you're not interested in. In order to do this, you can use a CASE statement as a filter in the WHERE statement to remove output you don't want to see.
Here is how you might set that up:
SELECT *
FROM table
WHERE
CASE WHEN a > 5 THEN 'Keep'
WHEN a <= 5 THEN 'Exclude' END = 'Keep';
In essence, you can use the CASE statement as a filtering column like any other column in your database. The only difference is that you don't alias the statement in WHERE.
STEP01
-- Select team_long_name and team_api_id from team
SELECT
team_long_name,
team_api_id
FROM teams_italy
-- Filter for team name
WHERE team_long_name = 'Bologna';
STEP02
-- Select the season and date columns
SELECT
season,
date,
-- Identify when Bologna won a match
CASE WHEN hometeam_id = 9857 AND home_goal > away_goal THEN 'Bologna Win'
WHEN awayteam_id = 9857 AND away_goal > home_goal THEN 'Bologna Win'
END AS outcome
FROM matches_italy;
STEP03
-- Select the season, date, home_goal, and away_goal columns
SELECT
season,
date,
home_goal,
away_goal
FROM matches_italy
WHERE
-- Exclude games not won by Bologna
CASE WHEN hometeam_id = 9857 AND home_goal > away_goal THEN 'Bologna Win'
WHEN awayteam_id = 9857 AND away_goal > home_goal THEN 'Bologna Win'
END IS NOT NULL;
#---------------------------------------------------------------------
VIDEO: CASE WHEN with aggregate functions
In CASE you need to aggregate
CASE statements are great for
1. Categorizing data
2. Filtering data
3. Aggregating data
SELECT
season,
COUNT(CASE WHEN hometeam_id = 8650
AND home_goal > away_goal
THEN id END) AS home_wins
FROM match
GROUP BY season;
CASE WHEN with COUNT
SELECT
season,
COUNT(CASE WHEN hometeam_id = 8650 AND home_goal > away_goal
THEN id END) AS home_wins,
COUNT(CASE WHEN awayteam_id = 8650 AND away_goal > home_goal
THEN id END) AS away_wins
FROM match
GROUP BY season;
SELECT
season,
COUNT(CASE WHEN hometeam_id = 8650 AND home_goal > away_goal
THEN 54321 END) AS home_wins,
COUNT(CASE WHEN awayteam_id = 8650 AND away_goal > home_goal
THEN 'Some random text' END) AS away_wins
FROM match
GROUP BY season;
CASE WHEN with SUM
SELECT
season,
SUM(CASE WHEN hometeam_id = 8650
THEN home_goal END) AS home_goals,
SUM(CASE WHEN awayteam_id = 8650
THEN away_goal END) AS away_goals
FROM match
GROUP BY season;
The CASE is fairly AVG...
SELECT
season,
AVG(CASE WHEN hometeam_id = 8650
THEN home_goal END) AS home_goals,
AVG(CASE WHEN awayteam_id = 8650
THEN away_goal END) AS away_goals
FROM match
GROUP BY season;
A ROUNDed AVG
ROUND(3.141592653589,2)
SELECT
season,
ROUND(AVG(CASE WHEN hometeam_id = 8650
THEN home_goal END),2) AS home_goals,
ROUND(AVG(CASE WHEN awayteam_id = 8650
THEN away_goal END),2) AS away_goals
FROM match
GROUP BY season;
Percentages with CASE and AVG
SELECT
season,
AVG(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1
WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0
END) AS pct_homewins,
AVG(CASE WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 1
WHEN awayteam_id = 8455 AND away_goal < home_goal THEN 0
END) AS pct_awaywins
FROM match
GROUP BY season;
SELECT
season,
ROUND(AVG(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1
WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0
END),2) AS pct_homewins,
ROUND(AVG(CASE WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 1
WHEN awayteam_id = 8455 AND away_goal < home_goal THEN 0
END),2) AS pct_awaywins
FROM match
GROUP BY season;
#---------------------------------------------------------------------
COUNT using CASE WHEN
Do the number of soccer matches played in a given European country differ across seasons? We will use the European Soccer Database to answer this question.
You will examine the number of matches played in 3 seasons within each country listed in the database. This is much easier to explore with each season's matches in separate columns. Using the country and unfiltered match table, you will count the number of matches played in each country during the 2012/2013, 2013/2014, and 2014/2015 match seasons.
step 01
SELECT
c.name AS country,
-- Count games from the 2012/2013 season
COUNT(CASE WHEN m.season = '2012/2013'
THEN m.id ELSE NULL END) AS matches_2012_2013
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id
-- Group by country name alias
GROUP BY country;
step 02
SELECT
c.name AS country,
-- Count matches in each of the 3 seasons
COUNT(CASE WHEN m.season = '2012/2013' THEN m.id ELSE NULL END) AS matches_2012_2013,
COUNT(CASE WHEN m.season = '2013/2014' THEN m.id ELSE NULL END) AS matches_2013_2014,
COUNT(CASE WHEN m.season = '2014/2015' THEN m.id ELSE NULL END) AS matches_2014_2015
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id
-- Group by country name alias
GROUP BY country;
#---------------------------------------------------------------------
COUNT and CASE WHEN with multiple conditions
In R or Python, you have the ability to calculate a SUM of logical values (i.e., TRUE/FALSE) directly. In SQL, you have to convert these values into 1 and 0 before calculating a sum. This can be done using a CASE statement.
There's one key difference when using SUM to aggregate logical values compared to using COUNT in the previous exercise --
Your goal here is to use the country and match table to determine the total number of matches won by the home team in each country during the 2012/2013, 2013/2014, and 2014/2015 seasons.
SELECT
c.name AS country,
-- Sum the total records in each season where the home team won
SUM(CASE WHEN m.season = '2012/2013' AND m.home_goal > m.away_goal
THEN 1 ELSE 0 END) AS matches_2012_2013,
SUM(CASE WHEN m.season = '2013/2014' AND m.home_goal > m.away_goal
THEN 1 ELSE 0 END) AS matches_2013_2014,
SUM(CASE WHEN m.season = '2014/2015' AND m.home_goal > m.away_goal
THEN 1 ELSE 0 END) AS matches_2014_2015
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id
-- Group by country name alias
GROUP BY country;
#---------------------------------------------------------------------
Calculating percent with CASE and AVG
CASE statements will return any value you specify in your THEN clause. This is an incredibly powerful tool for robust calculations and data manipulation when used in conjunction with an aggregate statement. One key task you can perform is using CASE inside an AVG function to calculate a percentage of information in your database.
Here's an example of how you set that up:
AVG(CASE WHEN condition_is_met THEN 1
WHEN condition_is_not_met THEN 0 END)
With this approach, it's important to accurately specify which records count as 0, otherwise your calculations may not be correct!
Your task is to examine the number of wins, losses, and ties in each country. The matches table is filtered to include all matches from the 2013/2014 and 2014/2015 seasons.
step 01
SELECT
c.name AS country,
-- Count the home wins, away wins, and ties in each country
COUNT(CASE WHEN m.home_goal > m.away_goal THEN m.id
END) AS home_wins,
COUNT(CASE WHEN m.home_goal < m.away_goal THEN m.id
END) AS away_wins,
COUNT(CASE WHEN m.home_goal = m.away_goal THEN m.id
END) AS ties
FROM country AS c
LEFT JOIN matches AS m
ON c.id = m.country_id
GROUP BY country;
step 02
SELECT
c.name AS country,
-- Calculate the percentage of tied games in each season
AVG(CASE WHEN m.season='2013/2014' AND m.home_goal = m.away_goal THEN 1
WHEN m.season='2013/2014' AND m.home_goal != m.away_goal THEN 0
END) AS ties_2013_2014,
AVG(CASE WHEN m.season='2014/2015' AND m.home_goal = m.away_goal THEN 1
WHEN m.season='2014/2015' AND m.home_goal != m.away_goal THEN 0
END) AS ties_2014_2015
FROM country AS c
LEFT JOIN matches AS m
ON c.id = m.country_id
GROUP BY country;
step 03
SELECT
c.name AS country,
-- Round the percentage of tied games to 2 decimal points
ROUND(AVG(CASE WHEN m.season='2013/2014' AND m.home_goal = m.away_goal THEN 1
WHEN m.season='2013/2014' AND m.home_goal != m.away_goal THEN 0
END),2) AS pct_ties_2013_2014,
ROUND(AVG(CASE WHEN m.season='2014/2015' AND m.home_goal = m.away_goal THEN 1
WHEN m.season='2014/2015' AND m.home_goal != m.away_goal THEN 0
END),2) AS pct_ties_2014_2015
FROM country AS c
LEFT JOIN matches AS m
ON c.id = m.country_id
GROUP BY country;
######################################################################
######################################################################
######################################################################
######## Short and Simple Subqueries (Module 02-023)
######################################################################
What is a subquery?
A query nested inside another query
SELECT column
FROM (SELECT column
FROM table) AS subquery;
Can be in any part of a query
SELECT , FROM , WHERE , GROUP BY
Can return a variety of information
Scalar quantities ( 3.14159 , -2 , 0.001 )
A list ( id = (12, 25, 392, 401, 939) )
A table
Simple Subqueries
SELECT home_goal
FROM match
WHERE home_goal > (
SELECT AVG(home_goal)
FROM match);
SELECT AVG(home_goal) FROM match;
Subqueries in the WHERE clause
JOB*********************************************************
SELECT *
FROM [dbo].[ActivitesDay_Employer]
WHERE Activities > (SELECT AVG(Activities)
FROM [dbo].[ActivitesDay_Employer]);
ej:
SELECT date, hometeam_id, awayteam_id, home_goal, away_goal
FROM match
WHERE season = '2012/2013'
AND home_goal > (SELECT AVG(home_goal)
FROM match);
Subquery -IN
SELECT
team_long_name,
team_short_name AS abbr
FROM team
WHERE
team_api_id IN
(SELECT hometeam_id
FROM match
WHERE country_id = 15722);
#---------------------------------------------------------------------
Filtering using scalar subqueries
Subqueries are incredibly powerful for performing complex filters and transformations. You can filter data based on single, scalar values using a subquery in ways you cannot by using WHERE statements or joins. Subqueries can also be used for more advanced manipulation of your data set. You will likely encounter subqueries in any real-world setting that uses relational databases.
In this exercise, you will generate a list of matches where the total goals scored (for both teams in total) is more than 3 times the average for games in the matches_2013_2014 table, which includes all games played in the 2013/2014 season.
step01
-- Select the average of home + away goals, multiplied by 3
SELECT
3 * AVG(home_goal + away_goal)
FROM matches_2013_2014;
step02
SELECT
-- Select the date, home goals, and away goals scored
date,
home_goal,
away_goal
FROM matches_2013_2014
-- Filter for matches where total goals exceeds 3x the average
WHERE (home_goal + away_goal) >
(SELECT 3 * AVG(home_goal + away_goal)
FROM matches_2013_2014);
#---------------------------------------------------------------------
Filtering using a subquery with a list
Your goal in this exercise is to generate a list of teams that never played a game in their home city. Using a subquery, you will generate a list of unique hometeam_ID values from the unfiltered match table to exclude in the team table's team_api_ID column.
In addition to filtering using a single-value (scalar) subquery, you can create a list of values in a subquery to filter data based on a complex set of conditions. This type of subquery generates a one column reference list for the main query. As long as the values in your list match a column in your main query's table, you don't need to use a join -- even if the list is from a separate table.
SELECT
-- Select the team long and short names
team_long_name,
team_short_name
FROM team
-- Exclude all values from the subquery
WHERE team_api_id NOT IN
(SELECT DISTINCT hometeam_ID FROM match);
#---------------------------------------------------------------------
Filtering with more complex subquery conditions
In the previous exercise, you generated a list of teams that have no home matches listed in the soccer database using a subquery in WHERE. Let's do some further exploration in this database by creating a list of teams that scored 8 or more goals in a home match.
In order to do this, you will construct a subquery in the WHERE statement with its own filtering condition.
SELECT
-- Select the team long and short names
team_long_name,
team_short_name
FROM team
-- Filter for teams with 8 or more home goals
WHERE team_api_id IN
(SELECT hometeam_ID
FROM match
WHERE home_goal >= 8);
#---------------------------------------------------------------------
video
Subqueries in the
FROM statement
Restructure and transform your data
Transforming data from long to wide before selecting
Prefiltering data
Calculating aggregates of aggregates
Which 3 teams has the highest average of home goals scored?
1. Calculate the AVG for each team
2. Get the 3 highest of the AVG values
1st with no suquery
FROM subqueries...
step 01
SELECT
t.team_long_name AS team,
AVG(m.home_goal) AS home_avg
FROM match AS m
LEFT JOIN team AS t
ON m.hometeam_id = t.team_api_id
WHERE season = '2011/2012'
GROUP BY team;
NOW::::...to main queries!
step02
FROM (SELECT
t.team_long_name AS team,
AVG(m.home_goal) AS home_avg
FROM match AS m
LEFT JOIN team AS t
ON m.hometeam_id = t.team_api_id
WHERE season = '2011/2012'
GROUP BY team) AS subquery
step 03
SELECT team, home_avg
FROM (SELECT
t.team_long_name AS team,
AVG(m.home_goal) AS home_avg
FROM match AS m
LEFT JOIN team AS t
ON m.hometeam_id = t.team_api_id
WHERE season = '2011/2012'
GROUP BY team) AS subquery
step04
SELECT team, home_avg
FROM (SELECT
t.team_long_name AS team,
AVG(m.home_goal) AS home_avg
FROM match AS m
LEFT JOIN team AS t
ON m.hometeam_id = t.team_api_id
WHERE season = '2011/2012'
GROUP BY team) AS subquery
ORDER BY home_avg DESC
LIMIT 3;
Things to remember
You can create multiple subqueries in one FROM statement
Alias them!
Join them!
You can join a subquery to a table in FROM
Include a joining columns in both tables!
#---------------------------------------------------------------------
Joining Subqueries in FROM
The match table in the European Soccer Database does not contain country or team names. You can get this information by joining it to the country table, and use this to aggregate information, such as the number of matches played in each country.
If you're interested in filtering data from one of these tables, you can also create a subquery from one of the tables, and then join it to an existing table in the database. A subquery in FROM is an effective way of answering detailed questions that requires filtering or transforming data before including it in your final results.
Your goal in this exercise is to generate a subquery using the match table, and then join that subquery to the country table to calculate information about matches with 10 or more goals in total!
step01
SELECT
-- Select the country ID and match ID
country_id,
id
FROM match
-- Filter for matches with 10 or more goals in total
WHERE (home_goal + away_goal) >= 10;
step02
SELECT
-- Select country name and the count match IDs
c.name AS country_name,
COUNT(sub.id) AS matches
FROM country AS c
-- Inner join the subquery onto country
-- Select the country id and match id columns
INNER JOIN (SELECT country_id, id
FROM match
-- Filter the subquery by matches with 10+ goals
WHERE (home_goal + away_goal) >= 10) AS sub
ON c.id = sub.country_id
GROUP BY country_name;
#---------------------------------------------------------------------
Building on Subqueries in FROM
In the previous exercise, you found that England, Netherlands, Germany and Spain were the only countries that had matches in the database where 10 or more goals were scored overall. Let's find out some more details about those matches -- when they were played, during which seasons, and how many of the goals were home vs. away goals.
You'll notice that in this exercise, the table alias is excluded for every column selected in the main query. This is because the main query is extracting data from the subquery, which is treated as a single table.
SELECT
-- Select country, date, home, and away goals from the subquery
country,
date,
home_goal,
away_goal
FROM
-- Select country name, date, and total goals in the subquery
(SELECT c.name AS country,
m.date,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal) AS total_goals
FROM match AS m
LEFT JOIN country AS c
ON m.country_id = c.id) AS subq
-- Filter by total goals scored in the main query
WHERE total_goals >= 10;
#---------------------------------------------------------------------
video
SELECTing what?
Returns a single value
Include aggregate values to compare to individual values
Used in mathematical calculations
Deviation from the average
step01
SELECT COUNT(id) FROM match;
step02
SELECT
season,
COUNT(id) AS matches,
12837 as total_matches
FROM match
GROUP BY season;
step03 skip that step
SELECT
season,
COUNT(id) AS matches,
(SELECT COUNT(id) FROM match) as total_matches
FROM match
GROUP BY season;
SELECT subqueries for mathematical calculations
step01
SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2011/2012';
-> 2,72
step02
SELECT
date,
(home_goal + away_goal) AS goals,
(home_goal + away_goal) - 2.72 AS diff
FROM match
WHERE season = '2011/2012';
step03 Subqueries in SELECT
SELECT
date,
(home_goal + away_goal) AS goals,
(home_goal + away_goal) -
(SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2011/2012') AS diff
FROM match
WHERE season = '2011/2012';
SELECT subqueries -- things to keep in mind
Need to return a SINGLE value
Will generate an error otherwise
Make sure you have all filters in rightplaces
properly filter both the main and the subquery!
SELECT
date,
(home_goal + away_goal) AS goals,
(home_goal + away_goal) -
(SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2011/2012') AS diff
FROM match
WHERE season = '2011/2012';
#---------------------------------------------------------------------
Add a subquery to the SELECT clause
Subqueries in SELECT statements generate a single value that allow you to pass an aggregate value down a data frame. This is useful for performing calculations on data within your database.
In the following exercise, you will construct a query that calculates the average number of goals per match in each country's league.
SELECT
l.name AS league,
-- Select and round the league's total goals
ROUND(AVG(m.home_goal + m.away_goal), 2) AS avg_goals,
-- Select & round the average total goals for the season
(SELECT ROUND(AVG(home_goal + away_goal), 2)
FROM match
WHERE season = '2013/2014') AS overall_avg
FROM league AS l
LEFT JOIN match AS m
ON l.country_id = m.country_id
-- Filter for the 2013/2014 season
WHERE season = '2013/2014'
GROUP BY league;
#---------------------------------------------------------------------
Subqueries in Select for Calculations
Subqueries in SELECT are a useful way to create calculated columns in a query. A subquery in SELECT can be treated as a single numeric value to use in your calculations. When writing queries in SELECT, it's important to remember that filtering the main query does not filter the subquery -- and vice versa.
In the previous exercise, you created a column to compare each league's average total goals to the overall average goals in the 2013/2014 season. In this exercise, you will add a column that directly compares these values by subtracting the overall average from the subquery.
SELECT
-- Select the league name and average goals scored
l.name AS league,
ROUND(AVG(m.home_goal + m.away_goal),2) AS avg_goals,
-- Subtract the overall average from the league average
ROUND(AVG(home_goal + m.away_goal) -
(SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2013/2014'),2) AS diff
FROM league AS l
LEFT JOIN match AS m
ON l.country_id = m.country_id
-- Only include 2013/2014 results
WHERE season = '2013/2014'
GROUP BY l.name;
#---------------------------------------------------------------------
video
Subqueries
everywhere! And
best practices!
Can include multiple subqueries in SELECT , FROM , WHERE
SELECT
country_id,
ROUND(AVG(matches.home_goal + matches.away_goal),2) AS avg_goals,
(SELECT ROUND(AVG(home_goal + away_goal),2)
FROM match WHERE season = '2013/2014') AS overall_avg
FROM (SELECT
id,
home_goal,
away_goal,
season
FROM match
WHERE home_goal > 5) AS matches
WHERE matches.season = '2013/2014'
AND (AVG(matches.home_goal + matches.away_goal) >
(SELECT AVG(home_goal + away_goal)
FROM match WHERE season = '2013/2014')
GROUP BY country_id;
Format your queries
Line up SELECT , FROM , WHERE , and GROUP BY
#Indent
SELECT
col1,
col2,
col3
FROM table1
WHERE col1 = 2;
Annotate your queries
/* This query filters for col1 = 2
and only selects data from table1 */
/*
SELECT
col1,
col2,
col3
FROM table1
WHERE col1 = 2;
SELECT
col1,
col2,
col3
FROM table1 -- this table has 10,000 rows
WHERE col1 = 2; -- Filter WHERE value 2
Indent your subqueries!
SELECT
col1,
col2,
col3
FROM table1
WHERE col1 IN
(SELECT id
FROM table2
WHERE year = 1991);
Is that subquery necessary?
-Subqueries require computing power
How big is your database?
How big is the table you're querying from?
-Is the subquery actually necessary?
-Watch your FILTERS!!°!
SELECT
country_id,
ROUND(AVG(m.home_goal + m.away_goal),2) AS avg_goals,
(SELECT ROUND(AVG(home_goal + away_goal),2)
FROM match WHERE season = '2013/2014') AS overall_avg
FROM match AS m
WHERE
m.season = '2013/2014'
AND (AVG(m.home_goal + m.away_goal) >
(SELECT AVG(home_goal + away_goal)
FROM match WHERE season = '2013/2014')
GROUP BY country_id;
#---------------------------------------------------------------------
ALL the Subqueries EVERYWHERE
In soccer leagues, games are played at different stages. Winning teams progress from one stage to the next, until they reach the final stage. In each stage, the stakes become higher than the previous one. The match table includes data about the different stages that each match took place in.
In this lesson, you will build a final query across 3 exercises that will contain three subqueries -- one in the SELECT clause, one in the FROM clause, and one in the WHERE clause. In the final exercise, your query will extract data examining the average goals scored in each stage of a match. Does the average number of goals scored change as the stakes get higher from one stage to the next?
SELECT
-- Select the stage and average goals for each stage
m.stage,
ROUND(AVG(home_goal + m.away_goal),2) AS avg_goals,
-- Select the average overall goals for the 2012/2013 season
ROUND((SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2012/2013'),2) AS overall
FROM match AS m
-- Filter for the 2012/2013 season
WHERE season = '2012/2013'
-- Group by stage
GROUP BY m.stage;
#---------------------------------------------------------------------
Add a subquery in FROM
In the previous exercise, you created a data set listing the average home and away goals in each match stage of the 2012/2013 match season.
In this next step, you will turn the main query into a subquery to extract a list of stages where the average home goals in a stage is higher than the overall average for home goals in a match.
SELECT
-- Select the stage and average goals from the subquery
s.stage,
ROUND(s.avg_goals,2) AS avg_goals