<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head>
<meta charset="utf-8">
<meta name="generator" content="quarto-1.6.1">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>29 Correlation and Causation – Resampling statistics</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
div.column{flex: auto; overflow-x: auto;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
ul.task-list li input[type="checkbox"] {
width: 0.8em;
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
vertical-align: middle;
}
/* CSS for syntax highlighting */
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
}
pre.numberSource { margin-left: 3em; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
/* CSS for citations */
div.csl-bib-body { }
div.csl-entry {
clear: both;
margin-bottom: 0em;
}
.hanging-indent div.csl-entry {
margin-left:2em;
text-indent:-2em;
}
div.csl-left-margin {
min-width:2em;
float:left;
}
div.csl-right-inline {
margin-left:2em;
padding-left:1em;
}
div.csl-indent {
margin-left: 2em;
}</style>
<script src="site_libs/quarto-nav/quarto-nav.js"></script>
<script src="site_libs/quarto-nav/headroom.min.js"></script>
<script src="site_libs/clipboard/clipboard.min.js"></script>
<script src="site_libs/quarto-search/autocomplete.umd.js"></script>
<script src="site_libs/quarto-search/fuse.min.js"></script>
<script src="site_libs/quarto-search/quarto-search.js"></script>
<meta name="quarto:offset" content="./">
<link href="./how_big_sample.html" rel="next">
<link href="./reliability_average.html" rel="prev">
<script src="site_libs/quarto-html/quarto.js"></script>
<script src="site_libs/quarto-html/popper.min.js"></script>
<script src="site_libs/quarto-html/tippy.umd.min.js"></script>
<script src="site_libs/quarto-html/anchor.min.js"></script>
<link href="site_libs/quarto-html/tippy.css" rel="stylesheet">
<link href="site_libs/quarto-html/quarto-syntax-highlighting.css" rel="stylesheet" id="quarto-text-highlighting-styles">
<script src="site_libs/bootstrap/bootstrap.min.js"></script>
<link href="site_libs/bootstrap/bootstrap-icons.css" rel="stylesheet">
<link href="site_libs/bootstrap/bootstrap.min.css" rel="stylesheet" id="quarto-bootstrap" data-mode="light">
<script id="quarto-search-options" type="application/json">{
"location": "sidebar",
"copy-button": false,
"collapse-after": 3,
"panel-placement": "start",
"type": "textbox",
"limit": 50,
"keyboard-shortcut": [
"f",
"/",
"s"
],
"show-item-context": false,
"language": {
"search-no-results-text": "No results",
"search-matching-documents-text": "matching documents",
"search-copy-link-title": "Copy link to search",
"search-hide-matches-text": "Hide additional matches",
"search-more-match-text": "more match in this document",
"search-more-matches-text": "more matches in this document",
"search-clear-button-title": "Clear",
"search-text-placeholder": "",
"search-detached-cancel-button-title": "Cancel",
"search-submit-button-title": "Submit",
"search-label": "Search"
}
}</script>
<script type="text/javascript">
$(document).ready(function() {
$("table").addClass('lightable-paper lightable-striped lightable-hover')
});
</script>
<script src="site_libs/kePrint-0.0.1/kePrint.js"></script>
<link href="site_libs/lightable-0.0.1/lightable.css" rel="stylesheet">
<script src="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js?features=es6"></script>
<script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js" type="text/javascript"></script>
<script type="text/javascript">
const typesetMath = (el) => {
if (window.MathJax) {
// MathJax Typeset
window.MathJax.typeset([el]);
} else if (window.katex) {
// KaTeX Render
var mathElements = el.getElementsByClassName("math");
var macros = [];
for (var i = 0; i < mathElements.length; i++) {
var texText = mathElements[i].firstChild;
if (mathElements[i].tagName == "SPAN") {
window.katex.render(texText.data, mathElements[i], {
displayMode: mathElements[i].classList.contains('display'),
throwOnError: false,
macros: macros,
fleqn: false
});
}
}
}
}
window.Quarto = {
typesetMath
};
</script>
<link rel="stylesheet" href="style.css">
<link rel="stylesheet" href="font-awesome.min.css">
</head>
<body class="nav-sidebar floating">
<div id="quarto-search-results"></div>
<header id="quarto-header" class="headroom fixed-top">
<nav class="quarto-secondary-nav">
<div class="container-fluid d-flex">
<button type="button" class="quarto-btn-toggle btn" data-bs-toggle="collapse" role="button" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
<i class="bi bi-layout-text-sidebar-reverse"></i>
</button>
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="./correlation_causation.html"><span class="chapter-number">29</span> <span class="chapter-title">Correlation and Causation</span></a></li></ol></nav>
<a class="flex-grow-1" role="navigation" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
</a>
<button type="button" class="btn quarto-search-button" aria-label="Search" onclick="window.quartoOpenSearch();">
<i class="bi bi-search"></i>
</button>
</div>
</nav>
</header>
<!-- content -->
<div id="quarto-content" class="quarto-container page-columns page-rows-contents page-layout-article">
<!-- sidebar -->
<nav id="quarto-sidebar" class="sidebar collapse collapse-horizontal quarto-sidebar-collapse-item sidebar-navigation floating overflow-auto">
<div class="pt-lg-2 mt-2 text-left sidebar-header">
<div class="sidebar-title mb-0 py-0">
<a href="./">Resampling statistics</a>
</div>
</div>
<div class="mt-2 flex-shrink-0 align-items-center">
<div class="sidebar-search">
<div id="quarto-search" class="" title="Search"></div>
</div>
</div>
<div class="sidebar-menu-container">
<ul class="list-unstyled mt-1">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./index.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">R version</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./preface_third.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Preface to the third edition</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./preface_second.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Preface to the second edition</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./intro.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">1</span> <span class="chapter-title">Introduction</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./resampling_method.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">2</span> <span class="chapter-title">The resampling method</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./what_is_probability.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">3</span> <span class="chapter-title">What is probability?</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./about_technology.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">4</span> <span class="chapter-title">Introducing R and the Jupyter notebook</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./resampling_with_code.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">5</span> <span class="chapter-title">Resampling with code</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./resampling_with_code2.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">6</span> <span class="chapter-title">More resampling with code</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./sampling_tools.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">7</span> <span class="chapter-title">Tools for samples and sampling</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./probability_theory_1a.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">8</span> <span class="chapter-title">Probability Theory, Part 1</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./probability_theory_1b.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">9</span> <span class="chapter-title">Probability Theory Part I (continued)</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./more_sampling_tools.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">10</span> <span class="chapter-title">Two puzzles and more tools</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./probability_theory_2_compound.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">11</span> <span class="chapter-title">Probability Theory, Part 2: Compound Probability</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./probability_theory_3.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">12</span> <span class="chapter-title">Probability Theory, Part 3</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./probability_theory_4_finite.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">13</span> <span class="chapter-title">Probability Theory, Part 4: Estimating Probabilities from Finite Universes</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./sampling_variability.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">14</span> <span class="chapter-title">On Variability in Sampling</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./monte_carlo.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">15</span> <span class="chapter-title">The Procedures of Monte Carlo Simulation (and Resampling)</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./standard_scores.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">16</span> <span class="chapter-title">Ranks, Quantiles and Standard Scores</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./inference_ideas.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">17</span> <span class="chapter-title">The Basic Ideas in Statistical Inference</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./inference_intro.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">18</span> <span class="chapter-title">Introduction to Statistical Inference</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./point_estimation.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">19</span> <span class="chapter-title">Point Estimation</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./framing_questions.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">20</span> <span class="chapter-title">Framing Statistical Questions</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./testing_counts_1.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">21</span> <span class="chapter-title">Hypothesis-Testing with Counted Data, Part 1</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./significance.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">22</span> <span class="chapter-title">The Concept of Statistical Significance in Testing Hypotheses</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./testing_counts_2.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">23</span> <span class="chapter-title">The Statistics of Hypothesis-Testing with Counted Data, Part 2</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./testing_measured.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">24</span> <span class="chapter-title">The Statistics of Hypothesis-Testing With Measured Data</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./testing_procedures.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">25</span> <span class="chapter-title">General Procedures for Testing Hypotheses</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./confidence_1.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">26</span> <span class="chapter-title">Confidence Intervals, Part 1: Assessing the Accuracy of Samples</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./confidence_2.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">27</span> <span class="chapter-title">Confidence Intervals, Part 2: The Two Approaches to Estimating Confidence Intervals</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./reliability_average.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">28</span> <span class="chapter-title">Some Last Words About the Reliability of Sample Averages</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./correlation_causation.html" class="sidebar-item-text sidebar-link active">
<span class="menu-text"><span class="chapter-number">29</span> <span class="chapter-title">Correlation and Causation</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./how_big_sample.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">30</span> <span class="chapter-title">How Large a Sample?</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./bayes_simulation.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">31</span> <span class="chapter-title">Bayesian Analysis by Simulation</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./references.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">References</span></a>
</div>
</li>
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" role="navigation" aria-expanded="true">
<span class="menu-text">Appendices</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" role="navigation" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-1" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./exercise_solutions.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">A</span> <span class="chapter-title">Exercise Solutions</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./technical_note.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">B</span> <span class="chapter-title">Technical Note to the Professional Reader</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./acknowlegements.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">C</span> <span class="chapter-title">Acknowledgements</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./code_topics.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">D</span> <span class="chapter-title">Code topics</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./errors_suggestions.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">E</span> <span class="chapter-title">Errors and suggestions</span></span></a>
</div>
</li>
</ul>
</li>
</ul>
</div>
</nav>
<div id="quarto-sidebar-glass" class="quarto-sidebar-collapse-item" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item"></div>
<!-- margin-sidebar -->
<div id="quarto-margin-sidebar" class="sidebar margin-sidebar">
<nav id="TOC" role="doc-toc" class="toc-active">
<h2 id="toc-title">Table of contents</h2>
<ul>
<li><a href="#preview" id="toc-preview" class="nav-link active" data-scroll-target="#preview"><span class="header-section-number">29.1</span> Preview</a></li>
<li><a href="#introduction-to-correlation-and-causation" id="toc-introduction-to-correlation-and-causation" class="nav-link" data-scroll-target="#introduction-to-correlation-and-causation"><span class="header-section-number">29.2</span> Introduction to correlation and causation</a></li>
<li><a href="#a-note-on-association-compared-to-testing-a-hypothesis" id="toc-a-note-on-association-compared-to-testing-a-hypothesis" class="nav-link" data-scroll-target="#a-note-on-association-compared-to-testing-a-hypothesis"><span class="header-section-number">29.3</span> A Note on Association Compared to Testing a Hypothesis</a>
<ul class="collapse">
<li><a href="#sec-athletic-iq-ranks" id="toc-sec-athletic-iq-ranks" class="nav-link" data-scroll-target="#sec-athletic-iq-ranks"><span class="header-section-number">29.3.1</span> Example: Is Athletic Ability Directly Related to Intelligence?</a></li>
<li><a href="#example-athletic-ability-and-i.q.-a-third-way" id="toc-example-athletic-ability-and-i.q.-a-third-way" class="nav-link" data-scroll-target="#example-athletic-ability-and-i.q.-a-third-way"><span class="header-section-number">29.3.2</span> Example: Athletic ability and I.Q. — a third way</a></li>
</ul></li>
<li><a href="#sec-sum-products" id="toc-sec-sum-products" class="nav-link" data-scroll-target="#sec-sum-products"><span class="header-section-number">29.4</span> Correlation with sum of products</a>
<ul class="collapse">
<li><a href="#sec-ath-iq-sop" id="toc-sec-ath-iq-sop" class="nav-link" data-scroll-target="#sec-ath-iq-sop"><span class="header-section-number">29.4.1</span> Example: sum of products correlation of athletic and IQ scores</a></li>
<li><a href="#example-correlation-between-adherence-to-medication-regime-and-change-in-cholesterol" id="toc-example-correlation-between-adherence-to-medication-regime-and-change-in-cholesterol" class="nav-link" data-scroll-target="#example-correlation-between-adherence-to-medication-regime-and-change-in-cholesterol"><span class="header-section-number">29.4.2</span> Example: Correlation Between Adherence to Medication Regime and Change in Cholesterol</a></li>
</ul></li>
<li><a href="#sec-correlation-coefficient" id="toc-sec-correlation-coefficient" class="nav-link" data-scroll-target="#sec-correlation-coefficient"><span class="header-section-number">29.5</span> The correlation coefficient</a>
<ul class="collapse">
<li><a href="#correlations-are-symmetrical" id="toc-correlations-are-symmetrical" class="nav-link" data-scroll-target="#correlations-are-symmetrical"><span class="header-section-number">29.5.1</span> Correlations are symmetrical</a></li>
<li><a href="#the-correlation-coefficient-in" id="toc-the-correlation-coefficient-in" class="nav-link" data-scroll-target="#the-correlation-coefficient-in"><span class="header-section-number">29.5.2</span> The correlation coefficient in R</a></li>
<li><a href="#test-linear-association-with-the-correlation-coefficient" id="toc-test-linear-association-with-the-correlation-coefficient" class="nav-link" data-scroll-target="#test-linear-association-with-the-correlation-coefficient"><span class="header-section-number">29.5.3</span> Test linear association with the correlation coefficient</a></li>
</ul></li>
<li><a href="#sec-counted-association" id="toc-sec-counted-association" class="nav-link" data-scroll-target="#sec-counted-association"><span class="header-section-number">29.6</span> Testing for a relationship between counted-data variables</a>
<ul class="collapse">
<li><a href="#example-drinking-beer-and-being-in-favor-of-selling-beer" id="toc-example-drinking-beer-and-being-in-favor-of-selling-beer" class="nav-link" data-scroll-target="#example-drinking-beer-and-being-in-favor-of-selling-beer"><span class="header-section-number">29.6.1</span> Example: Drinking Beer And Being In Favor of Selling Beer</a></li>
<li><a href="#example-do-athletes-really-have-slumps" id="toc-example-do-athletes-really-have-slumps" class="nav-link" data-scroll-target="#example-do-athletes-really-have-slumps"><span class="header-section-number">29.6.2</span> Example: do athletes really have “slumps”?</a></li>
</ul></li>
<li><a href="#exercises" id="toc-exercises" class="nav-link" data-scroll-target="#exercises"><span class="header-section-number">29.7</span> Exercises</a>
<ul class="collapse">
<li><a href="#sec-exr-voter-participation" id="toc-sec-exr-voter-participation" class="nav-link" data-scroll-target="#sec-exr-voter-participation"><span class="header-section-number">29.7.1</span> Exercise: voter participation</a></li>
<li><a href="#sec-exr-runs-strikeouts" id="toc-sec-exr-runs-strikeouts" class="nav-link" data-scroll-target="#sec-exr-runs-strikeouts"><span class="header-section-number">29.7.2</span> Exercise: association of runs and strikeouts</a></li>
<li><a href="#sec-exr-runs-strikeouts-r" id="toc-sec-exr-runs-strikeouts-r" class="nav-link" data-scroll-target="#sec-exr-runs-strikeouts-r"><span class="header-section-number">29.7.3</span> Exercise: runs, strikeouts, correlation coefficient</a></li>
<li><a href="#sec-exr-money-exchange" id="toc-sec-exr-money-exchange" class="nav-link" data-scroll-target="#sec-exr-money-exchange"><span class="header-section-number">29.7.4</span> Exercise: money and exchange rate</a></li>
</ul></li>
</ul>
</nav>
</div>
<!-- main -->
<main class="content" id="quarto-document-content">
<header id="title-block-header" class="quarto-title-block default">
<div class="quarto-title">
<h1 class="title"><span id="sec-correlation-causation" class="quarto-section-identifier"><span class="chapter-number">29</span> <span class="chapter-title">Correlation and Causation</span></span></h1>
</div>
<div class="quarto-title-meta">
</div>
</header>
<section id="preview" class="level2" data-number="29.1">
<h2 data-number="29.1" class="anchored" data-anchor-id="preview"><span class="header-section-number">29.1</span> Preview</h2>
<p>The correlation (speaking loosely for now) between two variables measures the strength of the relationship between them. A positive “linear” correlation between two variables <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> implies that high values of <span class="math inline">\(x\)</span> are associated with high values of <span class="math inline">\(y\)</span>, and that low values of <span class="math inline">\(x\)</span> are associated with low values of <span class="math inline">\(y\)</span>. A negative correlation implies the opposite: high values of <span class="math inline">\(x\)</span> are associated with <em>low</em> values of <span class="math inline">\(y\)</span>, and low values of <span class="math inline">\(x\)</span> with high values of <span class="math inline">\(y\)</span>. By definition, a “correlation coefficient” (<a href="#sec-correlation-coefficient" class="quarto-xref"><span>Section 29.5</span></a>) close to zero indicates little or no linear relationship between two variables; correlation coefficients close to 1 and -1 denote a strong positive or strong negative linear relationship, respectively. We will start, however, by using a simpler measure of correlation than the correlation coefficient.</p>
<p>One way to measure correlation with the resampling method is to rank both variables from highest to lowest, and investigate how often, in randomly-generated samples, the rankings of the two variables are as close to each other as the rankings in the observed variables. A better approach, because it uses more of the quantitative information contained in the data (though it requires more computation), is to multiply together the two values in each corresponding pair, and compare the sum of the resulting products to the analogous sum for randomly-generated pairings of the observed values (<a href="#sec-sum-products" class="quarto-xref"><span>Section 29.4</span></a>). The last section of the chapter (<a href="#sec-counted-association" class="quarto-xref"><span>Section 29.6</span></a>) shows how the strength of a relationship can be determined when the data are counted, rather than measured. First, though, comes some discussion of the philosophical issues involved in correlation and causation.</p>
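<p>As a rough illustration of the sum-of-products comparison described in this preview, the following R sketch shuffles one variable many times and records how often the shuffled pairing gives a sum of products at least as large as the observed one. The values in <code>x</code> and <code>y</code> are invented for this sketch and are not data from the book; the variable names and the choice of 10,000 trials are likewise assumptions made only for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical paired measurements, invented for illustration only.
x &lt;- c(2, 4, 5, 7, 9, 10)
y &lt;- c(1, 3, 6, 6, 8, 11)

# Fix the random number generator so the sketch is repeatable.
set.seed(1)

# Observed sum of products of the corresponding pairs.
observed &lt;- sum(x * y)

# Number of random re-pairings to try (an arbitrary choice).
n_trials &lt;- 10000
results &lt;- numeric(n_trials)

for (i in 1:n_trials) {
    # Shuffle one variable to break any pairing with the other.
    shuffled_y &lt;- sample(y)
    # Sum of products for this random pairing.
    results[i] &lt;- sum(x * shuffled_y)
}

# Proportion of random pairings with a sum at least as large as observed.
p_at_least &lt;- sum(results &gt;= observed) / n_trials

message('Observed sum of products: ', observed)
message('Proportion of shuffles at least as large: ', p_at_least)
</code></pre></div>
<p>Because the invented values rise together, relatively few random pairings should reach the observed sum; if the two variables were unrelated, we would expect a much larger proportion to do so.</p>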
</section>
<section id="introduction-to-correlation-and-causation" class="level2" data-number="29.2">
<h2 data-number="29.2" class="anchored" data-anchor-id="introduction-to-correlation-and-causation"><span class="header-section-number">29.2</span> Introduction to correlation and causation</h2>
<p>The questions in examples <a href="probability_theory_3.html#sec-birthday-problem" class="quarto-xref"><span>Section 12.1</span></a> to <a href="probability_theory_4_finite.html#sec-fifteen-bridge" class="quarto-xref"><span>Section 13.3.3</span></a> have been stated in the following form: Does the independent variable (say, irradiation; or type of pig ration) have an effect upon the dependent variable (say, sex of fruit flies; or weight gain of pigs)? This is another way to state the following question: Is there a <em>causal relationship</em> between the independent variable(s) and the dependent variable? (“Independent” or “control” is the name we give to the variable(s) the researcher believes is (are) responsible for changes in the other variable, which we call the “dependent” or “response” variable.)</p>
<p>A causal relationship cannot be defined perfectly neatly. Even an experiment does not determine perfectly whether a relationship deserves to be called “causal” because, among other reasons, the independent variable may not be clear-cut. For example, even if cigarette smoking experimentally produces cancer in rats, it might be the paper and not the tobacco that causes the cancer. Or consider the fabled gentlemen who got experimentally drunk on bourbon and soda on Monday night, scotch and soda on Tuesday night, and brandy and soda on Wednesday night — and stayed sober on Thursday night by drinking nothing. With a vast inductive leap of scientific imagination, they treated their experience as an empirical demonstration that soda, the common element each evening, was the cause of the inebriated state they had experienced. Notice that their deduction was perfectly sound, given only the recent evidence they had. Other knowledge of the world is necessary to set them straight. That is, even in a controlled experiment there is often no way except subject-matter knowledge to avoid erroneous conclusions about causality. Nothing except substantive knowledge or scientific intuition would have led them to the recognition that it is the alcohol rather than the soda that made them drunk, <em>as long as they always took soda with their drinks</em>. And no statistical procedure can suggest to them that they ought to experiment with the presence and absence of soda. If this is true for an experiment, it must also be true for an uncontrolled study.</p>
<p>Here are some tests that a relationship usually must pass to be called causal. That is, a working definition of a particular causal relationship is expressed in a statement that has these important characteristics:</p>
<ol type="1">
<li><p>It is an association that is strong enough so that the observer believes it to have a predictive (explanatory) power great enough to be scientifically useful or interesting. For example, he is not likely to say that wearing glasses causes (or is a cause of) auto accidents if the observed <em>correlation coefficient</em> is .07<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>, even if the sample is large enough to make the correlation statistically convincing. In other words, unimportant relationships are not likely to be labeled causal.</p>
<p>Various observers may well differ in judging whether or not an association is strong enough to be important and therefore “causal.” And the particular field in which the observer works may affect this judgment. This is an indication that whether or not a relationship is dubbed “causal” involves a good deal of human judgment and is subject to dispute.</p></li>
<li><p>The “side conditions” must be sufficiently <em>few</em> and sufficiently observable so that the relationship will apply under a wide enough range of conditions to be considered useful or interesting. In other words, <em>the relationship must not require too many “if”s, “and”s, and “but”s in order to hold</em>. For example, one might say that an increase in income caused an increase in the birth rate if this relationship were observed everywhere. But, if the relationship were found to hold only in developed countries, among the educated classes, and among the higher-income groups, then it would be less likely to be called “causal” — even if the correlation were extremely high once the specified conditions had been met. A similar example can be made of the relationship between income and happiness.</p></li>
<li><p>For a relationship to be called “causal,” there should be sound reason to believe that, even if the control variable were not the “real” cause (and it never is), other relevant “hidden” and “real” cause variables must also change <em>consistently</em> with changes in the control variable. That is, a variable being manipulated may reasonably be called “causal” if the real variable for which it is believed to be a proxy must always be tied intimately to it. (Between two variables, <em>v</em> and <em>w</em>, <em>v</em> may be said to be the “more real” cause and <em>w</em> a “spurious” cause, if <em>v</em> and <em>w</em> require the same side conditions, except that <em>v</em> does not require <em>w</em> as a side condition.) This third criterion (non-spuriousness) is of particular importance to policy makers. The difference between it and the previous criterion for side conditions is that a plenitude of very restrictive side conditions may take the relationship out of the class of causal relationships, <em>even though the effects of the side conditions are known</em>. This criterion of non-spuriousness concerns variables that are as yet <em>unknown and unevaluated</em> but that have a <em>possible</em> ability to <em>upset</em> the observed association.</p>
<p>Examples of spurious relationships and hidden-third-factor causation are commonplace. For a single example, toy sales rise in December. There is no danger in saying that December causes an increase in toy sales, even though it is “really” Christmas that causes the increase, because Christmas and December practically always accompany each other.</p>
<p>Belief that the relationship is not spurious is increased if <em>many</em> likely variables have been investigated and none removes the relationship. This is further demonstration that the test of whether or not an association should be called “causal” cannot be a logical one; there is no way that one can express in symbolic logic the fact that many other variables have been tried without changing the relationship in question.</p></li>
<li><p>The more tightly a relationship is bound into (that is, deduced from, compatible with, and logically connected to) a general framework of theory, the stronger is its claim to be called “causal.” For an economics example, observed positive relationships between the interest rate and business investment and between profits and investment are more likely to be called “causal” than is the relationship between liquid assets and investment. This is so because the first two statements can be deduced from classical price theory, whereas the third statement cannot. Connection to a theoretical framework provides support for belief that the side conditions necessary for the statement to hold true are not restrictive and that the likelihood of spurious correlation is not great; because a statement is logically connected to the rest of the system, the statement tends to stand or fall as the rest of the system stands or falls. And, because the rest of the system of economic theory has, over a long period of time and in a wide variety of tests, been shown to have predictive power, a statement connected with it is cloaked in this mantle.</p></li>
</ol>
<p>The social sciences other than economics do not have such well-developed bodies of deductive theory, and therefore this criterion of causality does not weigh as heavily in sociology, for instance, as in economics. Rather, the other social sciences seem to substitute a weaker and more general criterion, that is, whether or not the statement of the relationship is accompanied by other statements that seem to “explain” the “mechanism” by which the relationship operates. Consider, for example, the relationship between the phases of the moon and the suicide rate. The reason that sociologists do not call it causal is that there are no auxiliary propositions that explain the relationship and describe an operative mechanism. On the other hand, the relationship between broken homes and youth crime is often referred to as “causal,” presumably because a large body of psychological theory serves to explain why a child raised without one or the other parent, or in the presence of parental strife, should not adjust readily.</p>
<p>Furthermore, one can never decide with perfect certainty whether in any <em>given</em> situation one variable “causes” a particular change in another variable. At best, given your particular purposes in investigating a phenomenon, you may be safe in judging that very likely there is causal influence.</p>
<p>In brief, it is correct to say (as it is so often said) that correlation does not prove causation — if we add the word “completely” to make it “correlation does not <em>completely</em> prove causation.” On the other hand, causation can <em>never</em> be “proven” <em>completely</em> by correlation <em>or any other</em> tool or set of tools, including experimentation. The best we can do is make informed judgments about whether to call a relationship causal.</p>
<p>It is clear, however, that in any situation where we are interested in the possibility of causation, we must <em>at least</em> know whether there is a relationship (correlation) between the variables of interest; the existence of a relationship is necessary for a relationship to be judged causal even if it is not sufficient to receive the causal label. And in other situations where we are not even interested in causality, but rather simply want to predict events or understand the structure of a system, we may be interested in the existence of relationships quite apart from questions about causation. Therefore our next set of problems deals with the probability of there being a relationship between two measured variables, variables that can take on any values (say, scores on an athletic test) rather than just two values (say, whether or not there has been irradiation).<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a></p>
<p>Another way to think about such problems is to ask whether two variables are <em>independent</em> of each other — that is, whether you know anything about the value of one variable if you know the value of the other in a particular case — or whether they are not independent but rather are related.</p>
</section>
<section id="a-note-on-association-compared-to-testing-a-hypothesis" class="level2" data-number="29.3">
<h2 data-number="29.3" class="anchored" data-anchor-id="a-note-on-association-compared-to-testing-a-hypothesis"><span class="header-section-number">29.3</span> A Note on Association Compared to Testing a Hypothesis</h2>
<p>Problems in which we investigate a) whether there is an <em>association</em>, versus b) whether there is a <em>difference</em> between just two groups, often look very similar, especially when the data constitute a 2-by-2 table. There is this important difference between the two types of analysis, however: Questions about <em>association</em> refer to <em>variables</em> — say weight and age — and it never makes sense to ask whether there is a difference between variables (except when asking whether they measure the same quantity). Questions about <em>similarity or difference</em> refer to <em>groups of individuals</em>, and in such a situation it does make sense to ask whether or not two groups are observably different from each other.</p>
<section id="sec-athletic-iq-ranks" class="level3" data-number="29.3.1">
<h3 data-number="29.3.1" class="anchored" data-anchor-id="sec-athletic-iq-ranks"><span class="header-section-number">29.3.1</span> Example: Is Athletic Ability Directly Related to Intelligence?</h3>
<p>A more specific version of our question: <strong>is there correlation between the two variables or are they independent?</strong></p>
<p>A scientist often wants to know whether or not two characteristics go together, that is, whether or not they are correlated (related or associated). For example, do young adults with high athletic ability also tend to have high I.Q.s?</p>
<p>Hypothetical physical-education scores of a group of ten high-school boys are shown in <a href="#tbl-physical-mental" class="quarto-xref">Table <span>29.1</span></a>, ordered from high to low, along with the I.Q. score for each boy. The ranks for each student’s athletic and I.Q. scores are then shown in the third and fourth columns:</p>
<div class="cell" data-layout-align="center">
<div class="cell-output-display">
<div id="tbl-physical-mental" class="lightable-paper lightable-striped lightable-hover quarto-float quarto-figure quarto-figure-center anchored" data-quarto-postprocess="true" style="font-family: "Arial Narrow", arial, helvetica, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-physical-mental-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table 29.1: Hypothetical athletic and I.Q. scores for high school boys
</figcaption>
<div aria-describedby="tbl-physical-mental-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="lightable-paper lightable-striped lightable-hover caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th style="text-align: right;" data-quarto-table-cell-role="th">Athletic Score</th>
<th style="text-align: right;" data-quarto-table-cell-role="th">I.Q. Score</th>
<th style="text-align: right;" data-quarto-table-cell-role="th">Athletic Rank</th>
<th style="text-align: right;" data-quarto-table-cell-role="th">I.Q. Rank</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;">97</td>
<td style="text-align: right;">114</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">3</td>
</tr>
<tr class="even">
<td style="text-align: right;">94</td>
<td style="text-align: right;">120</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">1</td>
</tr>
<tr class="odd">
<td style="text-align: right;">93</td>
<td style="text-align: right;">107</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">7</td>
</tr>
<tr class="even">
<td style="text-align: right;">90</td>
<td style="text-align: right;">113</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">4</td>
</tr>
<tr class="odd">
<td style="text-align: right;">87</td>
<td style="text-align: right;">118</td>
<td style="text-align: right;">5</td>
<td style="text-align: right;">2</td>
</tr>
<tr class="even">
<td style="text-align: right;">86</td>
<td style="text-align: right;">101</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">8</td>
</tr>
<tr class="odd">
<td style="text-align: right;">86</td>
<td style="text-align: right;">109</td>
<td style="text-align: right;">7</td>
<td style="text-align: right;">6</td>
</tr>
<tr class="even">
<td style="text-align: right;">85</td>
<td style="text-align: right;">110</td>
<td style="text-align: right;">8</td>
<td style="text-align: right;">5</td>
</tr>
<tr class="odd">
<td style="text-align: right;">81</td>
<td style="text-align: right;">100</td>
<td style="text-align: right;">9</td>
<td style="text-align: right;">9</td>
</tr>
<tr class="even">
<td style="text-align: right;">76</td>
<td style="text-align: right;">99</td>
<td style="text-align: right;">10</td>
<td style="text-align: right;">10</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
</div>
</div>
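<p>The ranks in the table can be reproduced from the raw scores. The following is a minimal sketch, not part of the original notebook; the variable names (<code>athletic</code>, <code>iq</code>, <code>athletic_rank</code>, <code>iq_rank</code>, <code>observed_sum</code>) are chosen here for illustration, and breaking the tie between the two athletic scores of 86 by table order is an assumption that matches the ranks shown above.</p>

```r
# Scores from the table, one element per boy, in table order.
athletic <- c(97, 94, 93, 90, 87, 86, 86, 85, 81, 76)
iq <- c(114, 120, 107, 113, 118, 101, 109, 110, 100, 99)

# Rank from highest (1) to lowest (10). Negating the scores gives the
# largest score rank 1. ties.method = 'first' breaks the tie between the
# two athletic scores of 86 by table order (an assumption).
athletic_rank <- rank(-athletic, ties.method = 'first')
iq_rank <- rank(-iq, ties.method = 'first')

# Sum of the I.Q. ranks for the five top-ranked athletes.
observed_sum <- sum(iq_rank[athletic_rank <= 5])
# Show the result.
observed_sum
```

<p>The I.Q. ranks of the five best athletes are 3, 1, 7, 4 and 2, so <code>observed_sum</code> is 17.</p>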
<p><a href="#fig-ath-iq-scatter" class="quarto-xref">Figure <span>29.1</span></a> is a <em>scatterplot</em> with “Athletic Score” on the x-axis and “I.Q. Score” on the y-axis. Each point on the plot corresponds to one row of <a href="#tbl-physical-mental" class="quarto-xref">Table <span>29.1</span></a> (and therefore one boy); in particular each point is at the <span class="math inline">\(x\)</span>, <span class="math inline">\(y\)</span> coordinate given by the values in “Athletic Score” and “I.Q. Score”. For example the point for the first boy is at position x=97, y=114.</p>
<div class="cell" data-layout-align="center">
<div class="cell-output-display">
<div id="fig-ath-iq-scatter" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-ath-iq-scatter-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="correlation_causation_files/figure-html/fig-ath-iq-scatter-1.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:70.0%">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-ath-iq-scatter-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure 29.1: Scatter plot of I.Q. Score as a function of Athletic Score
</figcaption>
</figure>
</div>
</div>
</div>
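<p>A plot of this kind can be drawn with base R graphics. This is a sketch only; it is not necessarily the code used to make the figure above, and the vector names are chosen here for illustration.</p>

```r
# Scores from the table, one element per boy, in table order.
athletic <- c(97, 94, 93, 90, 87, 86, 86, 85, 81, 76)
iq <- c(114, 120, 107, 113, 118, 101, 109, 110, 100, 99)

# One point per boy: athletic score on the x-axis, I.Q. score on the y-axis.
plot(athletic, iq,
     xlab = 'Athletic Score',
     ylab = 'I.Q. Score',
     main = 'I.Q. Score as a function of Athletic Score')
```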
<p>We want to know whether a high score on athletic ability tends to be found along with a high I.Q. score more often than would be expected by chance. Therefore, our strategy is to see how often high scores on <em>both</em> variables are found by chance. We do this by disassociating the two variables and making two separate and independent universes, one composed of the athletic scores and another of the I.Q. scores. Then we draw pairs of observations from the two universes at random, and compare the experimental patterns that occur by chance to what actually is observed to occur in the world.</p>
<p>The first testing scheme we shall use is similar to our first approach to the pig rations — splitting the results into just “highs” and “lows.” We take ten cards, one of each denomination from “ace” to “10,” shuffle, and deal five cards to correspond to the first five athletic ranks. The face values then correspond to the I.Q. ranks. Under the benchmark hypothesis the athletic ranks will not be associated with the I.Q. ranks. Add the face values in the first five cards in each trial; the first hand includes 2, 4, 5, 6, and 9, so the sum is 26. Record, shuffle, and repeat perhaps ten times. Then compare the random results to the sum of the observed ranks of the five top athletes, which equals 17.</p>
<p>The following steps describe a slightly different procedure than that just described, because this one may be easier to understand:</p>
<ul>
<li><strong>Step 1.</strong> Convert the athletic and I.Q. scores to ranks. Then constitute a universe of spades, “ace” to “10,” to correspond to the athletic ranks, and a universe of hearts, “ace” to “10,” to correspond to the IQ ranks.</li>
<li><strong>Step 2.</strong> Deal out the well-shuffled cards into pairs, each pair with an athletic score and an I.Q. score.</li>
<li><strong>Step 3.</strong> Locate the cards with the top five athletic ranks, and add the I.Q. rank scores on their paired cards. Compare this sum to the observed sum of 17. If 17 or less, indicate “yes,” otherwise “no.” (Why do we use “17 or less” rather than “less than 17”? Because we are asking the probability of a score <em>this low or lower</em>.)</li>
<li><strong>Step 4.</strong> Repeat steps 2 and 3 ten times.</li>
<li><strong>Step 5.</strong> Calculate the proportion “yes.” This estimates the probability sought.</li>
</ul>
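<p>The card procedure in the steps above could be sketched in R as follows. This is an illustration under stated assumptions rather than the chapter's own code: the variable names are invented here, the seed is arbitrary, and the athletic ranks are left in order while only the I.Q. ranks are shuffled, which pairs the two sets of ranks at random just as dealing both shuffled suits would.</p>

```r
# Arbitrary seed, so the sketch gives the same answer each time.
set.seed(1)
# Step 4: the number of repeats.
n_trials <- 10
# Step 1: the two universes of ranks.
athletic_ranks <- 1:10  # The spades.
iq_ranks <- 1:10        # The hearts.
# The observed sum of I.Q. ranks for the top five athletes.
observed_sum <- 17
# Scorekeeping vector for the 'yes' / 'no' results.
is_yes <- logical(n_trials)

for (i in 1:n_trials) {
  # Step 2: pair each athletic rank with a randomly dealt I.Q. rank.
  paired_iq <- sample(iq_ranks)
  # Step 3: add the I.Q. ranks paired with the top five athletic ranks,
  # and record 'yes' (TRUE) if the sum is 17 or less.
  top_5_sum <- sum(paired_iq[athletic_ranks <= 5])
  is_yes[i] <- top_5_sum <= observed_sum
}

# Step 5: the proportion of 'yes' trials.
prop_yes <- mean(is_yes)
prop_yes
```

<p>With only ten trials the proportion is a rough estimate; raising <code>n_trials</code> gives a steadier one.</p>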
<p>In <a href="#tbl-ability-trials" class="quarto-xref">Table <span>29.2</span></a> we see that the sum of the top 5 ranks is higher than the observed sum (17) in all but one of the first ten random trials (trial 5, shown in bold). On this evidence, random pairing produces a sum as low as the observed one only about 1 time in 10. But ten trials give a rough estimate, so we deal some more to get a more reliable one. We add thirty hands, and thirty-nine of the total forty hands exceed the observed rank sum, so the estimated probability that the observed association of athletic and I.Q. ranks would occur by chance is 1 in 40, or .025. In other words, if there is no real association between the variables, the probability that the I.Q. ranks of the top five athletes would sum to a number this low or lower is only about 1 in 40, and it therefore seems reasonable to believe that high athletic ability tends to accompany a high I.Q.</p>
<div id="tbl-ability-trials" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-ability-trials-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table 29.2: 40 random trials of the athletic / IQ problem
</figcaption>
<div aria-describedby="tbl-ability-trials-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<thead>
<tr class="header">
<th>Trial</th>
<th>Sum of IQ Ranks</th>
<th>&lt;= observed (17)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>26</td>
<td>No</td>
</tr>
<tr class="even">
<td>2</td>
<td>23</td>
<td>No</td>
</tr>
<tr class="odd">
<td>3</td>
<td>22</td>
<td>No</td>
</tr>
<tr class="even">
<td>4</td>
<td>37</td>
<td>No</td>
</tr>
<tr class="odd">
<td><strong>5</strong></td>
<td><strong>16</strong></td>
<td><strong>Yes</strong></td>
</tr>
<tr class="even">
<td>6</td>
<td>22</td>
<td>No</td>
</tr>
<tr class="odd">
<td>7</td>
<td>22</td>
<td>No</td>
</tr>
<tr class="even">
<td>8</td>
<td>28</td>
<td>No</td>
</tr>
<tr class="odd">
<td>9</td>
<td>38</td>
<td>No</td>
</tr>
<tr class="even">
<td>10</td>
<td>22</td>
<td>No</td>
</tr>
<tr class="odd">
<td>11</td>
<td>35</td>
<td>No</td>
</tr>
<tr class="even">
<td>12</td>
<td>36</td>
<td>No</td>
</tr>
<tr class="odd">
<td>13</td>
<td>31</td>
<td>No</td>
</tr>
<tr class="even">
<td>14</td>
<td>29</td>
<td>No</td>
</tr>
<tr class="odd">
<td>15</td>
<td>32</td>
<td>No</td>
</tr>
<tr class="even">
<td>16</td>
<td>25</td>
<td>No</td>
</tr>
<tr class="odd">
<td>17</td>
<td>25</td>
<td>No</td>
</tr>
<tr class="even">
<td>18</td>
<td>29</td>
<td>No</td>
</tr>
<tr class="odd">
<td>19</td>
<td>25</td>
<td>No</td>
</tr>
<tr class="even">
<td>20</td>
<td>22</td>
<td>No</td>
</tr>
<tr class="odd">
<td>21</td>
<td>30</td>
<td>No</td>
</tr>
<tr class="even">
<td>22</td>
<td>31</td>
<td>No</td>
</tr>
<tr class="odd">
<td>23</td>
<td>35</td>
<td>No</td>
</tr>
<tr class="even">
<td>24</td>
<td>25</td>
<td>No</td>
</tr>
<tr class="odd">
<td>25</td>
<td>33</td>
<td>No</td>
</tr>
<tr class="even">
<td>26</td>
<td>30</td>
<td>No</td>
</tr>
<tr class="odd">
<td>27</td>
<td>24</td>
<td>No</td>
</tr>
<tr class="even">
<td>28</td>
<td>29</td>
<td>No</td>
</tr>
<tr class="odd">
<td>29</td>
<td>30</td>
<td>No</td>
</tr>
<tr class="even">
<td>30</td>
<td>31</td>
<td>No</td>
</tr>
<tr class="odd">
<td>31</td>
<td>30</td>
<td>No</td>
</tr>
<tr class="even">
<td>32</td>
<td>21</td>
<td>No</td>
</tr>
<tr class="odd">
<td>33</td>
<td>25</td>
<td>No</td>
</tr>
<tr class="even">
<td>34</td>
<td>19</td>
<td>No</td>
</tr>
<tr class="odd">
<td>35</td>
<td>29</td>
<td>No</td>
</tr>
<tr class="even">
<td>36</td>
<td>23</td>
<td>No</td>
</tr>
<tr class="odd">
<td>37</td>
<td>23</td>
<td>No</td>
</tr>
<tr class="even">
<td>38</td>
<td>34</td>
<td>No</td>
</tr>
<tr class="odd">
<td>39</td>
<td>23</td>
<td>No</td>
</tr>
<tr class="even">
<td>40</td>
<td>26</td>
<td>No</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
<p>In fact we can apply an even simpler procedure to get the same result, by reasoning about the individual trial.</p>
<p>One trial in our procedure is:</p>
<ul>
<li><strong>Step 2.</strong> Deal out the well-shuffled cards into pairs, each pair with an athletic score and an I.Q. score.</li>
<li><strong>Step 3.</strong> Locate the cards with the top five athletic ranks, and add the I.Q. rank scores on their paired cards. Compare this sum to the observed sum of 17. If 17 or less, indicate “yes,” otherwise “no.” (Why do we use “17 or less” rather than “less than 17”? Because we are asking the probability of a score <em>this low or lower</em>.)</li>
</ul>
<p>Now consider the 5 IQ rank cards. In the procedure above, we found these by <em>first</em> pairing the athletic ranks and the IQ ranks, <em>then</em> selecting the IQ ranks corresponding to the top 5 athletic ranks. A little thought may persuade you that, by doing this, we have a <em>random selection of 5 IQ ranks</em>. We got that random selection by pairing, then selecting on athletic rank, but the initial pairing and selection do nothing other than give us one particular set of 5 randomly chosen IQ rank cards. So we can simplify our procedure even further by leaving out the pairing and selecting-by-rank steps; we can just shuffle the IQ rank cards and deal out 5 to be our randomly selected IQ ranks.</p>
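<p>If the argument does not persuade you, a simulation can check it. The sketch below runs the full pairing procedure and the shortcut side by side and compares how often each gives a sum of 17 or less. The variable names and the seed are assumptions made for this illustration.</p>

```r
# Arbitrary seed, so the sketch gives the same answer each time.
set.seed(2024)
n_trials <- 10000
iq_ranks <- 1:10
# Scorekeeping vectors, one for each procedure.
paired_sums <- numeric(n_trials)
simple_sums <- numeric(n_trials)

for (i in 1:n_trials) {
  # Full procedure: shuffle both suits and pair them card by card, then
  # add the I.Q. ranks paired with athletic ranks 1 through 5.
  athletic_deal <- sample(1:10)
  iq_deal <- sample(iq_ranks)
  paired_sums[i] <- sum(iq_deal[athletic_deal <= 5])
  # Shortcut: shuffle the I.Q. ranks and add the first 5.
  simple_sums[i] <- sum(sample(iq_ranks)[1:5])
}

# Proportion of trials with a sum of 17 or less, for each procedure.
prop_paired <- mean(paired_sums <= 17)
prop_simple <- mean(simple_sums <= 17)
c(paired = prop_paired, simple = prop_simple)
```

<p>The two proportions should agree to within the random variation expected from this number of trials.</p>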
<div id="nte-athlete_iq" class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note 29.1: Notebook: Athletic ability and IQ
</div>
</div>
<div class="callout-body-container callout-body">
<div class="nb-links">
<p><a class="notebook-link" href="notebooks/athlete_iq.Rmd">Download notebook</a> <a class="interact-button" href="./interact/lab/index.html?path=athlete_iq.ipynb">Interact</a></p>
</div>
</div>
</div>
<div class="nb-start" name="athlete_iq" title="Athletic ability and IQ">
</div>
<p>To simulate this problem in R, we start from the observed result: the I.Q. ranks of the top five athletes sum to 17. This is the value to be tested against randomly-drawn samples. The more often the sum of 5 randomly-selected ranks (out of 10) is as low as this observed number, the more plausible it is that there is no relationship between athletic performance and I.Q. based on these data.</p>
<p>First we record the ranks 1 through 10 in the vector <code>iq_ranks</code>. Then we shuffle the ranks so they are in a random order, select the first 5 of them and put them in another vector, <code>top_5</code>, and <code>sum</code> them, putting the result in <code>top_5_sum</code>. We repeat this procedure <code>n_trials = 10000</code> times, recording each result in a scorekeeping vector, <code>results</code>. Graphing <code>results</code>, we get a histogram that shows us how often our randomly selected sums are equal to or below 17.</p>
<div class="cell" data-layout-align="center">
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Number of repeats.</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>n_trials <span class="ot"><-</span> <span class="dv">10000</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="co"># The IQ ranks, ready for shuffling.</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>iq_ranks <span class="ot"><-</span> <span class="dv">1</span><span class="sc">:</span><span class="dv">10</span> <span class="co"># 1 through 10.</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Scorekeeping array.</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>results <span class="ot"><-</span> <span class="fu">numeric</span>(n_trials)</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (i <span class="cf">in</span> <span class="dv">1</span><span class="sc">:</span>n_trials) {</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="co"># Shuffle the ranks.</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> shuffled <span class="ot"><-</span> <span class="fu">sample</span>(iq_ranks)</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="co"># Take the first 5 ranks.</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> top_5 <span class="ot"><-</span> shuffled[<span class="dv">1</span><span class="sc">:</span><span class="dv">5</span>]</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="co"># Sum those ranks.</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> top_5_sum <span class="ot"><-</span> <span class="fu">sum</span>(top_5)</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="co"># Keep track of the result of each trial.</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> results[i] <span class="ot"><-</span> top_5_sum</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> <span class="co"># End the experiment, go back and repeat.</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Produce a histogram of trial results.</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a><span class="co"># Make the bins be the integers from 10 through 45.</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="fu">hist</span>(results, <span class="at">breaks=</span><span class="dv">10</span><span class="sc">:</span><span class="dv">45</span>, <span class="at">main=</span><span class="st">'Sums of 5 ranks selected at random'</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="correlation_causation_files/figure-html/unnamed-chunk-4-3.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:70.0%"></p>
</figure>
</div>
</div>
</div>
<p>We see that in only about 2% of the trials did random selection of ranks produce a total of 17 or lower. R can calculate this for us directly:</p>
<div class="cell" data-layout-align="center">
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Determine how many trials produced sums of ranks <= 17 by chance.</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>k <span class="ot"><-</span> <span class="fu">sum</span>(results <span class="sc"><=</span> <span class="dv">17</span>)</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="co"># The proportion.</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>kk <span class="ot"><-</span> k <span class="sc">/</span> n_trials</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Show the result.</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>kk</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.016</code></pre>
</div>
</div>
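<p>As a cross-check on the simulated proportion, the short sketch below enumerates every possible selection instead of sampling. It assumes, as the notebook appears to, that the five ranks are drawn without replacement from the ranks 1 through 10. The names <code>all_combos</code>, <code>all_sums</code>, <code>n_low</code>, and <code>exact_p</code> are illustrative and are not part of the notebook.</p>
<div class="cell" data-layout-align="center">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Assumption: 5 distinct ranks are chosen from the ranks 1 through 10.
# combn() lists every such selection, one selection per column.
all_combos <- combn(10, 5)
# Sum of ranks for each possible selection.
all_sums <- colSums(all_combos)
# Number of selections with a sum of 17 or lower.
n_low <- sum(all_sums <= 17)
# Exact proportion of selections with a sum of 17 or lower.
exact_p <- n_low / ncol(all_combos)
# Show the result.
exact_p</code></pre></div>
</div>
<p>Under this assumption only 4 of the 252 possible selections have a sum of 17 or lower, a proportion of roughly 0.016, which is in line with the simulated value.</p>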
<div class="nb-end">
</div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
End of notebook: Athletic ability and IQ
</div>
</div>
<div class="callout-body-container callout-body">
<p><code>athlete_iq</code> starts at <a href="#nte-athlete_iq" class="quarto-xref">Note <span>29.3</span></a>.</p>
</div>
</div>
<!---
End of notebook
-->
<p>Why do we sum the ranks of the first <em>five</em> athletes rather than, say, the top three? Indeed, we could have looked at the top two, three, four, or even six or seven. The first reason for splitting the group in half is that an even split uses the available information more fully and is therefore more efficient. (I cannot prove this formally here, but perhaps it makes intuitive sense to you.) A second reason is that the habit of always looking at an even split reduces the chance that you will pick and choose in a way that fools you. For example, if the I.Q. ranks of the top five athletes were 3, 2, 1, 10, and 9, we would be deceiving ourselves if, after looking the data over, we drew the line between athletes 3 and 4, because that cut was chosen only after seeing that it gives the most favorable result. (More generally, choosing an appropriate measure before examining the data will help you avoid fooling yourself in such matters.)</p>
<p>A simpler but less efficient approach to this same problem is to classify the top-half athletes by whether or not they were also in the top half of the I.Q. scores. Of the first five athletes actually observed, <em>four</em> were in the top five I.Q. scores. We can then shuffle five black and five red cards and see how often four or more (that is, four or five) blacks come up among the first five cards. The proportion of trials in which four or more blacks occur estimates the probability that an association as strong as that observed might occur by chance even if there is no association. <a href="#tbl-top-rank-counts" class="quarto-xref">Table <span>29.3</span></a> shows that this happened in five trials out of twenty.</p>
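<p>The card-shuffling procedure can also be sketched in R. This is a minimal illustration rather than part of the original notebook: the names <code>n_card_trials</code>, <code>cards</code>, <code>black_counts</code>, <code>prop_4_or_more</code>, and <code>exact_p_cards</code> are hypothetical, and the number of trials and the seed are arbitrary choices. The last lines compute the exact chance of four or more blacks by counting hands directly, as a check on the simulation.</p>
<div class="cell" data-layout-align="center">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Fix the seed so the sketch is reproducible (arbitrary choice).
set.seed(1)
# Number of shuffles (an assumption; the table uses only 20).
n_card_trials <- 10000
# Five black and five red cards.
cards <- rep(c('black', 'red'), each=5)
# Storage for the number of blacks among the first five cards.
black_counts <- numeric(n_card_trials)
for (i in 1:n_card_trials) {
    # Shuffle the deck and take the first five cards.
    shuffled <- sample(cards)
    first_5 <- shuffled[1:5]
    # Count the blacks among those five.
    black_counts[i] <- sum(first_5 == 'black')
}
# Proportion of trials with four or more blacks.
prop_4_or_more <- sum(black_counts >= 4) / n_card_trials
prop_4_or_more
# Exact value: hands with 4 or 5 blacks, out of all 5-card hands.
exact_p_cards <- (choose(5, 4) * choose(5, 1) + choose(5, 5)) / choose(10, 5)
exact_p_cards</code></pre></div>
</div>
<p>The exact value here is 26/252, or about 0.10. An estimate based on only twenty trials can differ noticeably from this figure through sampling variation alone.</p>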
<div id="tbl-top-rank-counts" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-top-rank-counts-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table 29.3: Results of 20 random trials of the top-5 rank counts
</figcaption>
<div aria-describedby="tbl-top-rank-counts-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<thead>
<tr class="header">
<th>Trial</th>
<th>Score (blacks in first five cards)</th>
<th>Four or more blacks? (Yes or No)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>4</td>
<td>Yes</td>
</tr>
<tr class="even">
<td>2</td>
<td>2</td>
<td>No</td>
</tr>
<tr class="odd">
<td>3</td>