<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<title>Intro to Probability & Statistics</title>
<meta charset="utf-8" />
<meta name="date" content="2022-02-27" />
<script src="probability-statistics_files/header-attrs/header-attrs.js"></script>
<link rel="stylesheet" href="xaringan-themer.css" type="text/css" />
</head>
<body>
<textarea id="source">
class: center, middle, inverse, title-slide
# Intro to Probability & Statistics
## DSCI 501 - Winter 2022
### 2022-02-27
---
## Objectives:
- Understand probability spaces
- Understand the principle of counting
- Learn how to find permutations
- Learn how to calculate combinations
- Know how to compute independent probability
- Know how to compute conditional probability
- Understand the different types of variables
- Understand Bayes rule and how it can be useful
- Understand what expectation is
- Understand the importance of variance & standard deviation
- Know the main types of distributions
- Understand the Central Limit Theorem
- Know how to calculate a confidence interval
- Know how to perform a hypothesis test
- Understand the significance of a p-value
---
class: inverse center middle
## Probability
---
## Probability Spaces
* Probability space, aka probability triple `\((\Omega, \mathcal{F}, \mathcal{P})\)` = a mathematical construct that provides a formal model of a random process, containing the *sample space*, the *event space* and the *probability function*
* Sample space `\(\Omega\)` = A set of outcomes of an experiment - also can be referenced as *S*
.center[**Example 1:** Toss a coin. *S* = {*H,T*}. |*S*| = 2]
.center[**Example 2:** Toss *N* coins in succession. An outcome is the resulting sequence of *N* H's and T's.
*S* = `\(\underbrace{\{H,T\} \times\cdots\times\{H,T\}}_{N\ \mathrm{times}}\)`
|*S*| = `\(2^N\)`]
* Event = a subset of the sample space
.center[**Example 3:** In the experiment where *N* = 4, there are 16 outcomes. The event 'there are at least two consecutive H's' is the set:
*E = {HHHH, HHHT, HHTH, HHTT, HTHH, THHH, THHT, TTHH}*]
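As a quick sanity check, a short Python sketch (using the standard-library `itertools`) can enumerate all 16 outcomes and count those containing at least two consecutive H's:
```
# enumerate all length-4 H/T sequences and keep those
# containing 'HH' somewhere
from itertools import product
outcomes=[''.join(seq) for seq in product('HT',repeat=4)]
E=[s for s in outcomes if 'HH' in s]
print(len(outcomes),len(E))   # 16 8
```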
---
## Probability Spaces (cont.)
* Discrete space = a *countable* set. Can be *finite* or *infinite*.
.center[**Example 4:** Flip a coin repeatedly until you get heads. Outcome is the number of tosses:
*S* = {1,2,3 `\(\dots\)`} = `\(Z^+\)`(set of positive integers)
*S* is a *discrete infinite* sample space]
* Continuous space = an *uncountably infinite* set of outcomes
.center[**Example 5:** Given we have a 2x2 sample space, what are the odds we throw a dart and it lands on (1,1)?
![](images/cont-space.png)
There are an infinite number of points the dart could land on. The probability of it landing on *exactly* (1,1) is zero (see [Zeno's Paradox](https://blogs.unimelb.edu.au/sciencecommunication/2017/10/22/zenos-paradox-the-puzzle-that-keeps-on-giving/)). However, the continuous sample space for the dart hitting anywhere on the board is: `\(S = \{(x,y) \in \mathbf{R} \times \mathbf{R} : 0 \le x < 2,\ 0 \le y < 2\}\)`]
---
## Probability Functions
* Probability function `\(\mathcal{P}\)` = the set function that returns an event's probability, a real number between 0 and 1. Therefore `\(P: \mathcal{P}(S) \to [0,1]\)` where `\(\mathcal{P}(S)\)` is the *power set* of *S*; the set of all subsets of *S*.
* Probability Mass Function (PMF) = probability distribution; assigns a measure of likelihood, probability, of the outcome to each outcome in the sample space:
.center[ `\(P:S \rightarrow [0,1]\)` where [0,1] denotes the *closed interval* `\(\{x \in R:0\le x \le 1\}\)` *and* that `\(\sum\limits_{s \in S} P(s)=1\)`
**Example 6:** For a coin toss, `\(P(0)=P(1)=\frac{1}{2}\)`. Likewise, for a fair die `\(P(i)=\frac{1}{6}\)` for all `\(i \in \{1,2,3,4,5,6\}\)`. These are instances of *uniform distributions* on a finite sample space, in which
`\(P(s) = \frac{1}{|S|}\)` for all `\(s \in S\)`]
* More specifically, `\(P\)` can be extended from the PMF on outcomes to the probability of any event `\(E\)`:
.center[ `\(P(E)=\sum\limits_{s \in E} P(s)\)`
]
* In the case where the PMF is *uniform*:
.center[This simplifies to: `\(P(E)=\frac{|E|}{|S|}\)`
]
---
## Probability Axioms
* *De Morgan's Law* in the theory of sets is that the complement of the union of two sets is the intersection of the complements. Or vice versa: the complement of the intersection is the union of the complements:
.center[
`\((A \cup B)^c = A^cB^c\)`
`\((AB)^c=A^c\cup B^c\)`
De Morgan's law is easy to verify using the *Karnaugh map* for two events:
![](images/karn-map.png)
]
---
## Probability Axioms (cont.)
1. `\(P(E) \ge0\)` for every event *E*
2. `\(P(\Omega)=1\)`
3. If events `\(E_1, E_2, \dots\)` are *mutually exclusive*, or *disjoint* (i.e. `\(E_i \cap E_j = \emptyset\)` whenever `\(i \neq j\)`), then the probability of their union is the sum of their probabilities:
.center[
`\(P(E_1 \cup E_2 \cup \cdots)=P(E_1)+P(E_2)+\cdots\)` - The collection of events (and hence the sum) can be finite or countably infinite.
]
If Axioms 1-3 are satisfied then `\(P\)` has other intuitive properties (note, A, B, E used interchangeably):
a. For any event `\(E, P(\bar{E})=1-P(E)\)`. That is because `\(E\)` and `\(\bar{E}\)` are mutually exclusive events and `\(\Omega=E \cup \bar{E}\)`. So Axioms 2 & 3 yield `\(P(E)+P(\bar{E}) = P(E \cup \bar{E}) = P(\Omega) =1\)`
b. `\(P(\emptyset)=0.\)` That is because `\(\emptyset\)` and `\(\Omega\)` are complements of each other, so by Property a and Axiom 2, `\(P(\emptyset)=1-P(\Omega)=0\)`
c. If `\(A\subset B\)` then `\(P(A) \le P(B)\)`. That is because `\(B=A \cup (A^cB)\)` and `\(A\)` & `\(A^cB\)` are mutually exclusive, and `\(P(A^cB)\ge 0,\)` so `\(P(A)\le P(A)+P(A^cB)=P(A\cup (A^cB))=P(B)\)`
d. Events `\(E_1\)` and `\(E_2\)` in a sample space are *independent* if `\(E_2\)` is no more or less likely to occur when `\(E_1\)` occurs than when `\(E_1\)` does not occur: `\(P(E_1 \cap E_2)=P(E_1) \cdot P(E_2)\)`
e. `\(P(A \cup B) = P(A) + P(B) - P(AB)\)`
---
## Probability Examples - Set of Dice
.pull-left[
**Example 1:** Let `\(S=\{1,2,3,4,5,6\} \times \{1,2,3,4,5,6\}\)`, with uniform distribution, representing the usual roll of 2 dice. Let `\(E_1\)` be the event 'the first die is odd', and let `\(E_2\)` be the event 'the second die is even'. Then:
.center[
`\(E_1=\{1,3,5\}\times\{1,2,3,4,5,6\}\)`
`\(E_2=\{1,2,3,4,5,6\}\times\{2,4,6\}\)`
`\(E_1\cap E_2 = \{1,3,5\}\times \{2,4,6\}\)`
So, `\(|E_1|=|E_2|=18, |E_1 \cap E_2| = 9\)`, thus:
`\(P(E_1 \cap E_2)=9/36=1/4=1/2 \cdot 1/2 = P(E_1)\cdot P(E_2)\)`
So, the events are *independent*
]
]
.pull-right[
```
# simulate N rolls of a pair of standard
# dice and find the number of times
# each roll occurs.
from pylab import *
def dice(N):
    d={} # Python dictionary
    for j in range(N):
        # need tuples to index dicts
        x=tuple(choice(arange(1,7),2))
        if x in d:
            d[x]+=1
        else:
            d[x]=1
    for j in arange(1,7):
        for k in arange(1,7):
            y=(j,k)
            # d.get avoids a KeyError for pairs never rolled
            print(y,':',d.get(y,0))
```
]
---
## Probability Examples - Beans
.pull-left[
**Example 2**: A jar contains 100 navy beans, 100 pinto, and 100 black beans. You reach in the jar and pull out 3 beans. What is the probability that the 3 beans are all different?
- First define the sample space: `\(S=\{1,...,300\}\times \{1,...,300\}\times \{1,...,300\}\)`
This gives `\(|S| = 300^3 = 2.7 \times 10^7\)`
- Event `\(E\)` is the set of all triples (*i,j,k*) of beans with different colors. With replacement, the first bean has 300 possible values, the second 200, and the third 100. Therefore:
`\(|E|=300\times200\times100 =6 \times10^6\)`
`\(P(E)=|E|/|S| = (6 \times 10^6)/(2.7 \times 10^7) \approx 0.2222\)`
- Sampling *without* replacement changes the sample space:
`\(|S| = 300 \times 299 \times 298\)`
`\(P(E) = 6 \times 10^6 / (300 \times 299 \times 298) \approx 0.2245\)`
]
.pull-right[
```
# A bin contains b beans of each of
# three colors (0,1,2).
# Pull out 3 beans (with or without
# replacement). What is the probability
# that all three are different?
from pylab import *
def bean_sim(b,numtrials,repl=True):
    # the jar
    beans=[0]*b+[1]*b+[2]*b
    count=0
    for j in range(numtrials):
        sample=choice(beans,3,
                      replace=repl)
        if ((0 in sample) and
            (1 in sample) and
            (2 in sample)):
            count+=1
    return count/numtrials
```
]
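A brief usage sketch: with the analytic answers above, runs of `bean_sim` should land near 0.222 with replacement and 0.2245 without.
```
# example runs (estimates vary from run to run)
print(bean_sim(100,100000,repl=True))    # about 0.222
print(bean_sim(100,100000,repl=False))   # about 0.2245
```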
---
class: inverse center middle
## Counting
---
## Counting Principle
> Rule of Product: If there are *a* ways to perform one action and *b* ways to perform another, then there are *a* × *b* ways to perform both.
.center[*Ways to perform each action:* `\(a_1, a_2, \dots, a_n\)`
*Total number of ways:* `\(a_1 \times a_2 \times \cdots \times a_n\)`]
- Example: I am packing for my vacation. I've selected 6 tops, 3 bottoms, 2 hats, and 2 pairs of shoes. How many different outfits can I make?
![](images/rop.png)
--
.center[**= 72 different outfits**]
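A minimal sketch of the same computation with the rule of product:
```
# 6 tops x 3 bottoms x 2 hats x 2 pairs of shoes
from math import prod    # Python 3.8+
print(prod([6,3,2,2]))   # 72
```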
---
## Tree Diagrams
- A useful way to study the probabilities of events relating to experiments that take place in stages and for which we are given the probabilities for the outcomes at each stage.
*Example:* Dining at a restaurant. How many possible choices do you have for a complete meal?
</br>
![](images/menu-tree.png)
---
## Representing Tree Diagram of Probabilities
Suppose the restaurant in the previous example wants to find the probability that a customer chooses meat, given the known percentages for each choice.
--
.pull-left[
| Symbol | Meaning |
|:------------|:------------------------------------------|
| `$$\Omega$$` | The sample space - the set of all possible outcomes|
| `$$\omega$$` | An outcome. A sample point in the sample space |
| `$$\omega_j$$` | The *j*-th outcome, for a sample space with a finite number of outcomes |
| `$$m(\omega_j)$$`| The *distribution function*. Each outcome `\(\omega_j\)` is assigned a nonnegative number `\(m(\omega_j)\)` in such a way that `\(m(\omega_1)+m(\omega_2)+\cdots + m(\omega_j) = 1\)` |
]
--
.pull-right[
![](images/menu-prob.png)
]
---
## Representing Tree Diagram of Probabilities
Suppose the restaurant in the previous example wants to find the probability that a customer chooses meat, given the known percentages for each choice.
.pull-left[
| Symbol | Meaning |
|:------------|:------------------------------------------|
| `\(\Omega\)` | The sample space - the set of all possible outcomes|
| `\(\omega\)` | An outcome. A sample point in the sample space |
| `\(\omega_j\)` | The *j*-th outcome, for a sample space with a finite number of outcomes |
| `\(m(\omega_j)\)`| The *distribution function*. Each outcome `\(\omega_j\)` is assigned a nonnegative number `\(m(\omega_j)\)` in such a way that `\(m(\omega_1)+m(\omega_2)+\cdots + m(\omega_j) = 1\)` |
]
.pull-right[
![](images/menu-prob2.png)]
--
#### The probability a customer chooses meat is `\(m(\omega_1)+m(\omega_4)=.46\)`
---
## Permutations
> How many sequences of *k* elements of *{1,...,n}* have all *k* elements distinct?
For example, with `\(n=5,k=3\)`, then (4,5,1) is such a sequence, but (1,5,1) is not. We have *n* choices for the first component of the sequence, and for each such choice, `\(n-1\)` choices for the second, etc. So, by the above principle, the number of such sequences is:
.center[
`\(n \cdot (n-1) \cdots (n-k+1) = \frac{n!}{(n-k)!}\)`
]
This is the number of *k-permutations* of an n-element set. If `\(n = k\)`, then `\((n-k)!=0!=1\)`, so the number of n-permutations of `\(\{1,...,n\} = n!\)`. In this case, we just call them *permutations* of `\(\{1,...,n\}\)`. If `\(n < k\)`, then the formula does not make sense - there are no *k-permutations* of `\(\{1,...,n\}\)`.
**Example.** What is the number of sequences of 2 distinct cards drawn from a deck of cards (sampling without replacement)?
.center[
`\(52 \times 51 = \frac{52!}{50!}\)`
]
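A quick check of the card example (assuming Python 3.8+, where `math.perm` is available):
```
# number of ordered pairs of distinct cards from a 52-card deck
from math import perm, factorial
print(perm(52,2))                     # 2652
print(factorial(52)//factorial(50))   # 2652
```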
---
## A Birthday Example
.pull-left[
*Problem:* Given there are 30 people in a room, what is the probability that there are 2 people with the same birthday?
* 365 = possible birthdays for each person (ignoring leap years)
* *k* = number of people in a room
To solve this, let's order the people from 1 to *k*. First, let's find the probability that all 30 people have different birthdays. The number of possible birthdays for the first person is 365. What about for the second person?
* Each birthday already used in the sequence removes one available date, so person #2 has 364 possible birthdays
* Person #3 = 363, etc.
* What is the sample space?
`\(|S| = 365^{30}, |E| = 365 \times 364 \times \cdots \times 336\)`
]
.pull-right[
Therefore, the probability that all 30 people have different birthdays is
.center[
`\(P(E) = \frac{|E|}{|S|}= \frac{365}{365}\cdot \frac{364}{365}\cdots \frac{336}{365} = \frac{365 \cdot 364 \cdots (365-k+1)}{365^k}\)`
]
Then, the probability that there are two people with the same birthday is just the probability of the *complementary* event,
.center[
`\(1 -P(E) \approx 0.71\)`
]
<!-- This problem can also be solved the an *exponential approximation*... -->
]
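The exact probability can be computed with a short loop; for `\(k=30\)` the complement comes out near 0.706, consistent with the approximation above.
```
# P(at least one shared birthday among k people)
def birthday(k):
    p_distinct=1.0
    for i in range(k):
        p_distinct*=(365-i)/365
    return 1-p_distinct
print(birthday(30))   # about 0.706
```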
---
## Binomial Coefficients
> The number of k-element subsets of an n-element set (when `\(0 \le k \le n\)`), "n choose k";
.center[
`\(\binom{n}{k}=\frac{n!}{(n-k)!k!}\)` ]
This formula also works when `\(n=0\)`, `\(k=0\)`, or `\(k=n\)` because `\(0! = 1\)`.
*Properties:*
* `\(\binom{n}{0}=\binom{n}{n}=1\)` for all `\(n \ge 0\)`
* `\(\binom{n}{k}=\binom{n}{n-k}\)` for all `\(0 \le k \le n\)`
* `\(\binom{n}{k}=\binom{n-1}{k-1}+\binom{n-1}{k}\)` for all `\(1 \le k \le n\)`
**Example.** Select 5 cards from a 52-card deck. What is the probability of getting a flush (all cards of the same suit)?
* Given that the probability distribution is uniform, the probability of any event *E* is given by: `\(\frac{|E|}{\binom{52}{5}}\)`
* For the flush, each suit has 13 cards, so that there are `\(\binom{13}{5}\)` flushes in that suit, thus:
.center[
`\(|E| = 4 \cdot \binom{13}{5}\)`
]
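A sketch of the flush calculation using `math.comb` (Python 3.8+); note this count includes straight flushes, matching the definition above.
```
# probability that 5 cards drawn from 52 are all one suit
from math import comb
print(4*comb(13,5)/comb(52,5))   # about 0.00198
```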
---
## Binomial Theorem
**Problem.** `\((x+y)^4 = xxxx+xxxy+xxyx+\cdots+yyyx+yyyy\)`
The right-hand side is the sum of all sequences of 4 x's and y's. The coefficient of `\(x^{4-k} y^k\)` is the number of such sequences containing exactly `\(k\)` `\(y\)`'s. Thus:
.center[
`\((x+y)^4 = \binom{4}{0}x^4 + \binom{4}{1}x^3y + \binom{4}{2}x^2y^2 + \binom{4}{3}xy^3 + \binom{4}{4}y^4\)`
`\(= x^4 +4x^3y +6x^2y^2+4xy^3+y^4\)`
Therefore, for any `\(n \ge 0\)`
`\((x+y)^n = \sum\limits_{k=0}^n \binom{n}{k}x^k y^{n-k}\)`
]
**Example.** How many ways are there to arrange the letters of the word MISSISSIPPI?
- There are 11 letters; if all were different the answer would be 11!. But many of those 11! arrangements represent the same string, so we treat the problem as a sequence of choices for each distinct letter: choose 4 of the 11 positions to hold I, then 4 of the remaining 7 positions to hold S, then 2 of the remaining 3 positions to hold P. After that there is only 1 position remaining for M:
.center[
`\(\binom{11}{4} \binom{7}{4} \binom{3}{2} = \frac{11!}{4!7!} \cdot \frac{7!}{4!3!} \cdot \frac{3!}{2!1!} = \frac{11!}{4!4!2!}\)`
]
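The count evaluates to 34,650; a quick check:
```
# arrangements of MISSISSIPPI: 11!/(4!4!2!)
from math import comb, factorial
print(comb(11,4)*comb(7,4)*comb(3,2))                  # 34650
print(factorial(11)//(factorial(4)**2*factorial(2)))   # 34650
```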
---
class: inverse center middle
## Discrete Random Variables, PMFs & CDFs
---
## Discrete-type Random Variables
> *Random variable X* assigns a number to the outcome of an experiment.
**Example.** Coin Tosses - toss a fair coin 20 times:
- `\(X_1\)` = the number of heads tossed
- `\(X_2\)` = excess heads over tails (number of heads - number of tails)
- `\(X_3\)` = length of the longest run of consecutive heads or tails
- `\(X_4\)` = number of tosses until heads comes up
These are all *random variables*.
A random variable is said to be *discrete-type* if there is a finite set `\(u_1,...,u_n\)` or a countably infinite set `\(u_1,u_2,...\)` such that
.center[
`\(P\{X \in \{u_1,u_2,...\}\} = 1\)`
]
The *probability mass function* (pmf) for a discrete-type random variable `\(X\)`, denoted `\(p_X\)`, is defined by `\(p_X(u)=P\{X=u\}\)`. The above formula can be written as
.center[
`\(\sum\limits_{i}p_X(u_i) = 1\)`
]
---
## Discrete Random Variables- Examples
**Example.** Let *S* be the sum of the numbers showing on a pair of fair dice when they are rolled. Find the pmf of *S*.
**Solution:** The underlying sample space is `\(\Omega = \{(i,j):1\le i \le 6, 1 \le j \le 6\}\)`, and it has 36 possible outcomes, each having a probability of `\(1/36\)`. The smallest value of *S* is 2 thus `\(\{S=2\}=\{(1,1)\}\)`. That is, there is only one outcome resulting in *S* = 2, so `\(p_S(2)=1/36\)`. Similarly, `\(\{S=3\}=\{(1,2),(2,1)\}\)`, so `\(p_S(3)=2/36\)`. And, `\(\{S=4\}=\{(1,3),(2,2),(3,1)\}\)`, so `\(p_S(4)=3/36\)` and so forth. The pmf of *S* is shown below.
</br>
.center[
![](images/pmf-dice.png)
]
---
## Independent Random Variables
> Two random variables are said to be *independent* if for all `\(a,b \in \mathbf{R}\)`,
.center[
`\(\{s \in S: X_1(s)=a\}, \{s \in S:X_2(s) =b\}\)`
]
are independent events. Recall this means that for all *a* and *b*,
.center[
`\(P((X_1=a) \land (X_2=b)) = P(X_1=a) \cdot P(X_2=b).\)`
]
This implies that for any *sets* *A* and *B* of values,
.center[
`\(\{s \in S: X_1(s) \in A\},\{s \in S:X_2(s) \in B\}\)`
]
are also independent events.
.footnote[
The symbol `\(\land\)` can be read as "and".
]
---
## Independent Random Variables - Example
We naturally assume that the random variables that we denoted in the previous dice example, representing the individual outcomes of each of the two dice ( `\(Y_{2,1}\)` and `\(Y_{2,2}\)` ), are independent. We can compute the PMF of the sum `\(Y_2=Y_{2,1} + Y_{2,2}\)`. For example, the event `\(Y_2=8\)` is the disjoint union of the events:
.center[
`\((Y_{2,1}=2) \land (Y_{2,2}=6)\)`
`\((Y_{2,1}=3) \land (Y_{2,2}=5)\)`
`\((Y_{2,1}=4) \land (Y_{2,2}=4)\)`
`\((Y_{2,1}=5) \land (Y_{2,2}=3)\)`
`\((Y_{2,1}=6) \land (Y_{2,2}=2)\)`
]
Independence implies that each of these events has probability `\(\frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}\)`, so `\(P_{Y_2}(8)= \frac{5}{36}\)`
> If we know the distributions of two random variables, and we know that they are independent, then we can compute the distribution of their sum (and, likewise, their product, or any other operation performed on them).
Note: `\(Y_2\)` and `\(Y_{2,2}\)` are *not* independent. The sum of the two dice really does depend on what shows up on the second die! We can verify this formally: `\(P(Y_2=12)=\frac{1}{36}\)`. `\(P(Y_{2,2}=5)=\frac{1}{6}\)`, BUT `\(P((Y_2=12)\land(Y_{2,2}=5))=0\neq 1/36 \cdot 1/6\)`
---
## Cumulative Distribution Function (CDF)
> The *cumulative distribution* function of a random variable `\(X\)`, denoted `\(F_X\)`, is a function
.center[
`\(F_X:\mathbf{R} \rightarrow [0,1]\)`
]
defined by:
.center[
`\(F_X(a) = P(X \le a)\)`.
]
For discrete random variables, we can compute the CDF as: `\(F_X(a)=\sum\limits_{b\le a}P_X(b)\)`
For discrete random variables, the CDF is a step function as shown below for the sum of two dice.
.center[![](images/cdf-dice.png)]
---
## Simulating the roll of 2 dice - code
.pull-left[
```
# Compute probabilities for the roll of 2
# dice. The value returned is an array of
# the frequencies of the events X=i
# for i from 2 through 12.
from pylab import *
def dice_frequencies():
    # generate the sample space
    s=[(i,j) for i in range(1,7)
             for j in range(1,7)]
    t=[sum(pair) for pair in s]
    h=histogram(t,bins=arange(1.5,13,1))
    return h[0]
```
]
.pull-right[
```
# Use the cumulative distribution
# function to simulate 100,000 samples
# from this distribution. Then obtain a
# histogram of the relative frequencies
# and plot them side by side with the
# theoretically derived probabilities.
def display():
    y=dice_frequencies()/36
    z=cumsum(y)
    stem(arange(2,13),y,label='computed',
         linefmt='r-',markerfmt='k.')
    samples=[2+searchsorted(z,random())
             for j in range(100000)]
    h=histogram(samples,
                bins=arange(1.5,13,1))
    stem(arange(2.2,13.2,1),h[0]/100000,
         label='simulated',linefmt='k-',
         markerfmt='r.')
    title('PMF of sum of two dice')
    legend(loc='upper right')
    show()
```
]
---
class: inverse center middle
## Important Discrete Random Variables
---
## Bernoulli Distribution
> A random variable `\(X\)` is said to have the *Bernoulli distribution* with parameter `\(p\)`, where `\(0 \le p \le1\)`, if `\(P(X = 1) = p\)` and `\(P(X = 0) = 1-p\)`
Note: There is not one Bernoulli distribution - you get a different PMF for every value of a parameter `\(p\)`.
**Example.** Flip a biased coin with heads probability `\(p\)`, and set `\(X=1\)` if the result is heads and `\(X=0\)` otherwise.
#### Bernoulli Trials
The principal use of the binomial coefficients will occur in the study of one of the important chance processes called *Bernoulli trials*. A Bernoulli trials process is a sequence of `\(n\)` chance experiments such that:
.pull-left[
1. Each experiment has two possible outcomes, which we may call success and failure
2. The probability `\(p\)` of success on each experiment is the same for each experiment, and this probability is not affected by any knowledge of previous outcomes. The probability `\(p\)` of failure is given by `\(q = 1 − p\)`.
]
.pull-right[
.center[![](images/bernoulli-tree.png)]
]
---
## Binomial Distribution
> Suppose `\(n\)` independent Bernoulli trials are conducted, each resulting in a 1 with probability `\(p\)` and a 0 with probability `\(1-p\)`. Let X denote the total number of 1s occurring in the `\(n\)` trials. Any particular outcome with `\(k\)` ones and `\(n-k\)` zeros has the probability `\(p^k(1-p)^{n-k}\)`. Since there are `\(\binom{n}{k}\)` possible outcomes, we find the pmf of `\(X\)` is
.center[
`\(P_X(k)=\binom{n}{k}p^k(1-p)^{n-k}\)`
]
**Example.** The number of heads on `\(n\)` successive tosses of a biased coin with heads probability `\(p\)`.
The distribution of `\(X\)` is called the *binomial distribution* with parameters `\(n\)` and `\(p\)`
.center[![](images/binom-pmf.png)]
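A minimal sketch of the binomial PMF written directly from the formula (the helper name `binom_pmf` is just for illustration):
```
# P_X(k) = C(n,k) p^k (1-p)^(n-k)
from math import comb
def binom_pmf(k,n,p):
    return comb(n,k)*p**k*(1-p)**(n-k)
# e.g. exactly 5 heads in 10 tosses of a fair coin
print(binom_pmf(5,10,0.5))   # about 0.246
```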
---
## Geometric Distribution
> There is a single parameter `\(0 \le p \le 1\)`. If `\(k\)` is a positive integer, then
.center[
`\(P_X(k)=(1-p)^{k-1}p\)`
]
**Example.** `\(X\)` is the number of flips of a biased coin with heads probability `\(p\)` until heads appears. For instance, `\(X=1\)` if the first toss is heads, `\(X=3\)` if the first two are tails and the third is heads. Note that this has a nonzero value at all positive integers `\(k\)`.
.center[
![](images/geom-dist.png)
]
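A sketch comparing the formula with `scipy.stats.geom`, which uses the same 'number of trials until the first success' convention:
```
# geometric PMF: (1-p)^(k-1) * p for k = 1, 2, ...
from scipy.stats import geom
p=0.5
for k in range(1,5):
    print(k,(1-p)**(k-1)*p,geom.pmf(k,p))
```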
---
## Poisson Distribution
> Let `\(\lambda > 0\)`. We set
.center[
`\(P(X=k)= \frac{\lambda^k}{k!} \cdot e^{-\lambda}\)` for `\(k \ge 0\)`
]
By this definition, the first 4 terms of this PMF are: `\(p(0)=e^{-\lambda}\)`, `\(p(1)=\lambda e^{-\lambda}\)`, `\(p(2)=\frac{\lambda^2}{2} e^{-\lambda}\)`, `\(p(3)=\frac{\lambda^3}{6} e^{-\lambda}\)`.
The Poisson distribution arises frequently in practice, because it is a good approximation for a binomial distribution with parameters `\(n\)` and `\(p\)`, when n is very large, `\(p\)` is very small, and `\(\lambda = np\)`. Some examples in which such binomial distributions occur are:
* Incoming phone calls in a fixed time interval: `\(n\)` is the number of people with cell phones within the access region of one base station, and `\(p\)` is the probability that a given such person will make a call within the next minute.
* Misspelled words in a document: `\(n\)` is the number of words in a document and `\(p\)` is the probability that a given word is misspelled.
**Example.** Suppose 500 shots are fired at a target. It is known that only about one in one hundred shots hits the bulls-eye. What is the probability of getting 3 or more bulls-eyes? To solve this, we compute the probability of the complementary event (0, 1, or 2 bulls-eyes). With `\(\lambda = np = 500 \times 0.01 = 5\)`, the Poisson approximation gives `\(e^{-\lambda}(1+\lambda+\lambda^2/2) = 18.5e^{-5} \approx 0.125\)`. Therefore the probability of at least 3 bulls-eyes is about 0.875.
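A check of the bulls-eye example with `scipy.stats`: the Poisson approximation with `\(\lambda=5\)` versus the exact binomial with `\(n=500\)`, `\(p=0.01\)`.
```
# P(3 or more bulls-eyes)
from scipy.stats import poisson, binom
print(1-poisson.cdf(2,5))        # about 0.875 (approximation)
print(1-binom.cdf(2,500,0.01))   # about 0.876 (exact)
```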
---
## Expected value of a random variable
> `\(E(X)\)` denotes the *expected value*, or *expectation*, or *mean* of the random variable X. The definition is just the weighted average of the values of X, where the weights are the probabilities:
.center[
`\(E(X)=\sum\limits_{a} a \cdot P_X(a)\)`
]
#### Simple Examples
.pull-left[
1. A single die: Here, `\(P_X(i) = 1/6\)` for `\(i=1,...,6\)`.
So,
`\(E(X) = \sum\limits_{i=1}^6 i \cdot \frac{1}{6} = \frac{1}{6} \sum\limits_{i=1}^6 i=\frac{1}{6} \cdot 21 = 3.5\)`
2. Sum of two dice: Looking at the PMF of the sum of two dice, it is apparent that `\(E(X)=7\)`. For any `\(i\)` between 0 and 5, `\(P(X=7-i)=P(X=7+i)\)`. In other words, 2 has the same probability as 12, 3 as 11, etc. So, by symmetry, the values pair off around 7 and the mean is 7.
]
.pull-right[.center[![](images/exp-symm.png)]]
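A numeric sketch of both examples, computing the weighted averages directly from the PMFs:
```
# E(X) for one die and E(S) for the sum of two dice
die={i:1/6 for i in range(1,7)}
print(sum(a*p for a,p in die.items()))   # 3.5
two={}
for i in range(1,7):
    for j in range(1,7):
        two[i+j]=two.get(i+j,0)+1/36
print(sum(a*p for a,p in two.items()))   # 7.0
```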
---
## Linearity of Expectation
The prior example demonstrated how the symmetry of the graph makes the expected value 7, without using any calculated probabilities. There is an even simpler way to calculate the expectation of the two dice, based on linearity of expectation.
If `\(X,Y\)` are random variables defined on the sample space, then:
.pull-left[
![](images/lin-exp.png)
In a like manner, if `\(X_1,...,X_n\)` are all defined on `\(S\)` then `\(E(X_1+\cdots+X_n)=E(X_1)+\cdots+E(X_n)\)`
]
.pull-right[
So if we were to redo the previous example with 2 dice:
`\(E(X) = E(X_1 + X_2)\)`
`\(=E(X_1)+E(X_2)\)`
`\(= 3.5 +3.5 = 7\)`
Additionally, if `\(c \in \mathbf{R}\)` is constant, then
`\(E(cX) = c \cdot E(X)\)`
]
---
## Conditional Probability
> Let `\(E\)` and `\(F\)` be events in a probability space. `\(P(E|F)\)` denotes the conditional probability of `\(E\)` conditioned on `\(F\)`. Meaning: of the times that `\(F\)` occurs, what proportion of the time does `\(E\)` also occur? This is defined by:
.center[
`\(P(E|F)=\frac{P(E\cap F)}{P(F)}\)`
]
.left-column[
![](images/cond-prob.png)
]
.right-column[
**Example 1**. The figure to the left illustrates the definition. Imagine the dots represent outcomes, each with probability `\(\frac{1}{12}\)`. Then `\(P(E)=\frac{1}{2}\)`, `\(P(F)=\frac{5}{12}\)`, and `\(P(E\cap F) = \frac{1}{4}\)`. Thus, `\(P(E|F)=\frac{3}{5}\)` and `\(P(F|E)=\frac{1}{2}\)`
**Example 2**. Consider the roll of a die. Let `\(F\)` be the event 'the number showing is even', and let `\(E_1,E_2\)` be the events 'the number showing is 1' and 'the number showing is 2', respectively. Here, `\(F=\{2,4,6\}, E_1=\{1\},E_2=\{2\}, E_1 \cap F = \emptyset\)` and `\(E_2 \cap F = \{2\}\)`. Then, `\(P(E_1|F)=0\)`, while `\(P(E_2|F)=\frac{1}{3}\)`.
]
---
### More examples of conditional probability
.pull-left[
**Example 3**. Given 2 coins, what is the probability that both coins are heads, given at least one of them is heads? That is, what is `\(P(E|F)\)` where `\(E\)` is the event 'both coins are heads' and `\(F\)` is the event 'at least one coin is heads'. Our sample space is the 4 equally likely outcomes HH, HT, TH, TT. As sets, `\(F=\{HH,HT,TH\}\)` so `\(P(F)=\frac{3}{4}\)`. In this case, `\(E \cap F = E = \{HH\}\)`, so `\(P(E \cap F)=\frac{1}{4}\)`. Thus,
.center[
`\(P(E|F)=\frac{P(E \cap F)}{P(F)} = \frac{1}{4}/\frac{3}{4} = \frac{1}{3}\)`
]
**Example 4**. *Chain Rule for conditional probability*. If we consider the intersection of 3 events and apply the definition twice, we get
.center[
`\(P(E_1 \cap E_2 \cap E_3) = P(E_1|E_2 \cap E_3) \cdot P(E_2 \cap E_3)\)`
`\(= P(E_1|E_2 \cap E_3) \cdot P(E_2|E_3) \cdot P(E_3)\)`,
]
and similarly for any number of events.
]
.pull-right[
**Example 5**. *Connection with independence*. If `\(E,F\)` are independent, then `\(P(E \cap F)=P(E) \cdot P(F)\)`, so it follows that `\(P(E|F)=P(E)\)`. Conversely, if `\(P(E|F)=P(E)\)`, it follows from the definition that `\(P(E \cap F)=P(E) \cdot P(F)\)`. So we can characterize independence this way in terms of conditional probability. Because of the symmetry in the problem, this also implies `\(P(F|E)=P(F)\)`.
**Example 6**. We have 2 urns. Urn 1 has 2 black & 3 white balls. Urn 2 has 1 black & 1 white ball. The tree below visualizes the sample spaces and the probabilities at each stage.
.center[![](images/cond-tree.png)]
]
---
class: inverse center middle
## Bayes Theorem
---
## Bayes Probability
> The definition implies,
> .center[
`\(P(E|F) \cdot P(F) = P(E \cap F) = P(F|E) \cdot P(E)\)`
]
> which can be rewritten as,
> .center[
`\(P(E|F) = \frac{P(F|E)}{P(F)} \cdot P(E)\)`
]
This is what is known as *Bayes probability*. It can be thought of simply as 'given the outcome of the second stage of a 2-stage experiment, find the probability for an outcome at the first stage'. Returning to the urn tree diagram, we were able to find the probabilities for a ball of a given color, given the urn chosen. The tree below is a *reverse tree diagram* calculating the *inverse probability* that a particular urn was chosen, given the color of the ball. Bayes probabilities can be obtained by simply constructing the tree in reverse order.
.pull-left[![](images/bayes-tree.png)]
.pull-right[
From the forward tree, we find that the probability of a black ball is `\(\frac{1}{2} \cdot \frac{2}{5} + \frac{1}{2} \cdot \frac{1}{2} = \frac{9}{20}\)`. From there we can compute the probabilities at the second level by simple division: `\(\frac{9}{20} \cdot x = \frac{1}{5} \therefore x=4/9=P(I|B)\)`
]
---
## Bayes' Formula
Suppose we have a set of events `\(H_1,H_2,...,H_m\)` that are pairwise disjoint and such that the sample space `\(\Omega\)` satisfies this equation,
.center[
`\(\Omega = H_1 \cup H_2 \cup \cdots \cup H_m\)`
]
We call these events *hypotheses*. We also have an event *E* that gives us some information about which hypothesis is correct - *evidence*.
Before we receive the evidence, we have a set of *prior probabilities* `\(P(H_1), P(H_2),...,P(H_m)\)` for the hypotheses. If we know the correct hypothesis, we know the probability for the evidence. That is, we know `\(P(E|H_i)\)` for all `\(i\)`. We want to find the probabilities for the hypotheses given the evidence. That is, we want to find the conditional probabilities `\(P(H_i|E)\)`. These probabilities are called the *posterior probabilities*. To find these probabilities, we write them in the form,
.center[
`\(P(H_i|E)=\frac{P(H_i \cap E)}{P(E)}\)` where `\(P(H_i \cap E) = P(H_i)P(E|H_i)\)` and `\(P(E)=P(H_1 \cap E)+...+P(H_m \cap E)\)`
]
Combining these formulas yields *Bayes' formula*:
.center[
`\(P(H_i|E)=\frac{P(H_i)P(E|H_i)}{\sum_{k=1}^m P(H_k)P(E|H_k)}\)`
]
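A small sketch of Bayes' formula as a function (the name `bayes_posteriors` is illustrative), checked against the urn example: priors 1/2 and 1/2, likelihoods of drawing black 2/5 and 1/2.
```
# posterior P(H_i|E) from priors P(H_i) and likelihoods P(E|H_i)
def bayes_posteriors(priors,likelihoods):
    joint=[pr*lk for pr,lk in zip(priors,likelihoods)]
    total=sum(joint)   # P(E)
    return [j/total for j in joint]
print(bayes_posteriors([1/2,1/2],[2/5,1/2]))   # [4/9, 5/9]
```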
---
## Bayes' - A Medical Application
A doctor is trying to decide if a patient has one of three diseases `\(d_1,d_2,\)` or `\(d_3\)`. Two tests are carried out, each of which results in a positive (+) or a negative (-) outcome. There are four possible test patterns ++, +-, -+, and --. National records indicate that, for 10,000 people having one of the three diseases, the distribution of disease and test results is as follows:
.center[
![](images/dis-dat.png)
]
.pull-left[
From this data we can estimate the *prior probabilities* for each of the diseases and, given a disease, the probability of a particular test outcome. For example, the prior of `\(d_1 = 3215/10000 = .3215\)`. Then, the probability of the test result +-, given `\(d_1\)` can be estimated by `\(301/3215 = .094\)`.
]
.pull-right[
Using Bayes' formula we can compute the various *posterior probabilities*.
.center[![](images/post-prob.png)]
]
---
## Naïve Bayes Classifier
The previous balls in the urn example forms the basis for an important tool in machine learning. We want to determine which of the two classes, `\(I\)` and `\(II\)`, a given urn belongs to. We sample some balls and get some result `\(E\)` of the sampling experiment. The task is to find out which of `\(P(I|E)\)` and `\(P(II|E)\)` is larger.
**Example.** Spam email. Suppose we want to determine whether a certain message is spam or not. We first train our classifier on a large number of messages that have been classified by hand, finding the distribution of words in a large collection of spam messages, and likewise the distribution of words in a large number of messages that are not spam. (For example, in a dataset of both spam and non-spam messages, the word ‘won’ was 100 times more likely to occur in a spam message than in a legitimate message.) These word distributions are then treated exactly like the color distributions for the balls. If we are given a fresh document, D, we view it simply as a collection of words and compute two scores for it, one relative to the spam distribution found during training and the other relative to the non-spam distribution, and choose the class associated with the higher score.
> What is 'naïve' about this method is that it ignores things like the occurrences of key phrases or anything else having to do with the order of words in the document, and instead treats the generation of a document as simply pulling a bunch of words out of a bag of words. *In fact, this is called the 'bag of words' model in the machine learning literature.*
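A toy sketch of the scoring step described above. The word counts and the test message are made up for illustration; a real classifier would estimate smoothed word probabilities from a training corpus and would also include the class priors.
```
# toy naive Bayes scoring with a bag-of-words model
from math import log
# hypothetical training word counts for each class
spam_counts={'won':100,'prize':80,'meeting':1,'report':1}
ham_counts ={'won':1,'prize':1,'meeting':90,'report':70}
def score(words,counts):
    total=sum(counts.values())
    # add-one smoothing so unseen words do not zero out the score;
    # equal class priors are assumed, so the prior term is omitted
    return sum(log((counts.get(w,0)+1)/(total+len(counts))) for w in words)
doc=['you','won','a','prize']
print('spam' if score(doc,spam_counts)>score(doc,ham_counts) else 'not spam')
```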
---
class: inverse center middle
## Continuous Probability Spaces
---
## What does 'probability 0' mean?
**Example.** A spinner.
.left-column[
![](images/spinner.png)
![](images/spinner-2.png)
]
The points on the circumference of the circle are labeled by the half-open interval `\(S=\{x \in \mathbf{R}:0 \le x <1 \}\)`. This set can be denoted by `\([0,1)\)`. You can think of the spinner as a continuous analogue of a die: the outcomes are somehow 'equally likely'. This experiment is simulated by a call to the random number generator `rand()`.
For example: `\(E=\{x \in \mathbf{R}:0.5 \le x \le 0.75 \}=[0.5,0.75]\)` as depicted in the bottom left figure has probability `\(0.25\)`, since it occupies exactly 1/4 of the circumference.
By the same reasoning, we can say the probability of the half-open interval `\([0.5,0.75)\)`, which is obtained by removing the point `\(0.75\)`, is `\(0.25\)`. Thus, because the union is disjoint:
`\(0.25 = P([0.5,0.75])\)`
`\(= P([0.5,0.75) \cup \{0.75\})\)`
`\(= P([0.5,0.75)) + P(\{0.75\})\)`
`\(= 0.25 + P(\{0.75\})\)`
This implies `\(P(\{0.75\}) = 0\)`
Likewise, the probability of any individual point is 0. In a continuous space, that does not mean that an event is "impossible"!
---
## So how are they different?
In a continuous probability space, the probability axioms are just the same as they were for discrete spaces:
* Complementary probabilities add to 1
* The probability of a pairwise disjoint union of events is the sum of the probabilities of the individual events
* etc.
What is different is that the probability function is *not* determined by the probabilities of individual outcomes, which typically are all 0.
---
## Continuous Random Variables
**Recap:** The definition of a random variable is the same for continuous probability spaces as for discrete ones: A random variable just associates a number to every outcome in the sample space.
.pull-left[
For a *discrete* random variable, the PMF:
`\(P_X(x)=P(X=x)\)`
Making the CDF `\(F_X\)` defined as:
`\(F_X(x) = P(X \le x)\)`
For *continuous random variables*, the CDF still makes sense and has the same
definition, but the PMF gives no information: it typically assigns probability 0 to
every real number. What replaces the PMF in the continuous case is the **probability density function (PDF)**.
]
.pull-right[
Referring back to the spinner example, if we plot the CDF of the outcomes:
.center[
![](images/cdf-spinner.png)
]
]
---
## PDF
> The density function is the derivative of the cumulative distribution function. It is defined by:
.center[
`\(f_X(a)=F'_X(a)\)`
]
If you turn things around, the CDF can be recovered by integrating the PDF:
.center[
`\(F_X(a)=\int_{-\infty}^a f_X(t)dt.\)`
]
.pull-left[
The PDF satisfies the following properties:
* `\(f_X(x) \ge 0\)` for all `\(x \in \mathbf{R}\)`
* `\(\int_{-\infty}^\infty f_X(t)dt = 1\)`
These are just like the properties of a PMF, except that the discrete sum is replaced
by an integral. Just as we can define discrete random variables by giving the PMF,
we can define a continuous random variable by giving a function that satisfies the
two properties above.
]
.pull-right[
The PDF of the random variable giving the outcome of a spinner is shown below. Observe that the CDF is not differentiable at x=0 and x=1, so the PDF is not defined at these points. Observe that the two properties of a PDF are satisfied: the graph never drops below the x-axis, and the area between the x-axis and the graph is 1.
.center[![](images/pdf-spin.png)]
]
---
## Expected Value of a Continuous Random Variable
The definition of expected value resembles that of the expected value of a discrete random variable, but we replace the PMF by the PDF, and summation by
integration. So we have
.center[
`\(E(X)=\int_{-\infty}^\infty xf_X(x)dx\)`
]
Once again, we have the linearity property (expected value of a sum of random
variables is the sum of the expected values).
**Example.** A single spinner. Since `\(f_X(x)=0\)` outside the interval between 0 and 1, and `\(f_X(x)=1\)` inside the interval, we have
.center[
`\(E(X)=\int_{-\infty}^\infty xf_X(x)dx = \int_{0}^1 xdx = x^2/2|_{0}^1 = 1/2\)`
]
This is exactly what you would expect—if you spin the spinner a bunch of times, the values of the spins should average to `\(1/2\)`!
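A quick empirical check: averaging many uniform random numbers should come out close to 1/2.
```
# Monte Carlo check that E(X) = 1/2 for the spinner
from numpy.random import random
print(random(1000000).mean())   # close to 0.5
```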
---
class: inverse center middle
# Variance, Chebyshev's Inequality & Law of Large Numbers
---
## Variance
> The variance and standard deviation of a random variable measure how much
the value of a random variable is likely to deviate from its mean–how ‘spread out’
it is.
**Variance**: If `\(X\)` is a random variable with `\(\mu = E(X)\)`, then `\(Var(X) = E((X-\mu)^2)\)`
**Standard Deviation**: `\(\sigma(X) = \sqrt{Var(X)}\)`
Three important properties:
1. Because of linearity of expectation, we can expand `\(E((X-\mu)^2)\)` to get a simpler formula (checked numerically in the sketch below):
`\(Var(X) = E((X-\mu)^2) = E(X^2-2\mu X +\mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2E(X)^2 + E(X)^2\)`
.center[
`\(= E(X^2) - E(X)^2\)`
]
2. If `\(c\)` is a constant, then `\(Var(cX) = c^2Var(X)\)`
3. Suppose `\(X,Y\)` are independent random variables. As we've seen, this implies that `\(E(XY)=E(X)E(Y)\)`. By a similar derivation (using linearity of expectation), `\(Var(X+Y) = Var(X) + Var(Y)\)`
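A quick numeric check of the identity `\(Var(X)=E(X^2)-E(X)^2\)` for a single fair die:
```
# variance of one fair die: 35/12, about 2.92
mean=sum(i/6 for i in range(1,7))     # 3.5
ex2 =sum(i*i/6 for i in range(1,7))   # 91/6
print(ex2-mean**2)
```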
---
## Chebyshev's Inequality
> `\(P(|X-\mu| > t \cdot \sigma(X)) \le \frac{1}{t^2}\)` for any random variable `\(X\)` for which the variance is defined, and any `\(t>0\)`
*Chebyshev’s inequality* tells us, for example, that the probability that a random variable differs by more than 3 standard deviations from its mean is no more than 1/9.
.pull-left[**Example**. Let’s roll a single die and let X be the outcome. Then `\(E(X) = 3.5\)` and `\(Var(X)\)` is:
`\(E(X^2) = \frac{1}{6}(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = \frac{91}{6}\)`
`\(Var(X)=\frac{91}{6} - 3.5^2 \approx 2.92\)`
`\(\sigma(X) \approx 1.71\)`
But, what if we rolled the die *100 times*?
Now,
`\(E(Y) = 350\)`, `\(Var(Y) = 292\)`, and `\(\sigma(Y) \approx 17.1\)`
Now, we have
`\(50 \approx 2.92 \cdot \sigma(Y)\)` so,
]
.pull-right[
`\(P(300 \le Y \le 400) = P(|Y-\mu| \le 50)\)`
`\(= P(|Y-\mu| \le 2.92 \cdot \sigma(Y))\)`
`\(= 1- P(|Y-\mu| > 2.92 \cdot \sigma(Y))\)`
`\(\ge 1-1/2.92^2\)`
`\(\approx 0.88\)`
This tells us that there’s at least an *88% probability* that Y will be between 300 and 400.
]
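A simulation sketch of the 100-roll example. Chebyshev only gives a lower bound, so the simulated probability comes out well above 0.88.
```
# estimate P(300 <= Y <= 400) for Y = sum of 100 die rolls
from numpy.random import randint
trials=100000
hits=0
for _ in range(trials):
    if 300 <= randint(1,7,100).sum() <= 400:
        hits+=1
print(hits/trials)   # well above the 0.88 bound
```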
---
## Law of Large Numbers
Let's continue from the previous example. If we roll the die `\(n\)` times and let `\(X\)` be the sum, then the standard deviation is about `\(1.71 \sqrt{n}\)` and so,
.pull-left[
`\(P(3n \le X \le 4n) = P(|X-\mu| \le n/2)\)`
`\(= P(|X-\mu| \le \frac{\sqrt{n}}{3.42} \cdot \sigma(X))\)`
`\(= 1- P(|X-\mu| > \frac{\sqrt{n}}{3.42} \cdot \sigma(X))\)`
`\(\ge 1- \frac{3.42^2}{n}\)`
This obviously approaches 1 as a limit as the number of tosses gets larger. It’s
also obvious that there is nothing special about `\(3n\)` and `\(4n\)`; any pair of bounds
symmetrically spaced about the mean `\(3.5n\)` would give the same result in the limit.
]
.pull-right[
**Weak law of large numbers:**
`\(\displaystyle\lim_{n \to \infty} P(|Y_n-\mu|> \epsilon) = 0\)`, where `\(Y_n\)` is the average of the first `\(n\)` trials and `\(\epsilon\)` is any positive number.
In terms of complementary probability:
`\(\displaystyle\lim_{n \to \infty} P(\mu-\epsilon \le Y_n \le \mu + \epsilon) = 1\)`
It tells us that the value `\(Y_n\)` approaches `\(\mu\)` ‘in probability’: However small a deviation `\(\epsilon\)` from the mean `\(\mu\)` you name, if you perform the experiment often enough, the probability that its average
value differs by as much as `\(\epsilon\)` from the mean is vanishingly small.
]
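An empirical sketch of the weak law for die rolls: as `\(n\)` grows, the running average settles near `\(\mu = 3.5\)`.
```
# average of n die rolls for increasing n
from numpy.random import randint
for n in [100,10000,1000000]:
    print(n,randint(1,7,n).mean())
```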
---
class: inverse center middle
## Normal Distribution & CLT
---
## The Normal Distribution
> The function `\(\phi\)` is called the *standard normal density*. 'Standard' here means that it has mean 0 and standard deviation 1.
.center[
`\(\phi(x) = \frac{1}{\sqrt{2\pi}} \cdot e^{-x^2/2}\)`
]
Fun fact: the famous 'bell curve' represents a continuous probability density that is a limiting case of the binomial distribution as `\(n\)` grows large.
![](images/norm-binom.png) ![](images/norm-binom2.png) ![](images/norm-binom3.png) ![](images/norm-binom4.png)
The corresponding CDF: `\(\Phi(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-t^2/2}dt.\)`
Since this integral is very difficult to evaluate analytically, it can be approximated using `norm.cdf(x)` from `scipy.stats`
---
## Normal Approximation to Binomial Dist.
> The binomial distribution, adjusted to have mean 0 and standard deviation 1, is closely approximated by the normal distribution, especially as `\(n\)` gets larger. The general principle is that:
.center[
`\(P(a \le \frac{S_{n,p} - np}{\sqrt{np(1-p)}} \le b) \approx \Phi(b) - \Phi(a)\)`
]
**Example.** Using the normal approximation to the binomial distribution, estimate the probability that a fair coin tossed 100 times comes up heads between 45 and 55 times, inclusive.
- The first consideration is that `\(S_{n,p}\)` is a discrete random variable that only takes integer values, so the probability `\(P(45 \le S_{100,.5} \le 55)\)` is equivalent to `\(P(44.5 \le S_{100,.5} \le 55.5)\)`. Using the latter (the continuity correction) gives a better approximation.
.pull-left[
`\(P(44.5 \le S_{100,.5} \le 55.5) = P(\frac{44.5-50}{\sqrt{100 \times 0.25}} \le \frac{S_{100,.5}-50}{\sqrt{100 \times 0.25}} \le \frac{55.5-50}{\sqrt{100 \times 0.25}})\)`
`\(= P(-1.1 \le \frac{S_{100,.5}-50}{\sqrt{100 \times 0.25}} \le 1.1)\)`
`\(\approx \Phi(1.1) - \Phi(-1.1)\)`
`\(= 0.728668\)`
]
.pull-right[
</br>
</br>
Because of the symmetry in the event, this could have also been evaluated as:
`\(1 -2 \cdot \Phi(-1.1)\)`
]
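A check of the coin example, using `norm.cdf` from `scipy.stats` as mentioned earlier, alongside the exact binomial value:
```
# normal approximation (with continuity correction) vs exact value
from scipy.stats import norm, binom
approx=norm.cdf(1.1)-norm.cdf(-1.1)
exact =binom.cdf(55,100,0.5)-binom.cdf(44,100,0.5)
print(approx,exact)   # both about 0.7287
```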
---
## Central Limit Theorem
> Let `\(X\)` be a random variable for which `\(\mu = E(X)\)` and `\(\sigma^2 = Var(X)\)` are defined. Let `\(X_1, \dots, X_n\)` be mutually independent random variables, each with the same distribution as `\(X\)`. Think of this as making `\(n\)` independent repetitions of an experiment whose outcome is modeled by the random variable `\(X\)`. Our claim is that the sum of the `\(X_i\)` is approximately normally distributed. Again we adjust the mean and standard deviation to be 0 and 1; then the precise statement is
.center[
`\(\displaystyle\lim_{n \to \infty} P(a< \frac{X_1+\cdots +X_n - n\mu}{\sigma\sqrt{n}} <b) = \Phi(b) - \Phi(a)\)`
]
*Note*: The Law of Large Numbers told us that the deviation of the average of `\(n\)` independent identical random variables from its mean approaches 0 as `\(n\)` grows larger. The Central Limit Theorem says more: it tells us how that deviation is distributed.
.pull-left[
**Example.** Roll a die a large number `\(N\)` of times. Let `\(A_N\)` be the average roll. What is the probability that `\(A_N\)` is between 3 and 4? `\(P(3<A_N<4)\)`
Remember, for a single roll, `\(\mu =3.5\)`, `\(Var=2.91\)`, `\(\sigma = 1.708\)`. (Use `scipy.stats.norm.cdf` to compute `\(\Phi\)`)
]
.pull-right[