\chapter{Probability}
\section{Shuffle}
A correct shuffle algorithm must produce every permutation of the deck with equal probability.
\subsection{Incorrect naive solution}
Swap the current card $A_i$ with a card chosen uniformly at random from the entire deck $A$.
\begin{java}
for (int i = 0; i < N; i++) {
    int j = (int) (Math.random() * N);  // uniform in [0, N)
    swap(a, i, j);  // exchange a[i] and a[j]
}
\end{java}
\begin{python}
import random

def shuffle(A):
    n = len(A)
    for i in range(n):
        j = random.randrange(n)  # uniform in [0, n)
        A[i], A[j] = A[j], A[i]
\end{python}
Consider 3 cards. The simplest proof that this algorithm does not produce a uniformly random permutation is a counting argument: the loop makes $3^3 = 27$ equally likely sequences of random choices (counting the steps of each run, duplicated results included), but there are only $3! = 6$ permutations. Since $27 \bmod 6 \neq 0$, the 27 choice sequences cannot map evenly onto the 6 permutations, so some permutation is produced too often and some too rarely.
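To see the bias concretely, the following minimal Python 3 sketch (not part of the original text) enumerates all $3^3$ equally likely choice sequences of the naive algorithm and tallies the resulting permutations:
\begin{python}
from collections import Counter
from itertools import product

# Each run of the naive shuffle on 3 cards makes 3 independent choices,
# each uniform over {0, 1, 2}; enumerate all 27 choice sequences.
counts = Counter()
for choices in product(range(3), repeat=3):
    A = [0, 1, 2]
    for i, j in enumerate(choices):
        A[i], A[j] = A[j], A[i]
    counts[tuple(A)] += 1

print(counts)  # the 6 permutations get counts of 4 or 5, never 27/6
\end{python}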
\subsection{Knuth Shuffle}
The Knuth (a.k.a. Fisher--Yates) shuffle rearranges the elements in uniformly random order.
\\
Core clues:
\begin{enumerate}
\item choose index uniformly $\in [i, N)$.
\item analogous to dealing cards: once a position is filled, it is never touched again.
\end{enumerate}
\begin{java}
public void shuffle(Object[] a) {
    int N = a.length;
    for (int i = 0; i < N; i++) {
        // choose index uniformly in [i, N)
        int j = i + (int) (Math.random() * (N - i));
        swap(a, i, j);  // exchange a[i] and a[j]
    }
}
\end{java}
\begin{python}
import random

def shuffle(A):
    n = len(A)
    for i in range(n):
        j = random.randrange(i, n)  # uniform in [i, n)
        A[i], A[j] = A[j], A[i]
\end{python}
\runinhead{Proof:}
Position 0 receives one of $n$ elements, position 1 one of the remaining $n-1$, and so on; each of the $n!$ permutations therefore corresponds to exactly one sequence of choices and occurs with probability $\frac{1}{n}\cdot\frac{1}{n-1}\cdots\frac{1}{1} = \frac{1}{n!}$.
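The same enumeration as for the naive version (a minimal sketch, not from the original text) confirms uniformity on 3 elements:
\begin{python}
from collections import Counter
from itertools import product

# The Knuth shuffle on 3 elements draws j from [0,3), then [1,3),
# then [2,3): 3 * 2 * 1 = 6 equally likely choice sequences.
counts = Counter()
for choices in product(range(0, 3), range(1, 3), range(2, 3)):
    A = [0, 1, 2]
    for i, j in enumerate(choices):
        A[i], A[j] = A[j], A[i]
    counts[tuple(A)] += 1

print(counts)  # each of the 3! = 6 permutations appears exactly once
\end{python}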
\subsection{Random Maximum}
Find the index of the maximum number in an array in one pass; if the maximum appears multiple times, return each maximal index with probability $\frac{1}{\#maxima}$.
\runinhead{Code:}
\begin{python}
import random

def random_max(A):
    maxa, maxa_idx, k = A[0], 0, 1  # k counts the maxima seen so far
    for i in range(1, len(A)):
        if maxa < A[i]:
            maxa, maxa_idx, k = A[i], i, 1
        elif maxa == A[i]:
            k += 1
            if random.random() < 1 / k:  # keep the new index w.p. 1/k
                maxa_idx = i
    return maxa_idx
\end{python}
\runinhead{Proof:} By mathematical induction: assuming the algorithm works for any array of size $N$, prove it works for any array of size $N+1$.

Given an array of size $N+1$, view it as a sub-array of size $N$ followed by a new element at the end. By the induction hypothesis, the algorithm uniformly selects one of the maximal indices of the sub-array. It then behaves as follows:
\begin{enumerate}
\item If the new element is larger than the max of the sub-array, the algorithm selects the new element with probability 1, which is correct.
\item If the new element is smaller than the max of the sub-array, the algorithm keeps its selection from the sub-array, which is also correct.
\item If the new element equals the max of the sub-array, let $k$ be the number of maxima in the sub-array. By hypothesis, each of them was selected with probability $\frac{1}{k}$. The algorithm keeps that selection with probability $\frac{k}{k+1}$, so each old maximum is ultimately selected with probability
$$\frac{1}{k} \cdot \frac{k}{k+1} = \frac{1}{k+1},$$
and the new element is selected with the same probability $\frac{1}{k+1}$, as desired.
\end{enumerate}
To complete the induction, verify the base case: on an array of size 1, the single index is returned with probability 1.
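An empirical sanity check (a sketch, not from the original text), using \texttt{random\_max} from above on an array whose maxima sit at indices 1, 3, and 4; each should be returned about one third of the time:
\begin{python}
from collections import Counter

A = [2, 7, 5, 7, 7]  # maxima at indices 1, 3, 4
n_runs = 60000
freq = Counter(random_max(A) for _ in range(n_runs))
print({idx: round(cnt / n_runs, 3) for idx, cnt in freq.items()})
\end{python}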
\section{Sampling}
\subsection{Reservoir Sampling}
Sample $k$ elements (called \texttt{sz} in the code below) uniformly from $A$, where the length of $A$ is very large, unknown in advance, or dynamic.
\begin{python}
import random

def reservoir_sample(A, sz):  # sz for sample size
    R = A[:sz]  # the reservoir holds the current sample
    for i in range(sz, len(A)):
        rv = random.randrange(0, i + 1)  # rv for random variable
        if rv < sz:
            R[rv] = A[i]
    return R
\end{python}
After processing index $i$ (i.e.\ $i+1$ elements scanned), every scanned element is in the reservoir with probability $\frac{sz}{i+1}$.
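A quick check of this invariant for an already-scanned element $A_j$ with $sz \leq j \leq i$ (the case $j < sz$ telescopes the same way): it enters the reservoir with probability $\frac{sz}{j+1}$, and at each later step $t$ it is evicted only when \texttt{rv} hits its slot, i.e.\ with probability $\frac{1}{t+1}$. The survival probabilities telescope:
$$
P(A_j \in R) = \frac{sz}{j+1} \cdot \prod_{t=j+1}^{i} \frac{t}{t+1} = \frac{sz}{j+1} \cdot \frac{j+1}{i+1} = \frac{sz}{i+1}
$$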
\runinhead{Random pick index.} Given an array of integers with possible duplicates, output the index of a given target number uniformly at random. This is reservoir sampling with a reservoir of size 1.
\begin{python}
import random

def pick(A, target):
    sz = 0  # number of matching indices seen so far
    ret = None
    for idx, val in enumerate(A):
        if val == target:
            sz += 1
            rv = random.randrange(0, sz)
            if rv == 0:  # replace the pick with probability 1/sz
                ret = idx
    return ret
\end{python}
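A hypothetical usage check (not in the original): with target 3, indices 2, 3, and 4 of the array below should each be returned about one third of the time.
\begin{python}
from collections import Counter

n_runs = 30000
freq = Counter(pick([1, 2, 3, 3, 3], 3) for _ in range(n_runs))
print({idx: round(cnt / n_runs, 2) for idx, cnt in freq.items()})
\end{python}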
\section{Distribution}
\subsection{Geometric Distr}
The geometric distribution models the number of independent Bernoulli trials, each with success probability $p$, needed to get the first success. pmf:
$$
P(X=k) = (1-p)^{k-1}p
$$
The expected number of trials to get a specific outcome, i.e.\ the mean of the geometric distribution, is
$$
E[T] = \frac{1}{p},
$$
which follows from conditioning on the first trial: $E[T] = 1 + (1-p)\,E[T]$.
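A minimal Monte Carlo sketch (not from the original text): rolling a fair die until a specific face appears is geometric with $p = \frac{1}{6}$, so the empirical mean should be close to 6.
\begin{python}
import random

def trials_until(p):
    # count trials until the first success, success probability p
    t = 1
    while random.random() >= p:
        t += 1
    return t

n_runs = 100000
print(sum(trials_until(1 / 6) for _ in range(n_runs)) / n_runs)  # ~6.0
\end{python}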
\subsection{Binomial Distr}
Notation:
$$
X \sim B(n, p)
$$
pmf:
$$
P(X=k) = {n \choose k}\,p^{k}(1-p)^{n-k}
$$
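A direct transcription of the pmf in Python 3 (a sketch; the name \texttt{binom\_pmf} is illustrative, not from the source):
\begin{python}
from math import comb

def binom_pmf(n, k, p):
    # P(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(10, 3, 0.5))  # 0.1171875
\end{python}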
\section{Expected Value}
\subsection{Dice value}
The expected number of rolls of a fair die until a 3 appears: each roll succeeds with probability $p = \frac{1}{6}$, so by the geometric distribution $E[T] = 6$.
\subsection{Coupon collector's problem}
Given $n$ coupons, how many coupons do you expect to draw with replacement before having drawn each coupon at least once?
$$
E[T] = \Theta(n \lg n),
$$
where $T$ is the number of trials (i.e.\ time).
Let $T$ be the time to collect all coupons, $t_i$ the time to collect the $i$-th distinct coupon after $i-1$ distinct coupons have been collected, and $p_i$ the probability of drawing a new coupon at that stage. Observe that:
\begin{align*}
p_1 &= \frac{n}{n} \\
p_2 &= \frac{n-1}{n} \\
p_i &= \frac{n-i+1}{n}
\end{align*}
Thus, since each $t_i$ is geometric with parameter $p_i$ (so $E[t_i] = \frac{1}{p_i}$),
\begin{align*}
E[T] &= \sum_{i=1}^n E[t_i] \\
&= \sum_{i=1}^n \frac{1}{p_i} \\
&= n\left(\frac{1}{1}+\frac{1}{2}+\dots+\frac{1}{n}\right) \\
&= n \cdot H_n = n \ln n + \Theta(n)
\end{align*}
\runinhead{Dice.} How many times must you roll a die until each side has appeared? This is the coupon collector's problem with $n = 6$, so $E[T] = 6 \cdot H_6 = 6\left(\frac{1}{1}+\frac{1}{2}+\dots+\frac{1}{6}\right) = 14.7$.
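A minimal Monte Carlo sketch (not part of the original text) for the die case:
\begin{python}
import random

def rolls_to_see_all(n=6):
    # roll an n-sided die until every side has appeared
    seen, t = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        t += 1
    return t

n_runs = 100000
print(sum(rolls_to_see_all() for _ in range(n_runs)) / n_runs)  # ~14.7
\end{python}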