-
Notifications
You must be signed in to change notification settings - Fork 0
/
examples-hmm.tex
312 lines (284 loc) · 13.3 KB
/
examples-hmm.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
Let us consider a simple data set with only two rows. Each row denotes
whether an individual has a certain sensitive disease. Given such a
data set, we wish to know how many individuals contract the disease in
the data set.
It is easy to see why the query may reveal sensitive information about
individuals. For instance, suppose we know John's record is in the
data set. We immediately infer that John has contracted the disease if
the query answer is $2$. In fact, probabilistic inferences may reveal
too much information in this case. If the query answer is $1$, we also
know that John has $50\%$ of chance to have the disease. If the
population has much lower rate of contracting the disease, John's
privacy is undoubtedly intruded by our probabilistic inference.
\begin{figure}
\centering
\resizebox{.6\columnwidth}{!}{
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node
distance=2cm,node/.style={circle,draw}]
\node[node] (i0) at ( -2, 1) { $0$ };
\node[node] (i1) at ( -2, 0) { $1$ };
\node[node] (i2) at ( -2, -1) { $2$ };
\node[node] (o0) at ( 0, 1) { $\underline{0}$ };
\node[node] (o1) at ( 0, 0) { $\underline{1}$ };
\node[node] (o2) at ( 0, -1) { $\underline{2}$ };
\hide{
\node at (1.5, 1) {
$
\begin{array}{l}
0(1),1(0),2(0)
\end{array}
$ };
\node at (1.5, 0) {
$
\begin{array}{l}
0(0),1(1),2(0)
\end{array}
$ };
\node at (1.5, -1) {
$
\begin{array}{l}
0(0),1(0),2(1)
\end{array}
$ };
}
\draw[->,very thick] (i0) -- (o0); % 2/3
\draw[->,ultra thin] (i0) -- (o1); % 1/6
\draw[->,ultra thin] (i0) -- (o2); % 1/6
\draw[->,semithick] (i1) -- (o0); % 1/3
\draw[->,semithick] (i1) -- (o1); % 1/3
\draw[->,semithick] (i1) -- (o2); % 1/3
\draw[->,ultra thin] (i2) -- (o0); % 1/6
\draw[->,ultra thin] (i2) -- (o1); % 1/6
\draw[->,very thick] (i2) -- (o2); % 2/3
\node at (1.5, 1.2) { $\frac{1}{6}$ };
\draw[->,ultra thin] (1, 1) -- (2, 1);
\node at (1.5, 0.2) { $\frac{1}{3}$ };
\draw[->,semithick] (1, 0) -- (2, 0);
\node at (1.5,-0.8) { $\frac{2}{3}$ };
\draw[->,very thick] (1, -1) -- (2, -1);
\hide{
\path
(i0) edge node[above] { $\frac{2}{3}$ } (o0)
(i0) edge node { $\frac{1}{6}$ } (o1)
(i0) edge node[below] { $\frac{1}{6}$ } (o2)
(i1) edge node[above] { $\frac{1}{3}$ } (o0)
(i1) edge node { $\frac{1}{3}$ } (o1)
(i1) edge node[below] { $\frac{1}{3}$ } (o2)
(i2) edge node[above] { $\frac{1}{6}$ } (o0)
(i2) edge node { $\frac{1}{6}$ } (o1)
(i2) edge node[below] { $\frac{2}{3}$ } (o2)
;
}
\end{tikzpicture}
}
\caption{Truncated $\frac{1}{2}$-Geometric Mechanism}
\label{figure:geometric-mechanism}
\end{figure}
\noindent
\textit{Example 4.1}
To see how differential privacy works, let us consider the truncated
$\frac{1}{2}$-geometric mechanism
(Figure~\ref{figure:geometric-mechanism}).
In the figure, the states $0$, $1$, and $2$ denote the number of
individuals contracting the disease; the states $\underline{0}$,
$\underline{1}$, $\underline{2}$ denote the outputs of the
mechanism. At the state $\underline{0}$, \textit{zero} is observed
with probability $1$ but \textit{one} and \textit{two} can never be
observed. Similarly, the states $\underline{1}$ and $\underline{2}$
only observe \textit{one} and \textit{two} respectively.
In the figure, thin arrows denote transitions with probability
$\frac{1}{6}$; medium arrows denote transitions with probability
$\frac{1}{3}$; thick arrows denote transitions with probability
$\frac{2}{3}$. For instance, the state $0$ can transit to the state
$\underline{0}$ with probability $\frac{2}{3}$ while it can transit to
the state $\underline{1}$ with probability $\frac{1}{6}$.
Let us consider a data set whose two members (including John) contract
the disease. Thus the number of individuals contracting the disease is
$2$. From the state $2$, we see the mechanism answers
\textit{zero}, \textit{one}, and \textit{two} with probabilities
$\frac{1}{6}$, $\frac{1}{6}$, and $\frac{2}{3}$ respectively.
Suppose we replace John with an individual who does not contract
the disease. The number of individuals contracting the disease for the
new data set is $1$. From the state $1$, we see the mechanism answers
\textit{zero}, \textit{one}, and \textit{two} with the probability
$\frac{1}{3}$.
If an attacker queries the number of individuals contracting the
disease through the truncated $\frac{1}{2}$-geometric mechanism,
he will get the answer \textit{two} with probability at least
$\frac{1}{3}$, and at most $\frac{2}{3}$ regardless of John's record. The truncated
$\frac{1}{2}$-geometric mechanism is known to be
$\ln(2)$-differentially private. For any two similar data sets, the
mechanism always have similar output distributions. Information
revealed to attackers is therefore limited.
\noindent
\textit{Example 4.2}
It is interesting to see when differential privacy may fail. Consider
a data set consisting of two family members and a contagious
disease. An attacker will immedidately infer the number of individuals
contracting the disease can only be $0$ or $2$. If none has contracted
the disease, the attacker observes \textit{zero}, \textit{one}, and
\textit{two} with probabilities $\frac{2}{3}$, $\frac{1}{6}$, and
$\frac{1}{6}$ respectively. If the two members have the disease, the
probabilities are $\frac{1}{6}$, $\frac{1}{6}$, and $\frac{2}{3}$
respectively. Note that it is $4 = \frac{2/3}{1/6}$ times more likely
to observe \textit{zero} when there is none contracting the disease
than there are two. Similarly, it is $4$ times more likely to observe
\textit{two} in the other case. Even though the
$\frac{1}{2}$-geometric mechanism is $\ln(2)$-differentially private,
it may reveal too much information \emph{when the attacker has prior
knowledge about the data set and disease.}
\noindent
\textit{Example 4.3}
Let us analyze the scenario in the previous example with the
Pufferfish privacy framework. Suppose the attacker knows that the data
set contains two family members and the disease is contagious. We
would like to compare the probabilities of observing \textit{zero}
when a member has contracted the disease or not. If a family member
has contracted the disease, the number of individuals contracting
the disease must be $2$ by the prior knowledge. The attacker thus
observes \textit{zero} with probability $\frac{1}{6}$.
If, on the other hand, a family member has not contracted
the disease, the number of individuals contracting the disease must be
$0$ by the prior knowledge. The attacker observes \textit{zero}
with probability $\frac{2}{3}$. The probabilities of observing
\textit{zero} are $\frac{1}{6}$ and $\frac{2}{3}$ when a family
has contracted the disease or not respectively. The
$\frac{1}{2}$-geometric mechanism does not satisfy
$\ln(2)$-Pufferfish privacy.
\noindent
\textit{Example 4.4}
Prior knowledge does not necessarily lead to privacy leak. Consider a
non-contagious disease. An attacker may know that contracting the
disease is an independent event with probability $p$. Even though the
attacker does not know how many individuals have disease exactly, the
person can still infer that the number of individuals contracting the
disease is $0$, $1$, and $2$ with probabilities $(1-p)^2$, $2p(1-p)$,
and $p^2$ respectively. The prior knowledge can be easily formalized
by an information state in Figure~\ref{figure:geometric-mechanism}.
Consider the information
state where the probabilities at the states $0$, $1$, $2$,
$\underline{0}$, $\underline{1}$, and $\underline{2}$ are $(1-p)^2$,
$2p(1-p)$, $p^2$, $0$, $0$, and $0$ respectively.
Using the Pufferfish framework, we analyze the $\frac{1}{2}$-geometric
mechanism with the prior knowledge as follows. Assume the data set
contains John's record and John has contracted the disease. We would
like to compare probabilities of observations \textit{zero},
\textit{one}, and \textit{two} conditioned on the prior knowledge and
the presence/absence of John's record.
Suppose John's record is not in the data set. With the attacker's
prior knowledge, we see that the mechanism outputs \textit{zero} with
probability $(1-p)^2 \times \frac{2}{3} + 2p(1-p) \times \frac{1}{3} +
p^2 \times \frac{1}{6} = \frac{1}{6} p^2 - \frac{2}{3} p +
\frac{2}{3}$. Similarly, the mechanism outputs \textit{one} and
\textit{two} with probabilities $-\frac{1}{3} p^2 + \frac{1}{3} p +
\frac{1}{6}$ and $\frac{1}{6} p^2 + \frac{1}{3} p + \frac{1}{6}$
respectively.
Now suppose John's record is indeed in the data set. Since John has
the disease, the number of individuals contracting the disease cannot
be $0$. Conditioned on the prior knowledge, we have another
information state whose probabilities at the states $0$, $1$, $2$,
$\underline{0}$, $\underline{1}$, and $\underline{2}$ are $0$,
$\frac{2p(1-p)}{2p(1-p) + p^2} = \frac{2-2p}{2-p}$,
$\frac{p^2}{2p(1-p) + p^2} = \frac{p}{2-p}$, $0$, $0$, and $0$
respectively. From this information state, the mechanism outputs
\textit{zero}, \textit{one}, and \textit{two} with probabilities
$\frac{4-3p}{12-6p}$, $\frac{4-3p}{12-6p}$, and $\frac{2}{6-3p}$
respectively. Table~\ref{table:Pufferfish-geometric-mechanism}
summarizes observation probabilities.
\begin{table}
\caption{Pufferfish Anlysis of $\frac{1}{2}$-Geometric Mechanism}
\label{table:Pufferfish-geometric-mechanism}
\centering
\[
\begin{array}{c|ccc}
& \textit{zero} & \textit{one} & \textit{two} \\
\hline
\textmd{without John's record}
& \frac{p^2 - 4p + 4}{6}
& \frac{-2p^2 + 2p + 1}{6}
& \frac{p^2 + 2p + 1}{6}
\\
\textmd{with John's record}
& \frac{4-3p}{12-6p}
& \frac{4-3p}{12-6p}
& \frac{2}{6-3p}
\end{array}
\]
\end{table}
For observation \textit{zero}, it is not hard to check $\frac{1}{2}
\times \frac{4-3p}{12-6p} \leq \frac{p^2 - 4p + 4}{6} \leq 2 \times
\frac{4-3p}{12-6p}$ for any $0 < p < 1$. Similarly, we have
$\frac{1}{2} \times
\frac{4-3p}{12-6p} \leq \frac{-2p^2 + 2p + 1}{6} \leq 2 \times
\frac{4-3p}{12-6p}$ and $\frac{1}{2} \times \frac{2}{6-3p} \leq
\frac{p^2 + 2p + 1}{6} \leq 2 \times \frac{2}{6-3p}$ for observations
\textit{one} and \textit{two} respectively. Therefore, the
$\frac{1}{2}$-geometric mechanism satisfies $\ln(2)$-Pufferfish
privacy when contracting the disease is \emph{independent}.
In~\cite{KM:14:PFMPD}, it is shown that differential privacy is
subsumed by Pufferfish privacy (Theorem~6.1). This example is an
instance of the general theorem but formalized in hidden Markov
models.
\noindent
\textit{Example 4.5}
It is worth noting that independency of contracting disease is not
necessary for Pufferfish privacy. Let the probabilities of $0$, $1$,
and $2$ individuals contracting the disease are $p_0$, $p_1$, $p_2$
respectively. Then $p_0 + p_1 + p_2 = 1$. Moreover, the probabilities
of observing \textit{zero}, \textit{one}, and \textit{two} are
$\frac{2}{3} p_0 + \frac{1}{3} p_1 + \frac{1}{6} p_2$,
$\frac{1}{6} p_0 + \frac{1}{3} p_1 + \frac{1}{6} p_2$, and
$\frac{1}{6} p_0 + \frac{1}{3} p_1 + \frac{2}{3} p_2$ respectively.
Assume it is known that an individual in the data set has contracted
the disease. The probabilities of $0$, $1$, and $2$ individuals
contracting the disease become $0$, $\frac{p_1}{p_1 + p_2}$, and
$\frac{p_2}{p_1 + p_2}$ respectively. Consequently, the probabilities
of observing \textit{zero}, \textit{one}, and \emph{two} are
$\frac{\frac{1}{3}p_1 + \frac{1}{6}p_2}{p_1 + p_2}$,
$\frac{\frac{1}{3}p_1 + \frac{1}{6}p_2}{p_1 + p_2}$, and
$\frac{p_1 + 2p_2}{3(p_1 + p_2)}$ respectively. Choose $p_0 =
\frac{1}{2}$, $p_1 = p_2 = \frac{1}{4}$. For the observation
\textit{zero} we have,
$\frac{1}{8} =
\frac{1}{2} \times \frac{\frac{1}{3}p_1 + \frac{1}{6}p_2}{p_1 + p_2}
\leq
\frac{11}{24} =
\frac{2}{3} p_0 + \frac{1}{3} p_1 + \frac{1}{6} p_2
\leq
{2} \times \frac{\frac{1}{3}p_1 + \frac{1}{6}p_2}{p_1 + p_2}
= \frac{1}{2}$.
Similarly, we have
$
\frac{1}{8} =
\frac{1}{2} \times \frac{\frac{1}{3}p_1 + \frac{1}{6}p_2}{p_1 + p_2}
\leq
\frac{5}{24} =
\frac{1}{6} p_0 + \frac{1}{3} p_1 + \frac{1}{6} p_2
\leq
2 \times \frac{\frac{1}{3}p_1 + \frac{1}{6}p_2}{p_1 + p_2}
= \frac{1}{2}$ for the observation \textit{one}. And
$
\frac{1}{4}
=
\frac{1}{2} \times \frac{p_1 + 2p_2}{3(p_1 + p_2)}
\leq
\frac{1}{3}
=
\frac{1}{6} p_0 + \frac{1}{3} p_1 + \frac{2}{3} p_2
\leq
2 \times \frac{p_1 + 2p_2}{3(p_1 + p_2)}
= 1$ for the observation \textit{two}. In other words, the
$\frac{1}{2}$-geometric mechanism satisfies $\ln(2)$-Pufferfish privacy for
attackers with the prior distribution $(\frac{1}{2}, \frac{1}{4},
\frac{1}{4})$ on the number of individuals contracting the disease.
To see the dependency between prior distribution, suppose $(\frac{1}{2}, \frac{1}{4},
\frac{1}{4})$ is binomial. There is $0 \leq p \leq 1$ such that
$(1-p)^2 = \frac{1}{2}$, $2p(1-p) = \frac{1}{4}$, and $p^2 =
\frac{1}{4}$. From $p^2 = \frac{1}{4}$ and $0 \leq p$, we have $p =
\frac{1}{2}$. But we would have $(1-p)^2 = \frac{1}{4} \neq
\frac{1}{2}$. Hence $(\frac{1}{2}, \frac{1}{4}, \frac{1}{4})$ is not
binomial. Even though the prior distribution is not derived by
independent contraction of the disease, $\frac{1}{2}$-geometric
mechanism does not reveal too much information. Independency of
contraction is not necessary for Pufferfish privacy on the geometric
mechanism.