-
Notifications
You must be signed in to change notification settings - Fork 0
/
specifications.tex.bak
180 lines (158 loc) · 10.1 KB
/
specifications.tex.bak
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
In this section we describe how properties in the differential privacy literature can be expressed using $\dpCTLstar$ formulae.
\paragraph{Differential Privacy.}
Consider the survey mechanism (Section~\ref{subsec:survey}). For $v$
with $u N v$, we have $\myPr{u}{K}{N}{\X \mathit{out}_1}\le
3\myPr{v}{K}{N}{\X \mathit{out}_1}$ for the probabilities of satisfying
$\X \mathit{out}_1$ from $u$ and $v$.
The formula $\dpriv{\ln 3}{0} (\X \mathit{out}_1)$ holds in state $u$
and similarly for $\dpriv{\ln 3}{0} (\X \mathit{out}_0)$.
Recall that differential privacy requires similar output distributions
on neighbors. The formula $\dpriv{\ln 3}{0} (\X
\mathit{out}_1)\wedge \dpriv{\ln 3}{0} (\X \mathit{out}_0)$ thus
specifies differential privacy for states $+$ and $-$.
The survey mechanism is $(\ln 3, 0)$-differentially private.
\hide{Let us recall briefly the analysis of the survey mechanism. For any $x \in \mathcal{X}$, we have $1/4 \leq \Pr[M(x) = Y] \leq
3/4$. Hence for any neighboring $x, x' \in \mathcal{X}$, $\Pr[M(x) =
Y] \leq 3/4 = 3 \cdot 1/4 \leq 3 \Pr[M (x') =
Y]$. Similarly, $\Pr[M (x) = N] \leq 3 \Pr[M (x') = N]$ for any
neighboring $x, x' \in \mathcal{X}$.
}
For the $\frac{1}{2}$-geometric mechanism (Section~\ref{subsec:geometric}),
define the formula
$\psi = \dpriv{\ln 2}{0} (\X out_0) \wedge \dpriv{\ln 2}{0} (\X out_1) \wedge
\cdots \wedge \dpriv{\ln 2}{0} (\X out_5)$. If the state $s_k$ satisfies
$\psi$ for $k = 0, \ldots, 5$, then the $\frac{1}{2}$-geometric mechanism is
$(\ln 2, 0)$-differentially private. For the subsampling majority
mechanism (Section~\ref{subsection:subsampling}), consider the formula
$\psi = \dpriv{\epsilon, \delta} {(\F 0)} \wedge \dpriv{\epsilon,
\delta} {(\F 1)}$. If a state satisfies $\psi$, its probability of
outputting is $(\epsilon, \delta)$-close to those of its
neighbor for every outcomes. The subsampling majority
mechanism is $(\epsilon, \delta)$-differentially private.
\paragraph{Compositionality.}
Compositionality is one of the building blocks for differential privacy. For any $(\epsilon_1,\delta_1)$-differentially private mechanism $M_1$ and
$(\epsilon_2,\delta_2)$-differentially private mechanism $M_2$, their combination $(M_1(x), M_2(x))$ is $(\epsilon_1 + \epsilon_2,\delta_1+\delta_2)$-differentially private by the compositional theorem \cite[Theorem 3.16]{DR:14:AFDP}. The degradation is rooted in the repeated releases of information. To illustrate this property, we consider the extended survey mechanism which allows two consecutive queries.
In this mechanism, an input is either $+$ or $-$; but
outputs are $\mathit{out}_1\mathit{out}_1$,
$\mathit{out}_1\mathit{out}_0$, $\mathit{out}_0\mathit{out}_1$, or
$\mathit{out}_0\mathit{out}_0$.
The model is depicted in Figure~\ref{figure:2-dp-mc}.
%where states $Y_1,Y_2$ are labelled by $Y$, and $N_1,N_2$ are labelled by $N$.
\begin{figure}
\centering
\resizebox{.65\columnwidth}{!}{
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node
distance=1cm,node/.style={circle,draw}]
\node[node] (p) at ( -3, 0) { $+$ };
\node[node] (pY1) at (-1.5, .8) { $s_1$ };
\node at (-1.5, 1.3) { $\mathit{out}_1$ };
\node[node] (pN1) at (-1.5, -.8) { $t_1$ };
\node at (-1.5, -.3) { $\mathit{out}_0$ };
\node[node] (Y2) at ( 0, .8) { $s_2$ };
\node at (0, 1.3) { $\mathit{out}_1$ };
\node[node] (N2) at ( 0, -.8) { $t_2$ };
\node at (0, -.3) { $\mathit{out}_0$ };
\node[node] (n) at ( 3, 0) { $-$ };
\node[node] (nY1) at ( 1.5, .8) { $\bar{s}_1$ };
\node at (1.5, 1.3) { $\mathit{out}_1$ };
\node[node] (nN1) at ( 1.5, -.8) { $\bar{t}_1$ };
\node at (1.5, -.3) { $\mathit{out}_0$ };
\path
(p) edge node [above] { $\frac{3}{4}$ } (pY1)
(p) edge node [below] { $\frac{1}{4}$ } (pN1)
(pY1) edge node [above] { $\frac{3}{4}$ } (Y2)
(pY1) edge node [above] { $\frac{1}{4}$ } (N2)
(pN1) edge node [below] { $\frac{3}{4}$ } (Y2)
(pN1) edge node [below] { $\frac{1}{4}$ } (N2)
(n) edge node [above] { $\frac{1}{4}$ } (nY1)
(n) edge node [below] { $\frac{3}{4}$ } (nN1)
(nY1) edge node [above] { $\frac{1}{4}$ } (Y2)
(nY1) edge node [above] { $\frac{3}{4}$ } (N2)
(nN1) edge node [below] { $\frac{1}{4}$ } (Y2)
(nN1) edge node [below] { $\frac{3}{4}$ } (N2)
;
\end{tikzpicture}
}
% \caption{Markov Chain of}
% \label{figure:2-dp-mc2}
\caption{Markov Chain of Double Surveys}
\label{figure:2-dp-mc}
\end{figure}
Consider the formula $\dpriv{\ln 9}{0} (\X (\mathit{out}_1\wedge \X \mathit{out}_1))$. A path satisfies $\X (\mathit{out}_1\wedge \X \mathit{out}_1)$ if the second state satisfies $\mathit{out}_1$ and the third state satisfies $\mathit{out}_1$ as well. We verify that this formula is satisfied for states $+$ and $-$. Moreover, the bound $\epsilon=\ln 9$ is tight since the probability of satisfying $\X (\mathit{out}_1\wedge \X \mathit{out}_1)$ from states $+$ and $-$ are $\frac{9}{16}$ and $\frac{1}{16}$ respectively. Finally, the formula $\wedge_{a_1,a_2}\dpriv{\ln 9}{0} (\X (a_1\wedge \X a_2))$ specifies differential privacy for the model, where $a_1,a_2$ range over atomic propositions $\{\mathit{out}_1, \mathit{out}_0\}$.
Let us consider two slightly different formulae for comparison:
\begin{itemize}
\item $\dpriv{\ln 3}{0} (\X \X \mathit{out}_1)$. In this case we claim
there is no privacy loss, even though there are two queries. The reason is that the output of the first query is not observed at all. It is easy to verify that it is indeed satisfied by $+$ and $-$.
\item $\dpriv{\ln 3}{0} (\X (\mathit{out}_1\wedge \dpriv{\ln 3}{0}(\X \mathit{out}_1)))$. This is a nested $\dpCTL$ formula, where the inner state formula $\dpriv{\ln 3}{0}(\X \mathit{out}_1)$ specifies the one-step differential privacy. Observe the inner formula is satisfied by all states. The outer formula has no privacy loss.
\end{itemize}
\paragraph{Tighter Privacy Bounds for Composition.}
An advantage of applying model checking is that we may get tighter
bounds for composition. Consider the survey mechanism, and the
property $\dpriv{0}{.5} (\X \mathit{out}_1)$. Obviously, it holds in
states $+$ and $-$ since $\myPr{u}{K}{N}{\mathit{out}_1} =
\frac{3}{4}$, $\frac{1}{4}$ for $u = +, -$ respectively (Figure~\ref{figure:2-dp}). A careful check infers that one cannot decrease $\delta_1=.5$ without increasing $\epsilon$.
Now consider the formula $\dpriv{\epsilon_2}{\delta_2} (\X
(\mathit{out}_1\wedge \X \mathit{out}_1))$ in
Figure~\ref{figure:2-dp-mc}. Applying the
compositional theorem, one has $\epsilon_2=2\epsilon_1=0$ and
$\delta_2=2\delta_1=1$. However, we can check easily that one gets
better privacy parameter $(0,.5)$ using the model checking algorithm
because $\myPr{u}{K}{N}{\mathit{out}_1} = \frac{9}{16}$,
$\frac{1}{16}$ for $u = +, -$ respectively.
In general, compositional theorems for differential privacy only give
asymptotic upper bounds. Privacy parameters $\epsilon$ and $\delta$
must be calculated carefully and often pessimistically. Our
algorithm allows data analysts to choose better parameters.
\hide{
\paragraph{Reachability Privacy.}
The formula $\dpriv{\epsilon}{\delta}(F \top)$ is a reachability privacy. It specifies that neighbors should have similar probabilities of eventually reaching $\top$. Reachability privacy is often used in combination with other properties.
In the above threshold mechanism (Section~\ref{subsec:threshold}),
recall that outputs are of the form $\bot \cdots \bot \top$. In order
to establish differential privacy, one wishes to show that $\Phi_i$
holds for every $i \in \bbfZ^{\geq 0}$ where $\Phi_i =
\dpriv{3}{0}(\X^i \top)$. That is, the mechanism halts at the $i$-th
step with similar probabilities for every $i$.
It is not clear how to verify them automatically.
Now consider $\Phi=\wedge_{i=1}^{6} \dpriv{3}{0}(\X^i \top)
\wedge \PJ{\le .025}(\X^6 F \top) \wedge \dpriv{3}{.025}(\X^6 F
\top)$. The state with the threshold $t=3$, and the query result $r=4$
satisfies $\Phi$ by our computation. It means the
probability of reaching $\top$ at $i$ steps with $i$ ranges over
$\{1,\ldots,6\}$, is $(3, 0)$-differentially private for
neighbors ($r=3$ and $r=5$). It also states that the
probability of reaching $\top$ after no less than $6$ steps, is
smaller than $.025$ and $(3, .025)$-differentially private as well.
Rather than checking the probabilities of all possible outputs, $\Phi$
focuses on finitely many short outputs. It says the mechanism enjoys $(3,
0)$-differentially privacy for the short outputs, and long outputs are
unlikely. Moreover, all neighbors are unlikely to have long outputs.
Using this trick, one can bound the probability of long outputs
arbitrarily (say, less than $10^{-6}$). Our technique can establish
practical privacy guarantees automatically.
}
%Later we use reachability privacy to specify differential privacy property for MDPs.
\hide{
\paragraph{Infinite Differential Privacy.}
Thanks to the expressiveness of linear temporal logic, our logic $\dpCTLstar$ generalizes differential privacy to infinite
behaviors naturally. Classical differential privacy discusses properties about
randomized terminating mechanisms.
% Sophisticated mechanisms for streaming data have also been discussed
% in social networks and internet of things. Such mechanisms inherently
% have infinite computation. Their privacy guarantees are approximated
% by those of the prefixes of computation.
With $\dpCTLstar$, the subformula $\phi$ in $\dpriv{\epsilon}{\delta} \phi$
can be an arbitrary $\dpCTLstar$ path formula. One can specify
$(\epsilon, \delta)$-closedness over infinite behaviors naturally.
Consider the Bluetooth example in~\cite{DKNP04} and the
formula $\dpriv{\epsilon}{\delta}{(\F \G \mathit{timeout})}$. Let us say
two states are neighbors if a slave device in a state is replaced by
another device in the other state.
Then the formula holds if the probability of having
continuous time out (say, in the denial of service attack) is
$(\epsilon, \delta)$-close for all neighbors. That is, replacing a slave
device cannot increase the chance of service failure significantly.
In many applications, data analyses are carried out constantly.
Privacy analysis over infinite behaviors is crucial.
We will revisit this issue again in
Section~\ref{section:applications}.
}