% privacy and differential privacy
In the era of data analysis, personal information is constantly
collected and analyzed by various parties. Privacy has become an
important issue for every individual. To address such
concerns, the research community has proposed several
privacy-preserving mechanisms over the years (see~\cite{FWC:10:PPDP} for a
slightly outdated survey). Among these mechanisms, differential
privacy has attracted much attention, from theoretical computer science
to industry~\cite{DR:14:AFDP,JLE:14:DPML,A:16:EPYU}.
Differential privacy formalizes the tradeoff between privacy and
utility in data analysis. Intuitively, a randomized data analysis
mechanism is differentially private if it behaves similarly on similar
input datasets~\cite{DMNS:06:CNSPD,D:06:DP}. Consider, for example,
the Laplace mechanism, where analysis results are perturbed by random
noise drawn from the Laplace distribution~\cite{DR:14:AFDP}. The random noise
hides the differences between analysis results computed from similar datasets.
Clearly, more noise
gives more privacy but less utility in the released perturbed
results. Under the framework of differential privacy, data analysts
can balance the tradeoff rigorously in their data analysis
mechanisms~\cite{DR:14:AFDP,JLE:14:DPML}.
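For orientation, we recall the standard definition from the
literature~\cite{DMNS:06:CNSPD,DR:14:AFDP}: a randomized mechanism $M$ is
$(\epsilon, \delta)$-differentially private if for every pair of datasets
$D, D'$ differing in at most one record and every set $S$ of outputs,
\[
  \Pr[M(D) \in S] \leq e^{\epsilon} \Pr[M(D') \in S] + \delta.
\]
The Laplace mechanism, for instance, achieves $(\epsilon, 0)$-differential
privacy by adding noise with density proportional to
$e^{-\epsilon |x| / \Delta}$, where $\Delta$ bounds how much the analysis
result can change between similar datasets.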
% model checking differentially private implementation
Designing differentially private mechanisms can be tedious for
sophisticated data analyses. Privacy leaks have also been observed in
data analysis programs implementing differential
privacy~\cite{M:12:SLSBDP,TKBW:17:PLAI}. This calls for formal
analysis of differential privacy on both designs and implementations.
In this paper, we propose the logic $\dpCTLstar$ for specifying
differential privacy and investigate its model checking
problems. Data analysts can automatically verify their designs and
implementations with our techniques. Most interestingly, our
techniques can be adopted easily by existing probabilistic
model checkers. Privacy checking with existing tools is attainable
with minimal effort. We hope that more interaction
between model checking~\cite{CGPMC,BK:08:PMC} and privacy analysis will follow.
% Markov chains and MDP as formal models, actions made by adversary
In order to illustrate the applicability of our techniques, we give
detailed formalizations of several data analysis mechanisms in this
paper.
In differential privacy, data analysis mechanisms are
simply randomized algorithms. We follow the standard practice in
probabilistic model checking and formalize such mechanisms
as Markov chains or Markov decision processes~\cite{Put05}.
%We give formalizations of several differentially private mechanisms
%and explain subtle privacy conditions.
When a data analysis mechanism does not
interact with its environment, it is formalized as a Markov
chain. Otherwise, its interactions are formalized by non-deterministic
actions in
Markov decision processes. Our formalization effectively assumes that
actions are controlled by adversaries. It thus
accounts for all adversarial privacy attacks, as is required to establish
differential privacy.
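As a minimal, hypothetical sketch of this modeling style, consider the
classical randomized response mechanism over a single sensitive bit
$b \in \{0, 1\}$: the state $s_b$ encoding the true bit moves to the output
state $\mathit{out}_b$ with probability $e^{\epsilon} / (1 + e^{\epsilon})$
and to $\mathit{out}_{1-b}$ with probability $1 / (1 + e^{\epsilon})$.
Declaring $s_0$ and $s_1$ similar yields a small Markov chain; the mechanism
is $\epsilon$-differentially private because, for each output state, the
probabilities of reaching it from $s_0$ and from $s_1$ differ by a factor of
at most $e^{\epsilon}$:
\[
  \frac{\Pr[s_0 \models \F\, \mathit{out}_0]}{\Pr[s_1 \models \F\, \mathit{out}_0]}
  = \frac{e^{\epsilon} / (1 + e^{\epsilon})}{1 / (1 + e^{\epsilon})}
  = e^{\epsilon}.
\]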
%\todo{What is a state?}
% new differential privacy state modal operator
Two ingredients are introduced to specify differentially private
behaviors. A user-defined, reflexive, and symmetric binary relation
over states is required to formalize similar datasets. We moreover add
the path quantifier $\dpriv{\epsilon}{\delta}$ for specifying similar
behaviors. Informally, a state satisfies $\dpriv{\epsilon}{\delta}
\phi$ if its probability of having a path satisfying $\phi$ is close to
the corresponding probabilities of similar states. Consider, for instance, a data analysis
mechanism computing the likelihood ($\mathit{high}$ or $\mathit{low}$)
of an epidemic. A state satisfying
$\dpriv{\epsilon}{\delta} (\F \mathit{high}) \wedge
\dpriv{\epsilon}{\delta} (\F \mathit{low})$ indicates that similar states
have similar probabilities for both outcomes.
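One natural reading of the quantifier, written here as a sketch with the
notation $\Pr_s(\phi)$ for the probability that a path from $s$ satisfies
$\phi$ (the precise semantics is given in the body of the paper), requires,
for every state $t$ similar to $s$,
\[
  \Pr_s(\phi) \leq e^{\epsilon}\, \Pr_t(\phi) + \delta
  \quad\text{and}\quad
  \Pr_t(\phi) \leq e^{\epsilon}\, \Pr_s(\phi) + \delta,
\]
mirroring the differential privacy condition above.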
%\todo{not clear here}\lz{here it should be $\X$?}
\hide{
% differential privacy of infinite behaviors
An important byproduct of this work is to analyze privacy over
infinite behaviors. Consider the formula $\dpriv{\epsilon}{\delta}
(\G \F \mathit{high})$. A state satisfies the formula if it
is as likely to answer $\mathit{high}$ infinitely often
as similar states. Compared to the unbounded
but finite behavior specified by $\F \mathit{high}$,
$\G \F \mathit{high}$ specifies an infinite behavior.
Not only can differential privacy specify randomized algorithms, it
also generalizes to reactive systems naturally.
More data analysis mechanisms are admissible for sophisticated formal
privacy analysis through our techniques.
}
% construction and complexity
We moreover extend the standard probabilistic model checking
algorithms to verify $\dpCTLstar$ properties automatically. For Markov
chains, states
satisfying a subformula $\dpriv{\epsilon}{\delta} \phi$ are computed
by a simple variant of the model checking algorithm for Markov chains.
The time complexity of our algorithm is the same as
that of $\PCTLstar$ model checking for Markov chains. The logic $\dpCTLstar$ thus obtains its
expressiveness essentially for free.
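A sketch of this variant, assuming the reading of $\dpriv{\epsilon}{\delta}$
sketched above: first compute $p_s = \Pr_s(\phi)$ for every state $s$ with
the standard $\PCTLstar$ algorithm, then label a state $s$ with
$\dpriv{\epsilon}{\delta} \phi$ exactly when $p_s \leq e^{\epsilon} p_t +
\delta$ and $p_t \leq e^{\epsilon} p_s + \delta$ for every state $t$ similar
to $s$. The additional pairwise comparison over similar states adds at most
quadratically many checks in the number of states.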
For Markov decision processes, checking whether a state
satisfies $\dpriv{\epsilon}{\delta} \phi$ is undecidable.
% analyze mechanism
\noindent
\emph{Related Work.}
An early attempt at formal verification of differential privacy
is~\cite{TKD:11:FVDPIS}. The work formalizes differential privacy in
the framework of information leakage. The connection between
differential privacy and information leakage is investigated
in~\cite{AACDP:15:OILDPM,GMP:16:PDPFPS}. Type systems for differential
privacy have been developed
in~\cite{WHRP:17:FADP,GHHNP:13:LDPDP,RP:10:DMTGS}. A lightweight
technique for checking differential privacy can be found
in~\cite{ZK:17:LTADPP}. Numerous formal Coq proofs about differential
privacy are reported
in~\cite{BFGGHS:16:APCDP,BGGHS:16:PDPPC,BKOB:12:PRRDP,BDGKZ:13:VCDPASM,BGAHKS:14:PDPHL,BGAHRS:15:HOARRT,BFGAGHS:16:DPBP}. This
work, in contrast, emphasizes model checking differential privacy. We develop a
framework to formalize and analyze differential privacy in Markov
chains and Markov decision processes.
\noindent
\emph{Contributions.} Our main contributions are threefold.
\begin{enumerate}
\item We introduce the logic $\dpCTLstar$ for reasoning about
differential privacy. The logic is able to express subtle and generalized
differentially private properties;
%Complex properties such as those involving infinite behaviors can
%also be expressed;
\item We model several differentially private mechanisms in Markov chains or Markov decision processes; and
\item We show that the model checking
problem for Markov chains can be solved by standard techniques, whereas for Markov decision processes it is undecidable.
\end{enumerate}