---
layout: post
title: Covariance-Aware Private Mean Estimation, Efficiently
comments: true
authors:
- gautamkamath
timestamp: 09:00:00 -0400
---

Last week, the Mark Fulk award for best student paper at [COLT 2023](https://learningtheory.org/colt2023/) was awarded to the following two papers on private mean estimation:
- [A Fast Algorithm for Adaptive Private Mean Estimation](https://arxiv.org/abs/2301.07078), by [John Duchi](https://web.stanford.edu/~jduchi/), [Saminul Haque](https://dblp.org/pid/252/5821.html), and [Rohith Kuditipudi](https://web.stanford.edu/~rohithk/) **[[DHK23](https://arxiv.org/abs/2301.07078)]**;
- [Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance Estimation for Subgaussian Distributions](https://arxiv.org/abs/2301.12250) by [Gavin Brown](https://cs-people.bu.edu/grbrown/), [Samuel B. Hopkins](https://www.samuelbhopkins.com/), and [Adam Smith](https://cs-people.bu.edu/ads22/) **[[BHS23](https://arxiv.org/abs/2301.12250)]**.

The main result of both papers is the same: the first computationally-efficient $$O(d)$$-sample algorithm for differentially-private Gaussian mean estimation in Mahalanobis distance.
In this post, we're going to unpack the result and explain what this means.
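
As a quick refresher (this is just the standard setup, nothing specific to these papers): the Mahalanobis distance measures error relative to the shape of the distribution,

$$ \lVert \hat{\mu} - \mu \rVert_{\Sigma} = \sqrt{(\hat{\mu} - \mu)^\top \Sigma^{-1} (\hat{\mu} - \mu)}, $$

and without privacy, the empirical mean of $$n$$ draws from $$\mathcal{N}(\mu, \Sigma)$$ has Mahalanobis error roughly $$\sqrt{d/n}$$, so $$O(d/\alpha^2)$$ samples suffice for error $$\alpha$$.
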
Note that these guarantees hold regardless of the true covariance matrix $$\Sigma$$.
It isn't quite so easy when we want to do things privately.
The most natural approach would be to add noise to the empirical mean.
However, we first have to "clip" the datapoints (i.e., rescale any points that are "too large") in order to limit the sensitivity of this statistic.
This is where the challenges arise: we would ideally like to clip the data based on the shape of the (unknown) covariance matrix $$\Sigma$$ **[[KLSU19](https://arxiv.org/abs/1805.00216)]**.
Clipping to a shape that deviates significantly from $$\Sigma$$ would either introduce bias by clipping too many points, or force us to add excessive amounts of noise.
Unfortunately, the covariance matrix $$\Sigma$$ is unknown, and privately estimating it (in an appropriate sense) requires $$\Omega(d^{3/2})$$ samples **[[KMS22](https://arxiv.org/abs/2205.08532)]**.
This is substantially larger than the $$O(d)$$ sample complexity of non-private Gaussian mean estimation.
Furthermore, this covariance estimation step really is the bottleneck.
Given a coarse estimate of $$\Sigma$$, only $$O(d)$$ additional samples are required to estimate the mean privately in Mahalanobis distance.
This leads to the intriguing question: is it possible to privately estimate the mean of a Gaussian *without* explicitly estimating the covariance matrix?
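
Before getting to the answer, here is a toy sketch of the naive clip-and-noise approach described above, to make the difficulty concrete. This is not from either paper: the numpy code, the data-generating parameters, and the clipping radii below are all made up for illustration, and the noise is calibrated with the standard Gaussian mechanism under a swap notion of neighboring datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a 2-d Gaussian whose two directions have very different scales.
n = 10_000
true_mean = np.array([5.0, 0.05])
Sigma = np.diag([100.0, 0.01])
X = rng.multivariate_normal(mean=true_mean, cov=Sigma, size=n)

def clip_and_noise_mean(X, radius, eps, delta):
    """Naive estimator: clip each point to a Euclidean ball of the given
    radius, average, then add spherical Gaussian noise."""
    norms = np.linalg.norm(X, axis=1)
    scale = np.minimum(1.0, radius / norms)        # shrink points outside the ball
    clipped_mean = (X * scale[:, None]).mean(axis=0)
    # Swapping one point moves the clipped mean by at most 2*radius/n in l2,
    # so the standard Gaussian mechanism calibration applies.
    sigma = (2 * radius / len(X)) * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return clipped_mean + rng.normal(scale=sigma, size=X.shape[1])

# No single spherical radius fits both directions: a small radius badly biases
# the high-variance coordinate, while a large radius adds noise comparable to
# the low-variance coordinate's entire standard deviation.
for radius in [0.5, 30.0]:
    print(radius, clip_and_noise_mean(X, radius, eps=1.0, delta=1e-6))
```

Clipping to an ellipsoid shaped like $$\Sigma$$ would fix this, but that is exactly the covariance information we were trying to avoid estimating.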

The answer to this question is yes!
Brown, Gaboardi, Smith, Ullman, and Zakynthinou **[[BGSUZ21](https://arxiv.org/abs/2106.13329)]** give two different algorithms for private Gaussian mean estimation in Mahalanobis distance, which both require only $$O(d)$$ samples.
Interestingly, the two algorithms are quite different from each other.
One simply adds noise to the empirical mean based on the empirical covariance matrix.
The other one turns to a technique from robust statistics, sampling a point with large *Tukey depth* using the exponential mechanism.
As described here, neither of these methods is differentially private yet -- they additionally require a pre-processing step that checks whether the dataset is sufficiently well-behaved, a condition which holds with high probability when the data is generated according to a Gaussian distribution.
The major drawback of both algorithms: they require exponential time to compute.
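
For readers unfamiliar with the term, the *Tukey depth* used in the second algorithm is a standard notion from robust statistics (nothing here is specific to [BGSUZ21]): the empirical depth of a candidate point $$y$$ with respect to a dataset $$x_1, \ldots, x_n$$ is the smallest fraction of datapoints lying in any closed halfspace whose boundary passes through $$y$$,

$$ \mathrm{depth}(y; x_1, \ldots, x_n) = \min_{v \neq 0} \frac{1}{n} \left| \{ i : \langle v, x_i - y \rangle \ge 0 \} \right|. $$

Points of high depth are "central" to the dataset in an affine-invariant sense, and the exponential mechanism (roughly speaking) samples a candidate mean with probability weighted towards higher depth; the exact form won't matter for this post.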

The two awarded papers **[[DHK23](https://arxiv.org/abs/2301.07078)]** and **[[BHS23](https://arxiv.org/abs/2301.12250)]** resolve this issue, giving the first *computationally efficient* $$O(d)$$ sample algorithms for private mean estimation in Mahalanobis distance.
Interestingly, the algorithms in both papers follow the same recipe as the first algorithm mentioned above: add noise to the empirical mean based on the empirical covariance matrix.
The catch is that the empirical mean and covariance are replaced with *stable* estimates of the empirical mean and covariance, where stability bounds how much the estimators can change due to modification of individual datapoints.
Importantly, these stable estimators are efficient to compute.
Further details of these subroutines are beyond the scope of this post, but the final algorithm simply adds noise to the stably-estimated mean based on the stably-estimated covariance.
Different extensions of these results are explored in the two papers, including estimation of covariance, and mean estimation in settings where the distribution may be heavy-tailed or rank-deficient.
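
Schematically, and with the heavy lifting hidden inside placeholders, the recipe looks something like the sketch below. Here `stable_mean`, `stable_covariance`, and the noise multiplier `sigma` are hypothetical stand-ins for the papers' actual subroutines and calibration; this only shows the shape of the final step, not a working private estimator.

```python
import numpy as np

def private_mean_sketch(X, stable_mean, stable_covariance, sigma, rng):
    """Schematic recipe: add Gaussian noise, shaped by a stable covariance
    estimate, to a stable mean estimate. In the real algorithms, privacy
    comes from the stability analysis of the two subroutines and the
    corresponding calibration of sigma, both omitted here."""
    mu_hat = stable_mean(X)             # stable surrogate for the empirical mean
    Sigma_hat = stable_covariance(X)    # stable surrogate for the empirical covariance
    # Noise ~ N(0, sigma^2 * Sigma_hat): its error is spherical in the
    # Mahalanobis geometry defined by Sigma_hat, matching the accuracy goal.
    L = np.linalg.cholesky(Sigma_hat)
    return mu_hat + sigma * (L @ rng.standard_normal(X.shape[1]))

# Illustration only: plugging in the plain empirical mean and covariance, as
# below, is NOT differentially private -- the stable subroutines are the point.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
print(private_mean_sketch(X, lambda Z: Z.mean(axis=0),
                          lambda Z: np.cov(Z, rowvar=False),
                          sigma=0.05, rng=rng))
```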

Most of the algorithms described above are based on some notion of *robustness*, thus suggesting connections to the mature literature on robust statistics.
These connections have been explored as far back as 2009, in seminal work by Dwork and Lei **[[DL09](https://dl.acm.org/doi/10.1145/1536414.1536466)]**.
Over the last couple of years, there has been a flurry of renewed interest in links between robustness and privacy, including **[[BKSW19](https://arxiv.org/abs/1905.13229), [KSU20](https://arxiv.org/abs/2002.09464), [KMV22](https://arxiv.org/abs/2112.03548), [LKO22](https://arxiv.org/abs/2111.06578), [HKM22](https://arxiv.org/abs/2111.12981), [GH22](https://arxiv.org/abs/2211.00724), [HKMN23](https://arxiv.org/abs/2212.05015), [AKTVZ23](https://arxiv.org/abs/2212.08018), [AUZ23](https://arxiv.org/abs/2302.01855)]**, beyond those mentioned above.
For example, some works **[[GH22](https://arxiv.org/abs/2211.00724), [HKMN23](https://arxiv.org/abs/2212.05015), [AUZ23](https://arxiv.org/abs/2302.01855)]** show that, under certain conditions, a robust estimator implies a private one, and vice versa.
The two awarded papers expand this literature in a somewhat different direction -- the type of stability property they consider leads to algorithms which qualitatively differ from those considered previously.
It will be interesting to see how private and robust estimation evolve together over the next several years.
