Release of optimal mean estimator

@bschulz81 bschulz81 released this 24 Jun 08:29
5a5b834

# Description

This module is a Python implementation of the optimal sub-Gaussian mean estimator
from

J. C. H. Lee and P. Valiant, "Optimal Sub-Gaussian Mean Estimation in R"
2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS),
Denver, CO, USA, 2022, pp. 672-683, doi: 10.1109/FOCS52979.2021.00071.
https://arxiv.org/abs/2011.08384

As the provided examples show, the estimator outperforms the ordinary sample mean
for symmetric heavy-tailed distributions, such as the Student's t or the Laplace distribution.
In 1000 trials, the optimal estimator is closer to the true mean in roughly 662 runs
and the sample mean is closer in just 338.
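
A comparison of this kind can be sketched with the standard library alone. The harness below is a hypothetical illustration, not the module's own example script: the names laplace_sample and count_wins are made up here, and the sample median is used only as a stand-in challenger, so the counts it produces are not the figures quoted above. To reproduce those, one would pass the module's mean function as the challenger instead.

```python
import random
import statistics


def laplace_sample(rng, n, loc=0.0, scale=1.0):
    """Draw n Laplace(loc, scale) values.

    The difference of two independent Exp(1) variables is Laplace(0, 1).
    """
    return [
        loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
        for _ in range(n)
    ]


def count_wins(challenger, trials=1000, n=100, true_mean=0.0, seed=0):
    """Count in how many trials `challenger` lands closer to the true mean
    than the ordinary sample mean does. Ties are credited to the sample mean.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        data = laplace_sample(rng, n, loc=true_mean)
        err_challenger = abs(challenger(data) - true_mean)
        err_baseline = abs(statistics.fmean(data) - true_mean)
        if err_challenger < err_baseline:
            wins += 1
    return wins, trials - wins


# Stand-in challenger (assumption): the sample median, NOT the optimal estimator.
challenger_wins, baseline_wins = count_wins(statistics.median)
print(f"challenger better: {challenger_wins}, sample mean better: {baseline_wins}")
```

Because the random generator is seeded, repeated runs give the same split, which makes it easier to compare different challengers on identical data.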

For standard normal distributions, a few executions of the last example show that the estimator
performs about as well as the sample mean (within the supplied confidence parameter delta and
apart from floating-point imprecision). In 1000 trial runs, the sample mean is
slightly better in 516 runs and the optimal estimator is better in 484.

The implementation consists of a function

mean

that computes the optimal mean estimator for numpy arrays.

It expects a numpy array a and a confidence parameter delta; its other
arguments match those of the numpy.mean function, whose documentation is
given here:
https://numpy.org/doc/stable/reference/generated/numpy.mean.html

By default, the computed mean estimator is a numpy float64 value if the array is of
integer type. Otherwise, the result has the same type as the array, which
matches the behavior of the numpy mean function.

The estimator is computed to satisfy the inequality

P(|mu-mean|<epsilon)>=1-delta

Here mu is the true mean of the distribution and mean is the computed estimate.
By default, delta=0.05.
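
To get a feeling for the size of epsilon, the cited paper states that the estimator is accurate to within sigma * (1 + o(1)) * sqrt(2 * log(1/delta) / n), where sigma is the standard deviation of the distribution and n the number of samples. The sketch below evaluates only the leading term of that bound and drops the 1 + o(1) factor, so it is an approximation for large n; the function name error_radius is hypothetical and not part of the module.

```python
import math


def error_radius(sigma, n, delta=0.05):
    """Leading term of the sub-Gaussian error bound from the cited paper:
    sigma * sqrt(2 * log(1/delta) / n). The (1 + o(1)) factor is omitted.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return sigma * math.sqrt(2.0 * math.log(1.0 / delta) / n)


# Unit variance, 1000 samples, default delta=0.05.
radius = error_radius(1.0, 1000)
print(f"approximate epsilon: {radius:.4f}")

# Quadrupling the sample size halves the radius.
radius_4n = error_radius(1.0, 4000)
```

A smaller delta (a stricter confidence requirement) widens the radius, but only through the square root of a logarithm, so the growth is slow.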

The module also has a function

mean_flattened

This function works in the same way as the mean function above, but it flattens the arrays that it receives
and has no optional out parameter. Instead of a matrix, it returns a single int, float or complex value.