#Description:
This module is a Python implementation of the optimal sub-Gaussian mean estimator
from
J. C. H. Lee and P. Valiant, "Optimal Sub-Gaussian Mean Estimation in R"
2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS),
Denver, CO, USA, 2022, pp. 672-683, doi: 10.1109/FOCS52979.2021.00071.
https://arxiv.org/abs/2011.08384
As the provided examples show, the estimator outperforms the ordinary sample mean
for symmetric heavy-tailed distributions, such as Student's t or the Laplace distribution.
In 1000 trials, the optimal estimator was better in roughly 662 trials
and the sample mean was better in just 338 trials.
For standard normal distributions, a few executions of the last example show that the estimator
is roughly equal to the sample mean (within the supplied confidence parameter delta and
apart from floating-point imprecision). In 1000 trial runs, the sample mean was
slightly better in 516 runs and the optimal estimator was better in 484.
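
The trial counts above come from repeatedly drawing a sample and checking which estimator does better. A minimal sketch of such a comparison is given below, using only the Python standard library. It assumes that "better" means closer to the true mean. The helper names laplace_sample and count_wins are hypothetical, and statistics.median is used purely as a stand-in robust estimator; it is not the Lee-Valiant estimator and not this module's mean function.

```python
import random
import statistics


def laplace_sample(rng, n, loc=0.0, scale=1.0):
    """Draw n Laplace(loc, scale) variates as a difference of two exponentials."""
    return [loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
            for _ in range(n)]


def count_wins(estimator_a, estimator_b, sampler, true_mean, trials, rng):
    """Return (wins_a, wins_b): how often each estimator is closer to true_mean."""
    wins_a = wins_b = 0
    for _ in range(trials):
        data = sampler(rng)
        err_a = abs(estimator_a(data) - true_mean)
        err_b = abs(estimator_b(data) - true_mean)
        if err_a < err_b:
            wins_a += 1
        elif err_b < err_a:
            wins_b += 1
        # exact ties count for neither estimator
    return wins_a, wins_b


rng = random.Random(0)  # fixed seed so the run is reproducible
# Stand-in comparison: sample median vs. sample mean on Laplace data.
wins_median, wins_mean = count_wins(
    statistics.median,
    statistics.fmean,
    lambda r: laplace_sample(r, 100),
    0.0,
    1000,
    rng,
)
print(wins_median, wins_mean)
```

To compare this module's estimator instead, one would pass it as estimator_a in place of statistics.median, wrapped so that it accepts the sampled data.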
The implementation consists of a function
mean
that computes the optimal mean estimator for numpy arrays.
It expects a numpy array a and a confidence parameter delta; its other
arguments match the behavior of the numpy.mean function, whose documentation is
given here:
https://numpy.org/doc/stable/reference/generated/numpy.mean.html
By default, the computed mean estimator is a numpy float64 value if the array is of
integer type. Otherwise, the result is of the same type as the array, which
also matches the behavior of the numpy.mean function.
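
Since the arguments and result types are described as following numpy.mean, the conventions in question can be illustrated with numpy.mean itself. The sketch below only calls numpy.mean; it does not call this module, and the variable names are chosen for illustration.

```python
import numpy as np

ints = np.array([[1, 2], [3, 4]])                        # integer dtype
floats32 = np.array([[1, 2], [3, 4]], dtype=np.float32)  # single precision

# Integer input: numpy.mean computes and returns float64 by default.
m_int = np.mean(ints)

# Floating-point input: the result keeps the input dtype (float32 here).
m_f32 = np.mean(floats32)

# axis selects the dimension to reduce; axis=0 averages down each column.
col_means = np.mean(ints, axis=0)

# dtype overrides the type used for the computation and for the result.
m_forced = np.mean(floats32, dtype=np.float64)

print(m_int.dtype, m_f32.dtype, col_means, m_forced.dtype)
```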
The estimator is computed to satisfy the inequality
P(|mu-mean|<epsilon)>=1-delta
where mu is the true mean. By default, delta=0.05.
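
A guarantee of this form can be checked empirically by counting how often an estimate lands within epsilon of the true mean. The sketch below is one way to do that with the standard library. The helper empirical_coverage is hypothetical, and the radius epsilon = sigma * sqrt(2 * ln(1/delta) / n) is an assumption: it is the leading-order sub-Gaussian rate from the cited paper with the lower-order term dropped, not a value taken from this module. The sample mean on Gaussian data is used as the estimator, since it is known to achieve this rate in that setting.

```python
import math
import random
import statistics


def empirical_coverage(estimator, sampler, mu, epsilon, trials, rng):
    """Fraction of trials in which |mu - estimate| < epsilon."""
    hits = 0
    for _ in range(trials):
        if abs(mu - estimator(sampler(rng))) < epsilon:
            hits += 1
    return hits / trials


delta = 0.05   # confidence parameter, matching the default above
n = 100        # sample size per trial (illustrative choice)
sigma = 1.0    # standard deviation of the sampled distribution

# Assumed radius: leading-order sub-Gaussian rate, lower-order terms ignored.
epsilon = sigma * math.sqrt(2.0 * math.log(1.0 / delta) / n)

rng = random.Random(1)  # fixed seed so the run is reproducible
coverage = empirical_coverage(
    statistics.fmean,
    lambda r: [r.gauss(0.0, sigma) for _ in range(n)],
    0.0,
    epsilon,
    2000,
    rng,
)
print(coverage, coverage >= 1.0 - delta)
```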
The module also has a function
mean_flattened
This function works in the same way as optimal_mean_estimator, but it flattens the arrays that it receives
and has no optional out parameter. Instead of an array, it returns a single int, float or complex value.