
Releases: bschulz81/optimal_mean_estimator

2.0.0

07 Jul 18:55
aa80d60

The functions now detect whether a distribution is skewed and, if so, return the ordinary mean by default.
Additionally, they have parameters through which users can supply their own correction functions if they know the shape of the distribution.

v1.0.1

27 Jun 17:04
19c3154

The algorithm turned out to be very sensitive to even partitioning.

This is a bugfix release that makes the confidence interval smaller where necessary to ensure even partitioning in all cases.

As a result, the algorithm works better for skewed distributions. Unfortunately, it is still sometimes wrong for the exponential distribution.

Release of optimal mean estimator

24 Jun 08:29
5a5b834

#Description:

This module is a Python implementation of the optimal sub-Gaussian mean estimator
from

J. C. H. Lee and P. Valiant, "Optimal Sub-Gaussian Mean Estimation in R"
2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS),
Denver, CO, USA, 2022, pp. 672-683, doi: 10.1109/FOCS52979.2021.00071.
https://arxiv.org/abs/2011.08384

As the provided examples show, the estimator is superior to the ordinary sample mean
for symmetric heavy-tailed distributions, like the Student's t or the Laplace distributions.
In 1000 trials, the optimal estimator is better in roughly 662 trials
and the sample mean is better in just 338 trials.
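A comparison of this kind can be sketched as a small win-counting harness. The sketch below is only an illustration under stated assumptions: the names count_wins and laplace_sampler are hypothetical, and numpy.median is used as a stand-in estimator so that the block runs without this module. It is not the Lee-Valiant estimator, and its counts are not the figures quoted above. To reproduce the release's numbers, the module's own estimator would be passed in instead.

```python
import numpy as np


def count_wins(estimator, sampler, true_mean, n=200, trials=1000, seed=0):
    """Count how often `estimator` is closer to `true_mean` than np.mean.

    Hypothetical helper, not part of the module described here.
    """
    rng = np.random.default_rng(seed)
    est_wins = 0
    for _ in range(trials):
        x = sampler(rng, n)
        err_est = abs(estimator(x) - true_mean)
        err_mean = abs(np.mean(x) - true_mean)
        if err_est < err_mean:
            est_wins += 1
    # Ties (practically impossible for continuous data) count for the mean.
    return est_wins, trials - est_wins


def laplace_sampler(rng, n):
    # Symmetric heavy-tailed distribution with true mean 0.
    return rng.laplace(loc=0.0, scale=1.0, size=n)


# Stand-in estimator: np.median (an assumption made for illustration only).
est_wins, mean_wins = count_wins(np.median, laplace_sampler, true_mean=0.0)
print(f"stand-in estimator better: {est_wins}, sample mean better: {mean_wins}")
```

Any callable that maps a one-dimensional array to a scalar can be passed as the estimator argument, so the same harness works for other distributions by swapping the sampler.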

For the standard normal distribution, a few runs of the last example show that the estimator
performs about as well as the sample mean (within the tolerance implied by the confidence
parameter delta and apart from floating-point imprecision). In 1000 trials, the sample mean is
slightly better in 516 trials and the optimal estimator is better in 484.

The implementation consists of a function

mean

that computes the optimal mean estimator for numpy arrays.

It expects a numpy array a and a confidence parameter delta. Its other
arguments match the behavior of the numpy.mean function, whose documentation is
given here:
https://numpy.org/doc/stable/reference/generated/numpy.mean.html

The computed mean estimate is by default a numpy float64 value if the array is of
integer type. Otherwise, the result is of the same type as the array, which
also matches the behavior of the numpy.mean function.
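Since the argument and type conventions are described as following numpy.mean, the short sketch below shows those conventions using numpy.mean itself. It does not call this module, and the array values are made up for illustration.

```python
import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

# Integer input: numpy.mean accumulates in and returns float64 by default.
col_means = np.mean(ints, axis=0)

# keepdims=True keeps the reduced axis with length 1.
row_means = np.mean(ints, axis=1, keepdims=True)

# Floating-point input: the result keeps the input's floating-point type.
f32_mean = np.mean(ints.astype(np.float32), axis=0)

# The optional out parameter writes the result into an existing array.
out = np.empty(3, dtype=np.float64)
np.mean(ints, axis=0, out=out)

print(col_means.dtype, row_means.shape, f32_mean.dtype, out)
```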

The estimator is computed to satisfy the inequality

P(|mu-mean|<epsilon)>=1-delta

By default, delta=0.05.
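The guarantee can be checked empirically for any estimator by counting how often the estimate lands within epsilon of the true mean. The sketch below makes several assumptions that are not stated in this text: it takes epsilon = sigma * sqrt(2 * log(1/delta) / n), which is the leading term of the error bound in the cited paper with the (1 + o(1)) factor dropped, and it uses numpy's ordinary mean on Gaussian data as a stand-in rather than this module's estimator. The function name empirical_coverage is hypothetical.

```python
import math

import numpy as np


def empirical_coverage(delta=0.05, n=100, sigma=1.0, mu=0.0, trials=2000, seed=0):
    """Fraction of trials with |estimate - mu| < eps (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumed error radius: leading term of the bound in the cited paper.
    eps = sigma * math.sqrt(2.0 * math.log(1.0 / delta) / n)
    samples = rng.normal(mu, sigma, size=(trials, n))
    # Stand-in estimator: the ordinary sample mean of each trial.
    estimates = samples.mean(axis=1)
    coverage = float(np.mean(np.abs(estimates - mu) < eps))
    return coverage, eps


coverage, eps = empirical_coverage()
print(f"eps = {eps:.4f}, empirical coverage = {coverage:.3f}, target >= {1 - 0.05}")
```

For Gaussian data the sample mean already meets this bound, so the interesting comparisons are on heavy-tailed samplers, where the same check can be repeated with a different estimator.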

The module also has a function

mean_flattened

This function works in the same way as optimal_mean_estimator, but it flattens the arrays that it receives
and has no optional out parameter. Instead of a matrix, it returns a single int, float or complex value.
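The difference between reducing along an axis and reducing a flattened array can be illustrated with numpy alone. This is a sketch of the general idea using numpy.mean, not a call to mean_flattened, and the sample values are made up.

```python
import numpy as np

a = np.array([[1 + 2j, 3 - 1j],
              [0 + 0j, 2 + 2j]])

# Reducing along an axis yields an array with one entry per column.
per_column = np.mean(a, axis=0)

# Flattening first yields one value for the whole array.
flat_mean = np.mean(a.ravel())

# .item() converts the numpy scalar to a plain Python complex.
scalar = flat_mean.item()

print(per_column.shape, scalar)
```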