negative variances #1090

turkeytest · 2012-04-20T15:42:53Z

Hello,

It seems that var() can return small negative values because of floating-point rounding error. This is because nanops.py, line 120, does not take the absolute value of the result. A negative variance then causes std() to return NaN when it should return 0.

The code below should reproduce the problem (probabilistically, since the data is random). It could also be turned into a unit test.

Thanks!

from pandas import DataFrame
import numpy as np

# 10 identical rows, so each of the 10000 columns is constant and its variance should be 0
random_repeated_rows = np.array([np.random.random(10000)] * 10)
my_var = DataFrame(random_repeated_rows).var()

len(my_var[my_var < 0])  # counts the negative variances; a negative shows up slightly less than half of the time
np.min(my_var)  # returns a tiny negative, e.g. -9.8686491077791697e-16
np.min(DataFrame(random_repeated_rows).values.var(axis=0))  # NumPy's own var returns 0
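For illustration, here is a minimal sketch of how rounding can produce a negative variance and how that turns into NaN in std(). It assumes the variance is computed with the textbook one-pass formula; that is an assumption, since the actual nanops.py code is not quoted in this thread. The names one_pass_var, two_pass_var and safe_std are hypothetical helpers, not pandas functions.

```python
import numpy as np


def one_pass_var(x, ddof=1):
    """Hypothetical one-pass variance: (sum(x^2) - sum(x)^2 / n) / (n - ddof).

    The subtraction of two nearly equal numbers can round to a tiny
    negative value when x is constant.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    return (np.sum(x ** 2) - np.sum(x) ** 2 / n) / (n - ddof)


def two_pass_var(x, ddof=1):
    """Hypothetical two-pass variance: a sum of squared deviations, never negative."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.sum((x - x.mean()) ** 2) / (n - ddof)


def safe_std(var):
    """Clamp tiny negative variances to 0 before taking the square root."""
    return np.sqrt(np.maximum(var, 0.0))


constant_column = np.full(10, np.random.random())
print(one_pass_var(constant_column))  # may be 0 or a tiny value of either sign
print(two_pass_var(constant_column))  # always >= 0

tiny_negative = -9.8686491077791697e-16  # value reported above
with np.errstate(invalid="ignore"):
    print(np.sqrt(tiny_negative))  # NaN: square root of a negative float
print(safe_std(tiny_negative))
```

Clamping at zero is only one possible remedy. Taking the absolute value, as suggested above, would also avoid the NaN.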

wesm · 2012-04-22T03:58:57Z

Merged in master. Thanks @changhiskhan

changhiskhan mentioned this issue Apr 20, 2012

BUG: _nanvar may return small negative if given a constant array #1091

Closed

wesm closed this as completed Apr 22, 2012
