bottleneck.move_std() produces nans, doesn't match pandas.rolling_std() #50

erg · 2012-09-04T18:37:35Z

https://gist.github.com/3624548

kwgoodman · 2012-09-04T19:42:41Z

I don't know what you mean by produces NaNs. Could you give a short example showing the input (x) and what you would like bottleneck to return?

erg · 2012-09-04T19:47:33Z

Test.pyx has the x vector in the gist link with exact instructions to reproduce the results. The results should look like the factor version or the pandas version without the nans. Closed on accident...

kwgoodman · 2012-09-04T20:35:26Z

Too much work to go through your example. Perhaps you can make a brief example as below.

Are these the NaNs you don't like? (I can't change that behavior)

In [8]: a = [1,2,10,12]
In [9]: bn.move_std(a, 4)
Out[9]: array([        nan,         nan,         nan,  4.81534007])

Do you want to normalize by N:

In [10]: bn.move_std(a, 4)[-1]
Out[10]: 4.815340071064556
In [11]: np.std(a)
Out[11]: 4.815340071064556

Or N-1?

In [14]: bn.move_std(a, 4, ddof=1)[-1]
Out[14]: 5.5602757725374259
In [15]: np.std(a, ddof=1)
Out[15]: 5.5602757725374259

erg · 2012-09-04T21:09:03Z

To reproduce it, just copy/paste this entire thing into the shell:

git clone git://gist.github.com/3624548.git && cd 3624548 && ipython2
import numpy as np
import bottleneck as bn

x = np.load('testx.npy')
bn.move_std(x,3)

erg · 2012-09-04T21:12:19Z

The preceding nans are not what I'm talking about--there are nans in the middle of the data. However, pandas has a kwarg num_periods=1 to start calculating even from the start. It would be awesome if bottleneck supported this feature, too.

import pandas


In [8]: pandas.rolling_mean(np.array([1,2,3,4,5]),3)
Out[8]: array([ nan,  nan,   2.,   3.,   4.])

In [9]: pandas.rolling_mean(np.array([1,2,3,4,5]),3, min_periods=1)
Out[9]: array([ 1. ,  1.5,  2. ,  3. ,  4. ])

erg · 2012-09-04T21:13:51Z

And finally, I'm aware of the sample standard deviation vs population standard deviation. I suppose that accounts for the difference in the non-nan values. Maybe the docs should say you're using the sample std?

Edit: I guess you're calculating the biased std, N=0. There's no description of ddof in the parameters list.

kwgoodman · 2012-09-04T22:29:58Z

Yes, that's a bug (good find!). The docstring should include the description of ddof, which we can copy from np.std:

ddof  int, optional
    Means Delta Degrees of Freedom.  The divisor used in calculations
    is ``N - ddof``, where ``N`` represents the number of elements.
    By default `ddof` is zero.

kwgoodman · 2012-09-04T22:35:38Z

I added ddof to the docstrings of move_std() and move_nanstd():

1abdfa7

kwgoodman · 2012-09-06T15:38:15Z

You might want to try bn.move_nanstd() instead of bn.move_std().

erg · 2012-09-06T15:42:17Z

Same bug.

In [5]: bn.move_nanstd(x,3)
Out[5]: 
array([             nan,              nan,   1.55715950e-04,
         5.61672553e-04,   6.29558646e-04,   4.04292042e-04,
         4.04292042e-04,              nan,              nan,
         1.34956920e-04,   1.34956920e-04,   2.69682220e-04,
         2.69682220e-04,              nan,              nan,
         3.59747798e-04,   3.59747798e-04,   4.51234346e-05,
         6.10358168e-04,   6.31667620e-04,              nan,
         2.26028251e-04,   2.26028251e-04,   0.00000000e+00,
         2.71807335e-04,   5.10356704e-04,   2.96900396e-04,
         4.52446992e-05,   4.52229970e-05,   4.52229970e-05,
         3.61714576e-04,   3.61714576e-04,   9.06372853e-05,
         9.06372853e-05,              nan,   4.54059450e-04,
         4.54059450e-04,   2.72330746e-04,   2.72330746e-04,
                    nan,   1.81518876e-04,   1.81518876e-04,
                    nan,   1.81641276e-04,   1.81641276e-04,
         4.09126392e-04,   4.09126392e-04,   3.36062087e-11,
         2.27731653e-04,   2.27731653e-04,   4.55947887e-04,
         4.05285108e-04,   2.77122055e-04,   1.82255759e-04,
...

kwgoodman · 2012-09-06T15:47:50Z

If there are three nans in a row in x then what should the std be when the window is 3?

erg · 2012-09-06T15:51:20Z

There are no nans in my x at all. But you need the full precision, hence loading it from the file as described above.

In [7]: print x
[  0.00000000e+00   1.90748689e-04  -1.90675932e-04   1.14481963e-03
   1.14481963e-03   2.87186699e-04   2.87186699e-04   2.87186699e-04
   2.87186699e-04   8.99839353e-07   8.99839353e-07   5.72982219e-04
   5.72982219e-04   5.72982219e-04   5.72982219e-04  -1.90158103e-04
  -1.90158103e-04  -9.44368435e-05  -1.43440622e-03  -1.43440622e-03
  -1.43440622e-03  -9.54927888e-04  -9.54927888e-04  -9.54927888e-04
  -3.78337459e-04   2.93963731e-04   1.97985231e-04   1.97985231e-04
   1.02052767e-04   1.02052767e-04  -6.65259721e-04  -6.65259721e-04
  -4.72989004e-04  -4.72989004e-04  -4.72989004e-04   4.90216544e-04
   4.90216544e-04   1.06791730e-03   1.06791730e-03   1.06791730e-03
   1.45297698e-03   1.45297698e-03   1.45297698e-03   1.83829631e-03
   1.83829631e-03   2.70618445e-03   2.70618445e-03   2.70618445e-03
   2.22309266e-03   2.22309266e-03   3.19030419e-03   2.90042111e-03
   2.51379826e-03   2.51379826e-03   3.86710904e-03   4.25324670e-03
   4.73573247e-03   4.73573247e-03   4.73573247e-03   4.73573247e-03
   4.73573247e-03   4.35022746e-03   4.35022746e-03   4.35022746e-03
   4.35022746e-03   3.96479674e-03   3.96479674e-03   3.96479674e-03
   3.96479674e-03   3.96479674e-03   3.96479674e-03   3.96479674e-03

erg · 2012-09-06T15:52:23Z

If there were nans in the x input, I suppose I don't care what move_std returns at all. :)

erg · 2012-09-06T16:02:48Z

It might be that you're taking the sqrt of a negative number.

From move_std.pyx:


            y[i0] = sqrt((a2sum - asum * asum / count) / (count - ddof))

erg · 2012-09-06T16:17:45Z

Yes, it's taking the sqrt of negative numbers!

In [2]: import numpy as np
import bottleneck as bn

x = np.load('testx.npy')
bn.move_std(x,3)
0.00000024742859987833254792788557907545765601753374, 0.00086156009635718323012854025222395648597739636898, 3, 0
-0.00000000000000000000012352563814125214061742603683
0.00000024742859987833254792788557907545765601753374, 0.00086156009635718323012854025222395648597739636898, 3, 0
-0.00000000000000000000012352563814125214061742603683
0.00000098492587061058721337463046918703213350454462, 0.00171894665764582782810365735315372148761525750160, 3, 0
-0.00000000000000000000014117215787571671534142474463
0.00000098492587061058721337463046918703213350454462, 0.00171894665764582782810365735315372148761525750160, 3, 0
-0.00000000000000000000014117215787571671534142474463
0.00000617256357447203218380502726114755773778597359, -0.00430321864694510961002471560732374200597405433655, 3, 0
-0.00000000000000000000028234431575143343068284948926
0.00000067115579297346920246844980892375609471400821, -0.00141896701121640295140124976569495629519224166870, 3, 0
-0.00000000000000000000042351647362715016953416125034
0.00000342134204903411354143196246302416341222851770, 0.00320375188600839633235040082581690512597560882568, 3, 0
-0.00000000000000000000183523805238431727592863466376
0.00000633342630963094281159458942553897031757514924, 0.00435893093876157239341395666087919380515813827515, 3, 0
-0.00000000000000000000141172157875716729447356954499
Out[2]: 
array([             nan,              nan,   1.55715950e-04,
         5.61672553e-04,   6.29558646e-04,   4.04292042e-04,
         4.04292042e-04,              nan,              nan,
         1.34956920e-04,   1.34956920e-04,   2.69682220e-04,
         2.69682220e-04,              nan,              nan,
         3.59747798e-04,   3.59747798e-04,   4.51234346e-05,
         6.10358168e-04,   6.31667620e-04,              nan,
         2.26028251e-04,   2.26028251e-04,   0.00000000e+00,
         2.71807335e-04,   5.10356704e-04,   2.96900396e-04,
         4.52446992e-05,   4.52229970e-05,   4.52229970e-05,
         3.61714576e-04,   3.61714576e-04,   9.06372853e-05,
         9.06372853e-05,              nan,   4.54059450e-04,
         4.54059450e-04,   2.72330746e-04,   2.72330746e-04,
                    nan,   1.81518876e-04,   1.81518876e-04,
                    nan,   1.81641276e-04,   1.81641276e-04,
         4.09126392e-04,   4.09126392e-04,   3.36062087e-11,

Here's the print statements I added:

        if count == window:
            foo = (a2sum - asum * asum / count) / (count - ddof)
            if foo < 0.:
                print "%.50f, %.50f, %d, %d" % (a2sum, asum, count, ddof)
                print "%.50f" % foo
            y[i0] = sqrt(foo)
        else:
            y[i0] = NAN

erg · 2012-09-06T16:29:53Z

Note that these negative numbers are also less than negative zero, or else sqrt could handle them.

kwgoodman · 2012-09-06T18:01:42Z

Yep, we found the same thing. We plan to set the output to 0 when a2sum - asum * asum / count is negative.

erg · 2012-09-06T18:02:55Z

Awesome. What about adding a move_var and min_periods=1 functionality?

erg · 2012-09-06T18:03:29Z

Pandas also has rolling_kurtosis and rolling_skew.

BUG move_std and move_nanstd neg sqrt bugs fixed. As reported in issue #50.

kwgoodman · 2012-09-06T19:26:06Z

The negative sqrt issue is fixed:

6fb6f73

kwgoodman · 2012-09-06T19:28:18Z

docstring is fixed and negative sqrt bug is fixed, so I'm closing this issue. You can open a new issue with the min_periods feature request if you like.

erg closed this as completed Sep 4, 2012

erg reopened this Sep 4, 2012

kwgoodman added a commit that referenced this issue Sep 6, 2012

Merge pull request #51 from jmcloughlin/dev

6fb6f73

BUG move_std and move_nanstd neg sqrt bugs fixed. As reported in issue #50.

kwgoodman closed this as completed Sep 6, 2012

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bottleneck.move_std() produces nans, doesn't match pandas.rolling_std() #50

bottleneck.move_std() produces nans, doesn't match pandas.rolling_std() #50

erg commented Sep 4, 2012

kwgoodman commented Sep 4, 2012

erg commented Sep 4, 2012

kwgoodman commented Sep 4, 2012

erg commented Sep 4, 2012

erg commented Sep 4, 2012

erg commented Sep 4, 2012

kwgoodman commented Sep 4, 2012

kwgoodman commented Sep 4, 2012

kwgoodman commented Sep 6, 2012

erg commented Sep 6, 2012

kwgoodman commented Sep 6, 2012

erg commented Sep 6, 2012

erg commented Sep 6, 2012

erg commented Sep 6, 2012

erg commented Sep 6, 2012

erg commented Sep 6, 2012

kwgoodman commented Sep 6, 2012

erg commented Sep 6, 2012

erg commented Sep 6, 2012

kwgoodman commented Sep 6, 2012

kwgoodman commented Sep 6, 2012

bottleneck.move_std() produces nans, doesn't match pandas.rolling_std() #50

bottleneck.move_std() produces nans, doesn't match pandas.rolling_std() #50

Comments

erg commented Sep 4, 2012

kwgoodman commented Sep 4, 2012

erg commented Sep 4, 2012

kwgoodman commented Sep 4, 2012

erg commented Sep 4, 2012

erg commented Sep 4, 2012

erg commented Sep 4, 2012

kwgoodman commented Sep 4, 2012

kwgoodman commented Sep 4, 2012

kwgoodman commented Sep 6, 2012

erg commented Sep 6, 2012

kwgoodman commented Sep 6, 2012

erg commented Sep 6, 2012

erg commented Sep 6, 2012

erg commented Sep 6, 2012

erg commented Sep 6, 2012

erg commented Sep 6, 2012

kwgoodman commented Sep 6, 2012

erg commented Sep 6, 2012

erg commented Sep 6, 2012

kwgoodman commented Sep 6, 2012

kwgoodman commented Sep 6, 2012