rolling_mean and rolling_sum produce negative output from positive input #2114

Closed
mrjbq7 opened this issue Oct 24, 2012 · 2 comments

mrjbq7 commented Oct 24, 2012

I have an array of non-negative numbers that, when passed to rolling_sum or rolling_mean, produces an output array containing a small negative number.

The test looks like this:

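import numpy as np
import pandas

# data.npy is the binary array from the gist linked below
data = np.load('data.npy')
assert (data >= 0).all()

# Rolling sum over a window of 2; no output should be negative
sums = pandas.rolling_sum(data, 2, min_periods=1)
negative = np.where(sums < 0)[0]  # indices of negative outputs
assert len(negative) == 0, negative

# Same check for the rolling mean
mean = pandas.rolling_mean(data, 2, min_periods=1)
negative = np.where(mean < 0)[0]
assert len(negative) == 0, negative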

Reproducing it requires the exact floating-point values, so I put a small binary array in a gist: https://gist.github.com/3948013.

You can run the test case:

$ git clone git://gist.github.com/3948013.git
$ cd 3948013
$ python test.py
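
For readers without the data file, here is a minimal sketch of how a running total that is updated incrementally can go negative on non-negative input. The values are chosen purely for illustration and are not taken from the gist; the update order shown is the subtract-as-the-window-slides scheme, not a copy of the pandas code.

```python
# Illustrative values only (an assumption, not the data from the gist).
a, b = 1e16, 1.0           # both non-negative

total = a + b              # 1e16 + 1 is not representable as a double; rounds to 1e16
total = total - a          # window slides past a: 0.0, although b is still inside
total = total - b          # window slides past b: -1.0, although only zeros remain

print(total)               # -1.0
```

The value b is lost when it is added to the much larger a, but it is still subtracted in full when it leaves the window, so the total ends up below zero.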

I confirmed that the bug affects the current development version of pandas:

>>> import pandas
>>> pandas.__version__
'0.9.1.dev-8cd93d3'

>>> import numpy
>>> numpy.__version__
'1.7.0b2'

mrjbq7 commented Oct 24, 2012

It looks like bottleneck doesn't have this problem...

mrjbq7 commented Oct 24, 2012

I took an approach similar to bottleneck's, adding the new value before subtracting the previous one, and it fixes this particular bug...

diff --git a/pandas/src/moments.pyx b/pandas/src/moments.pyx
index 503a63c..9b5b621 100644
--- a/pandas/src/moments.pyx
+++ b/pandas/src/moments.pyx
@@ -175,16 +175,16 @@ def roll_sum(ndarray[double_t] input, int win, int minp):
     for i from minp - 1 <= i < N:
         val = input[i]

+        if val == val:
+            nobs += 1
+            sum_x += val
+
         if i > win - 1:
             prev = input[i - win]
             if prev == prev:
                 sum_x -= prev
                 nobs -= 1

-        if val == val:
-            nobs += 1
-            sum_x += val
-
         if nobs >= minp:
             output[i] = sum_x
         else:
@@ -218,16 +218,16 @@ def roll_mean(ndarray[double_t] input,
     for i from minp - 1 <= i < N:
         val = input[i]

+        if val == val:
+            nobs += 1
+            sum_x += val
+
         if i > win - 1:
             prev = input[i - win]
             if prev == prev:
                 sum_x -= prev
                 nobs -= 1

-        if val == val:
-            nobs += 1
-            sum_x += val
-
         if nobs >= minp:
             output[i] = sum_x / nobs
         else:
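
To experiment with the two orderings outside Cython, the loop in the diff can be modelled in pure Python. This is only a sketch: the `add_first` flag and the `direct_sums` helper are hypothetical names introduced here, and the model assumes `win >= minp`. Comparing either ordering against a per-window recomputation shows where the incremental total drifts.

```python
import math

def roll_sum(values, win, minp, add_first=True):
    """Pure-Python model of the incremental rolling sum (hypothetical helper).

    add_first=True  -> patched order: add the new value, then drop the old one.
    add_first=False -> original order: drop the old value, then add the new one.
    """
    out = [math.nan] * len(values)
    sum_x, nobs = 0.0, 0
    for i, val in enumerate(values):
        # Value leaving the window, or NaN while the window is still filling.
        leaving = values[i - win] if i >= win else math.nan
        steps = [(val, +1), (leaving, -1)]
        if not add_first:
            steps.reverse()
        for x, sign in steps:
            if x == x:  # NaN != NaN, so this skips missing values
                sum_x += sign * x
                nobs += sign
        if nobs >= minp:
            out[i] = sum_x
    return out

def direct_sums(values, win):
    """Reference: recompute every window from scratch (assumes no NaNs)."""
    return [math.fsum(values[max(0, i - win + 1): i + 1])
            for i in range(len(values))]

data = [1.0, 2.0, 3.0, 4.0]
print(roll_sum(data, 2, 1, add_first=False))  # [1.0, 3.0, 5.0, 7.0]
print(roll_sum(data, 2, 1, add_first=True))   # [1.0, 3.0, 5.0, 7.0]
print(direct_sums(data, 2))                   # [1.0, 3.0, 5.0, 7.0]
```

On exactly representable data like this, all three agree. Neither ordering removes rounding error in general; they differ only in which intermediate values get rounded, so the array from the gist is still needed to see the difference reported here.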
