Change in behavior for rolling_var when win > len(arr) for 0.14: now raises error #7297

kdiether · 2014-05-31T17:55:39Z

In 0.13 I could pass a window length greater than the length of the Series passed to rolling_var (or, of course, rolling_std). In 0.14 that raises an error. Behavior is unchanged from 0.13 for other rolling functions:

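import pandas as pd
from StringIO import StringIO  # Python 2; on Python 3 use io.StringIO

data = """
x
0.1
0.5
0.3
0.2
0.7
"""

df = pd.read_csv(StringIO(data), header=0)  # the first line, "x", is the header row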

>>> pd.rolling_mean(df['x'],window=6,min_periods=2)

0      NaN
1    0.300
2    0.300
3    0.275
4    0.360
dtype: float64

>>> pd.rolling_skew(df['x'],window=6,min_periods=2)

0             NaN
1             NaN
2    3.903128e-15
3    7.528372e-01
4    6.013638e-01
dtype: float64

>>> pd.rolling_skew(df['x'],window=6,min_periods=6)

0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
dtype: float64

Those work, but not rolling_var:

>>> pd.rolling_var(df['x'],window=6,min_periods=2)

Traceback (most recent call last):
  File "./foo.py", line 187, in <module>
    print pd.rolling_var(df['x'],window=6,min_periods=2)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 594, in f
    center=center, how=how, **kwargs)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 346, in _rolling_moment
    result = calc(values)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 340, in <lambda>
    **kwds)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 592, in call_cython
    return func(arg, window, minp, **kwds)
  File "algos.pyx", line 1177, in pandas.algos.roll_var (pandas/algos.c:28449)
IndexError: Out of bounds on buffer access (axis 0)

If this is the new intended default behavior for the rolling functions, I can work around it. I do prefer the behavior of rolling_skew and rolling_mean, though; it was a convenient default when I was computing rolling standard deviations over fairly large financial data panels.

It looks to me like the cause is that, in 0.14, the initial loop of the rolling-variance algorithm (roll_var in algos.pyx) is:

for i from 0 <= i < win:

So it loops up to win even when win > N, which reads past the end of the input buffer and produces the IndexError above.

The other rolling functions, by contrast, appear to bound their first loop by minp instead:

for i from 0 <= i < minp - 1:
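One way to avoid the out-of-bounds access is to bound the warm-up loop by min(win, N). Below is a minimal pure-Python sketch of that idea, not the Cython code in algos.pyx: the name rolling_var_sketch is hypothetical, and the Welford-style add/remove updates are chosen here for illustration rather than being necessarily what #6817 implemented. Once the warm-up loop is clamped, the steady-state loop is simply empty when the window is longer than the input, so the result behaves like the rolling_mean/rolling_skew output above (NaN until enough valid observations are available).

```python
import math


def rolling_var_sketch(values, win, minp, ddof=1):
    """Illustrative trailing-window variance (hypothetical, not pandas' roll_var).

    NaNs are skipped. Output i is the variance of the valid values among
    the last `win` inputs, or NaN if there are fewer than max(minp, 1) of
    them or not more than `ddof`.
    """
    n = len(values)
    out = [math.nan] * n
    nobs, mean, ssqdm = 0, 0.0, 0.0  # count, mean, sum of squared deviations

    def add(x):
        nonlocal nobs, mean, ssqdm
        if math.isnan(x):
            return
        nobs += 1
        delta = x - mean
        mean += delta / nobs
        ssqdm += delta * (x - mean)

    def remove(x):
        nonlocal nobs, mean, ssqdm
        if math.isnan(x):
            return
        nobs -= 1
        if nobs == 0:
            mean, ssqdm = 0.0, 0.0
            return
        delta = x - mean
        mean -= delta / nobs
        ssqdm -= delta * (x - mean)

    def emit(i):
        if nobs >= max(minp, 1) and nobs > ddof:
            out[i] = max(ssqdm, 0.0) / (nobs - ddof)

    # Warm-up: bounded by min(win, n) instead of win, so a window longer
    # than the input never indexes past the end of `values`.
    for i in range(min(win, n)):
        add(values[i])
        emit(i)

    # Steady state (empty when win >= n): add the newest value and drop
    # the one that just left the window.
    for i in range(win, n):
        add(values[i])
        remove(values[i - win])
        emit(i)

    return out
```

With the data from the example above, rolling_var_sketch(data, 6, 2) returns NaN for the first element and sample variances (0.08, 0.04, ...) after that, instead of raising.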

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.10-200.fc20.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.0
nose: 1.3.1
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.6.0.dev-b52bc09
IPython: 2.0.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: None
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.3
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
Non

Karl D.


jreback · 2014-05-31T19:48:11Z

This was changed in #6817 to provide numerical stability in rolling_var.

cc @jaimefrio

Probably we just don't have all of the test cases.
I don't think this should have changed, nor is it consistent.

Want to put together some tests for a range of window lengths (0, less than the array length, equal to it, greater than it), so that these cases are tested systematically for all of the rolling functions?

The fix should be easy.
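A sketch of how such systematic tests could be organized: compare each rolling function against a brute-force reference that slices each window explicitly, for window lengths 0, N - 1, N and N + 1 and every min_periods from 0 up to the window. Python slicing stops at the list boundaries, so the reference needs no special case for windows longer than the input. All names here (naive_rolling, mean_stat, same, check_window_cases) are hypothetical, and plain lists stand in for Series.

```python
import math


def naive_rolling(values, win, minp, stat):
    """Brute-force trailing-window statistic, meant as a test reference.

    For position i, `stat` is applied to the non-NaN values in
    values[i - win + 1 : i + 1] (start clamped at 0). Slicing stops at the
    end of the list, so win >= len(values) and win == 0 need no
    special-casing. Positions with fewer than max(minp, 1) valid values
    are NaN.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - win + 1)
        window = [v for v in values[start:i + 1] if not math.isnan(v)]
        out.append(stat(window) if len(window) >= max(minp, 1) else math.nan)
    return out


def mean_stat(window):
    return sum(window) / len(window)


def same(a, b, tol=1e-9):
    """Element-wise comparison that treats NaN == NaN."""
    return len(a) == len(b) and all(
        (math.isnan(x) and math.isnan(y)) or abs(x - y) <= tol
        for x, y in zip(a, b)
    )


def check_window_cases(candidate, stat, values):
    """Compare candidate(values, win, minp) with the reference for window
    lengths 0, < n, == n and > n, and every min_periods from 0 to win."""
    n = len(values)
    for win in (0, n - 1, n, n + 1):
        for minp in range(win + 1):
            expected = naive_rolling(values, win, minp, stat)
            got = list(candidate(values, win, minp))
            assert same(got, expected), (win, minp, got, expected)
```

With data = [0.1, 0.5, 0.3, 0.2, 0.7], naive_rolling(data, 6, 2, mean_stat) reproduces the rolling_mean output from the original report (NaN, 0.3, 0.3, 0.275, 0.36). In a real test the candidate would wrap a pandas rolling function and convert its result to a list; that wrapper is an assumption and is not shown.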

kdiether · 2014-05-31T22:16:49Z

@jreback, sure, I should be able to put together some tests.

jreback · 2014-06-14T13:16:57Z

@kdiether PR for this?

kdiether · 2014-06-14T15:10:27Z

@jreback Sorry, I've been particularly busy working on a paper. I should be able to get to it soon.

kdiether · 2014-06-17T22:00:38Z

@jreback,

So looking at the tests in test_moments.py, it seems to me I could capture these 'out of bounds' window lengths with something like the following:

    def _check_out_of_bounds(self, func):
        arr = np.repeat(np.nan,5)

        result = func(arr,6,min_periods=4)
        self.assertTrue(isnull(result).all())

        result = func(arr,6,min_periods=6)
        self.assertTrue(isnull(result).all())


    def test_rolling_sum_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_sum)

    def test_rolling_mean_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_mean)

    def test_rolling_var_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_var)

Would you be ok with a structure like that?

jreback · 2014-06-17T22:02:52Z

Sure; also check min_periods of 8 and 0, just for kicks.

kdiether · 2014-06-17T22:20:07Z

Got it. So something like the following:

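    def _check_out_of_bounds(self, func):
        # Five NaNs with a window of 6: the window is longer than the input
        arr = np.repeat(np.nan, 5)

        # All-NaN input, so every result should be NaN whatever min_periods is
        result = func(arr, 6, min_periods=0)
        self.assertTrue(isnull(result).all())

        result = func(arr, 6, min_periods=4)
        self.assertTrue(isnull(result).all())

        # min_periods equal to the window (and larger than len(arr))
        result = func(arr, 6, min_periods=6)
        self.assertTrue(isnull(result).all())

        # min_periods larger than the window should be rejected
        self.assertRaises(ValueError, func, arr, 6, min_periods=8)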


    def test_rolling_sum_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_sum)

    def test_rolling_mean_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_mean)

    def test_rolling_var_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_var)

In my pull request, do you want me to include tests for all the rolling functions or should I exclude rolling variance/stdev from the test for now?

jreback · 2014-06-17T22:43:33Z

This is essentially a smoke test, so you can test everything.

kdiether · 2014-06-18T04:03:53Z

It looks to me like rolling_count is designed to behave differently from the other rolling functions by default: the NaNs from the rolling_sum call inside rolling_count are converted to zero counts (which makes sense for a count, at least to me):

result[np.isnan(result)] = 0

Should I exclude rolling_count from these smoke tests or carve out a special test for it?
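The point about counts can be illustrated with a small sketch: a rolling count is a rolling sum over a per-value indicator, and a window with no valid observations should count as 0 rather than NaN. This is not pandas' rolling_count; the NaN-skipping helper nan_rolling_sum and the choice of an indicator that is 1.0 for valid values and NaN for missing ones are assumptions made so that the NaN-to-zero step quoted above has something to act on.

```python
import math


def nan_rolling_sum(values, win, minp=1):
    """Hypothetical trailing-window sum that skips NaN.

    Returns NaN where the window holds fewer than max(minp, 1) valid values.
    """
    out = []
    for i in range(len(values)):
        valid = [v for v in values[max(0, i - win + 1):i + 1] if not math.isnan(v)]
        out.append(sum(valid) if len(valid) >= max(minp, 1) else math.nan)
    return out


def rolling_count_sketch(values, win):
    """Count of non-NaN values per trailing window (illustrative only).

    Each valid value contributes 1.0; missing values stay NaN so the sum
    skips them. A window with no valid values sums to NaN, and that NaN is
    then turned into a count of 0.
    """
    indicator = [math.nan if math.isnan(v) else 1.0 for v in values]
    result = nan_rolling_sum(indicator, win, minp=1)
    return [0.0 if math.isnan(r) else r for r in result]
```

For an all-NaN input such as the one in the proposed smoke test, nan_rolling_sum returns all NaN while rolling_count_sketch returns all zeros, so an isnull(result).all() check cannot apply to the count as written.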

jreback · 2014-06-18T12:27:21Z

@kdiether Ideally you would not use NaN as your value (maybe use 1); of course, you then need different expected results for the different cases, so probably several cases here.
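Following that suggestion, with an input of five 1.0 values and a window of 6 the expected results are easy to write down by hand: position i covers i + 1 observations, so a sum is i + 1 and a mean is 1.0 once min_periods is met, and NaN before that. Here is a hypothetical helper, expected_for_ones, that generates those expectations for sum and mean; variance is left out to avoid assuming how a single observation is treated, and the ValueError for min_periods > window mirrors the assertRaises check proposed above.

```python
import math


def expected_for_ones(n, win, minp, kind):
    """Expected trailing-window result for an input of n ones (hypothetical helper).

    Position i covers min(i + 1, win) observations, all equal to 1.0, and is
    NaN until that count reaches max(minp, 1). Only 'sum' and 'mean' are
    modelled.
    """
    if minp > win:
        raise ValueError("min_periods must be <= window")
    out = []
    for i in range(n):
        nobs = min(i + 1, win)
        if nobs < max(minp, 1):
            out.append(math.nan)
        elif kind == "sum":
            out.append(float(nobs))
        elif kind == "mean":
            out.append(1.0)
        else:
            raise ValueError("unsupported kind: %r" % kind)
    return out
```

For example, expected_for_ones(5, 6, 4, "sum") gives [nan, nan, nan, 4.0, 5.0], and min_periods=6 gives all NaN, in line with the rolling_skew example in the original report.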

jaimefrio · 2014-06-26T04:14:22Z

I missed this thread when it was originally raised; sorry about that. I have opened #7572, which fixes the issue. Some comments on why I overlooked this can be found there.

I have added no specific test for this, hoping that @kdiether can finish what he has been working on. Let me know if you don't see yourself finishing it up any time soon, and I'll put something together in that other PR.

jreback · 2014-06-26T12:07:29Z

@kdiether did you do a pull-request for this?

kdiether · 2014-06-26T16:23:59Z

I didn't yet. Sorry, I'm really hammered by a project.

jreback · 2014-06-26T16:29:30Z

do you have a branch that is pushed (even if not-working/incomplete)?

kdiether · 2014-06-26T16:33:01Z

I don't have a pushed branch. The only thing I got to was that little code snippet above.

jaimefrio · 2014-06-26T17:57:26Z

If you are OK with it I'll grab your code and throw it into #7572.

kdiether · 2014-06-26T17:58:31Z

Yes, please do.


jreback added Bug labels Jun 1, 2014

jreback added this to the 0.14.1 milestone Jun 1, 2014

jreback changed the title ~~Change in behavior for rolling_var when win > len(arr) for 0.14: now raises error~~ Change in behavior for rolling_var when win > len(arr) for 0.14: now raises error Jun 14, 2014

dsm054 mentioned this issue Jun 26, 2014

PERF: Poor numerical stability of rolling_kurt and rolling_skew #6929

Closed

jaimefrio mentioned this issue Jun 26, 2014

BUG: Error in rolling_var if window is larger than array, fixes #7297 #7572

Merged

seth-p mentioned this issue Jun 28, 2014

BUG: {expanding,rolling}_{cov,corr} functions between objects with different index sets #7512

Closed

jaimefrio added a commit to jaimefrio/pandas that referenced this issue Jul 2, 2014

BUG: Error in rolling_var if window is larger than array, fixes pandas-dev#7297

e090fad

jreback closed this as completed in #7572 Jul 2, 2014

jreback added a commit that referenced this issue Jul 2, 2014

Merge pull request #7572 from jaimefrio/rolling_var_bug

45b1dcd

BUG: Error in rolling_var if window is larger than array, fixes #7297
