Change in behavior for rolling_var when win > len(arr) for 0.14: now raises error #7297

Closed
kdiether opened this issue May 31, 2014 · 17 comments · Fixed by #7572
Labels: Bug; Numeric Operations (arithmetic, comparison, and logical operations); Testing (pandas testing functions or related to the test suite)

@kdiether
Contributor

In 0.13 I could pass a window length greater than the length of the Series passed to rolling_var (or, of course, rolling_std). In 0.14 that raises an error. Behavior is unchanged from 0.13 for other rolling functions:

import pandas as pd
from StringIO import StringIO  # Python 2.7, as used in this report

data = """
x
0.1
0.5
0.3
0.2
0.7
"""

df = pd.read_csv(StringIO(data),header=True)

>>> pd.rolling_mean(df['x'],window=6,min_periods=2)

0      NaN
1    0.300
2    0.300
3    0.275
4    0.360
dtype: float64

>>> pd.rolling_skew(df['x'],window=6,min_periods=2)

0             NaN
1             NaN
2    3.903128e-15
3    7.528372e-01
4    6.013638e-01
dtype: float64

>>> pd.rolling_skew(df['x'],window=6,min_periods=6)

0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
dtype: float64

Those work, but not rolling_var:

>>> pd.rolling_var(df['x'],window=6,min_periods=2)

Traceback (most recent call last):
  File "./foo.py", line 187, in <module>
    print pd.rolling_var(df['x'],window=6,min_periods=2)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 594, in f
    center=center, how=how, **kwargs)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 346, in _rolling_moment
    result = calc(values)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 340, in <lambda>
    **kwds)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 592, in call_cython
    return func(arg, window, minp, **kwds)
  File "algos.pyx", line 1177, in pandas.algos.roll_var (pandas/algos.c:28449)
IndexError: Out of bounds on buffer access (axis 0)

If this is the new desired default behavior for the rolling functions, I can work around it. I do like the behavior of rolling_skew and rolling_mean better. It was nice default behavior for me when I was doing rolling standard deviations for reasonably large financial data panels.

It looks to me like the issue is caused by the fact that the 0.14 algo for rolling variance is implemented such that the initial loop (roll_var (algos.pyx)) is the following:

for i from 0 <= i < win:

So it loops to win even when win > N.
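To make the failure mode concrete, here is a minimal pure-Python sketch (not the Cython code from algos.pyx). The helper names `unclamped_warmup`, `clamped_warmup`, and `rolling_var` are hypothetical and exist only for illustration. The first helper mimics a warm-up loop that always runs to the window length, the second clamps the bound to the array length, and `rolling_var` is a naive reference that tolerates a window longer than the data.

```python
import math


def unclamped_warmup(values, window):
    """Warm-up loop that runs to `window` regardless of len(values)."""
    total = 0.0
    for i in range(window):
        total += values[i]  # raises IndexError once i >= len(values)
    return total


def clamped_warmup(values, window):
    """Same loop, but the bound is clamped to the array length."""
    total = 0.0
    for i in range(min(window, len(values))):
        total += values[i]
    return total


def rolling_var(values, window, min_periods):
    """Naive rolling sample variance (ddof=1); a reference sketch only."""
    n = len(values)
    out = [math.nan] * n
    for end in range(n):
        # Clamp the window start to 0 so window > n never reads out of bounds.
        start = max(0, end - window + 1)
        obs = [v for v in values[start:end + 1] if not math.isnan(v)]
        # A sample variance needs at least two observations.
        if len(obs) >= max(min_periods, 2):
            mean = sum(obs) / len(obs)
            out[end] = sum((v - mean) ** 2 for v in obs) / (len(obs) - 1)
    return out
```

With the five values from the report and a window of 6, the unclamped loop fails with an IndexError, while the clamped version simply uses all available observations.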

It looks to me like the other rolling functions implement their algos in such a way that the first loop counts over the following:

for i from 0 <= i < minp - 1:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.10-200.fc20.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.0
nose: 1.3.1
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.6.0.dev-b52bc09
IPython: 2.0.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: None
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.3
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
Non

Karl D.

@jreback
Contributor

jreback commented May 31, 2014

this was changed in #6817 to provide numerical stability in rolling_var

cc @jamiefrio

prob just don't have all of the test cases
I don't think this should have changed, nor is it consistent

want to put together some tests for ranges of window lengths (0, less than len of array, equal, greater than len of array) - so that it systematically tests these (for all of the rolling functions)
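One way to sketch such a systematic check, assuming a pure-Python stand-in for the real rolling functions: `rolling_mean` and `check_window_lengths` below are hypothetical names, not part of pandas or its test suite. The check loops over windows shorter than, equal to, and longer than the array and asserts two invariants: the output has the input's length, and exactly the first `min_periods - 1` positions are NaN. A window of 0 is left out because the thread does not say what result it should give.

```python
import math


def rolling_mean(values, window, min_periods):
    """Pure-Python reference rolling mean; stands in for the real functions."""
    out = []
    for end in range(len(values)):
        start = max(0, end - window + 1)
        obs = [v for v in values[start:end + 1] if not math.isnan(v)]
        enough = len(obs) >= max(min_periods, 1)
        out.append(sum(obs) / len(obs) if enough else math.nan)
    return out


def check_window_lengths(func, values):
    """Smoke-test `func` for windows shorter, equal, and longer than the data.

    Assumes `values` contains no NaN.
    """
    n = len(values)
    for window in (1, n - 1, n, n + 1, 2 * n):
        for min_periods in range(1, window + 1):
            result = func(values, window, min_periods)
            assert len(result) == n
            # The first min_periods - 1 outputs cannot have enough observations.
            lead = min(min_periods - 1, n)
            assert all(math.isnan(r) for r in result[:lead])
            # All inputs are finite, so every later position must be defined.
            assert not any(math.isnan(r) for r in result[lead:])
    return True
```

Run against the five values from the report, the reference reproduces the rolling_mean output shown above for window=6, min_periods=2.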

?

fix should be easy

@kdiether
Contributor Author

@jreback, sure I should be able to put together some tests.

@jreback jreback added this to the 0.14.1 milestone Jun 1, 2014
@jreback
Contributor

jreback commented Jun 14, 2014

@kdiether PR for this?

@kdiether
Contributor Author

@jreback Sorry, I've been particularly busy working on a paper. I should be able to get to it soon.

@jreback jreback changed the title Change in behavior for rolling_var when win > len(arr) for 0.14: now raises error Change in behavior for rolling_var when win > len(arr) for 0.14: now raises error Jun 14, 2014
@kdiether
Contributor Author

@jreback,

So looking at the tests in test_moments.py, it seems to me I could capture these 'out of bounds' window lengths with something like the following:

    def _check_out_of_bounds(self, func):
        arr = np.repeat(np.nan,5)

        result = func(arr,6,min_periods=4)
        self.assertTrue(isnull(result).all())

        result = func(arr,6,min_periods=6)
        self.assertTrue(isnull(result).all())


    def test_rolling_sum_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_sum)

    def test_rolling_mean_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_mean)

    def test_rolling_var_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_var)

Would you be ok with a structure like that?

@jreback
Contributor

jreback commented Jun 17, 2014

sure, also check min_periods of 8 and 0 just for kicks

@kdiether
Contributor Author

Got it. So something like the following:

    def _check_out_of_bounds(self, func):
        arr = np.repeat(np.nan,5)

        result = func(arr,6,min_periods=0)
        self.assertTrue(isnull(result).all())

        result = func(arr,6,min_periods=4)
        self.assertTrue(isnull(result).all())

        result = func(arr,6,min_periods=6)
        self.assertTrue(isnull(result).all())

        self.assertRaises(ValueError,func,arr,6,min_periods=8)


    def test_rolling_sum_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_sum)

    def test_rolling_mean_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_mean)

    def test_rolling_var_out_of_bounds(self):
        self._check_out_of_bounds(mom.rolling_var)

In my pull request, do you want me to include tests for all the rolling functions or should I exclude rolling variance/stdev from the test for now?
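The ValueError expected above for min_periods=8 with a window of 6 implies some validation of min_periods against the window. A minimal sketch of that kind of check follows. The helper `check_min_periods` and its exact rules (including treating 0 like 1 and the wording of the messages) are assumptions for illustration, not pandas' actual implementation.

```python
def check_min_periods(window, min_periods):
    """Hypothetical validator; the rules here are assumptions, not pandas code."""
    if min_periods is None:
        # Assumed default: require a full window.
        min_periods = window
    if min_periods < 0:
        raise ValueError("min_periods must be >= 0")
    if min_periods > window:
        raise ValueError(
            "min_periods (%d) must be <= window (%d)" % (min_periods, window)
        )
    # Treat 0 like 1: any statistic needs at least one observation.
    return max(min_periods, 1)
```

Under these assumed rules, min_periods of 0, 4, and 6 are accepted for a window of 6, and 8 is rejected, which matches the four cases in the proposed test.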

@jreback
Contributor

jreback commented Jun 17, 2014

this is essentially a smoke test, so you can test everything

@kdiether
Contributor Author

It looks to me like the default behavior for rolling_count is designed to differ from the other rolling functions: the NaNs from the rolling_sum call inside rolling_count are converted to zero counts (which makes sense for count ... at least to me).

result[np.isnan(result)] = 0

Should I exclude rolling_count from these smoke tests or carve out a special test for it?
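For illustration, a pure-Python sketch of that counting behavior; `rolling_count` here is a stand-in written for this example, not the pandas function. It counts the non-NaN observations in each trailing window, so a window with no valid observations yields 0 rather than NaN.

```python
import math


def rolling_count(values, window):
    """Count non-NaN observations per trailing window; empty windows give 0."""
    out = []
    for end in range(len(values)):
        # Clamp the start so a window longer than the data is still valid.
        start = max(0, end - window + 1)
        out.append(sum(1 for v in values[start:end + 1] if not math.isnan(v)))
    return out
```

An all-NaN array therefore produces all zeros, so an "all results are null" assertion like the one in `_check_out_of_bounds` would not hold for a count.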

@jreback
Contributor

jreback commented Jun 18, 2014

@kdiether ideally, you would not use nan as your value (maybe use 1); of course you then need different expected results for the different cases, so prob several cases here
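A sketch of what such value-based cases could look like, using an array of ones and a pure-Python reference `rolling_sum` (a stand-in, not the pandas function). The expected lists are the outputs of this reference for a window of 6 on five values. Treating min_periods=0 like 1 is an assumption made for this example.

```python
import math


def rolling_sum(values, window, min_periods):
    """Pure-Python reference rolling sum; min_periods=0 is treated like 1."""
    out = []
    for end in range(len(values)):
        start = max(0, end - window + 1)
        obs = [v for v in values[start:end + 1] if not math.isnan(v)]
        enough = len(obs) >= max(min_periods, 1)
        out.append(float(sum(obs)) if enough else math.nan)
    return out


def same(a, b):
    """Element-wise equality that treats NaN as equal to NaN."""
    return len(a) == len(b) and all(
        (math.isnan(x) and math.isnan(y)) or x == y for x, y in zip(a, b)
    )


nan = math.nan
ones = [1.0] * 5
# Expected result per min_periods, for a window of 6 (longer than the data).
cases = {
    0: [1.0, 2.0, 3.0, 4.0, 5.0],
    4: [nan, nan, nan, 4.0, 5.0],
    6: [nan, nan, nan, nan, nan],
}
for min_periods, expected in cases.items():
    assert same(rolling_sum(ones, 6, min_periods), expected)
```

Unlike the all-NaN input, each case here has a distinct expected result, so a function that ignored min_periods or the window would fail at least one of them.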

@jaimefrio
Contributor

I missed this thread when it was originally raised, sorry about that. I have added #7572, which fixes the issue. Some comments on why I overlooked this can be found there.

I have added no specific test for this, hoping that @kdiether can finish what he has been working on. Let me know if you don't see yourself finishing it up any time soon, and I'll put something together in that other PR.

@jreback
Contributor

jreback commented Jun 26, 2014

@kdiether did you do a pull-request for this?

@kdiether
Contributor Author

I didn't yet. Sorry, I'm really hammered by a project.

@jreback
Contributor

jreback commented Jun 26, 2014

do you have a branch that is pushed (even if not-working/incomplete)?

@kdiether
Contributor Author

I don't have a pushed branch. The only thing I got to was that little code snippet above.

@jaimefrio
Contributor

If you are OK with it I'll grab your code and throw it into #7572.

@kdiether
Contributor Author

Yes, please do.

jreback added a commit that referenced this issue Jul 2, 2014
BUG: Error in rolling_var if window is larger than array, fixes #7297
yarikoptic added a commit to neurodebian/pandas that referenced this issue Jul 15, 2014