-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change in behavior for rolling_var when win > len(arr) for 0.14: now raises error #7297
Comments
this was changed in #6817 to provide numerical stability in rolling_var cc @jamiefrio prob just don't have all of the test cases want to put together some tests for ranges of window lengths (0,less than len of array, equal, greater than Len of array) - so that it systematically tests these (for all of the rolling functions) ? fix should be easy |
@jreback, sure I should be able to put together some tests. |
@kdiether PR for this? |
@jreback Sorry, I've been particularly busy working on a paper. I should be able to get to it soon. |
rolling_var
when win > len(arr)
for 0.14: now raises error
So looking at the tests in def _check_out_of_bounds(self, func):
arr = np.repeat(np.nan,5)
result = func(arr,6,min_periods=4)
self.assertTrue(isnull(result).all())
result = func(arr,6,min_periods=6)
self.assertTrue(isnull(result).all())
def test_rolling_sum_out_of_bounds(self):
self._check_out_of_bounds(mom.rolling_sum)
def test_rolling_mean_out_of_bounds(self):
self._check_out_of_bounds(mom.rolling_mean)
def test_rolling_var_out_of_bounds(self):
self._check_out_of_bounds(mom.rolling_var) Would you be ok with a structure like that? |
sure also check min_ periods of 8 and 0 just for kicks |
Got it. So something like the following: def _check_out_of_bounds(self, func):
arr = np.repeat(np.nan,5)
result = func(arr,6,min_periods=0)
self.assertTrue(isnull(result).all())
result = func(arr,6,min_periods=4)
self.assertTrue(isnull(result).all())
result = func(arr,6,min_periods=6)
self.assertTrue(isnull(result).all())
self.assertRaises(ValueError,func,arr,6,min_periods=8)
def test_rolling_sum_out_of_bounds(self):
self._check_out_of_bounds(mom.rolling_sum)
def test_rolling_mean_out_of_bounds(self):
self._check_out_of_bounds(mom.rolling_mean)
def test_rolling_var_out_of_bounds(self):
self._check_out_of_bounds(mom.rolling_var) In my pull request, do you want me to include tests for all the rolling functions or should I exclude rolling variance/stdev from the test for now? |
this essentially a smoke test so can test everything |
It looks like to me like the default behavior for result[np.isnan(result)] = 0 Should I exclude |
@kdiether ideally, you would not use |
I missed this thread when it was originally raised, sorry about that. I have added #7572, that fixes the issue. Some comment on why I overlooked this can be found there. I have added no specific test for this, hoping that @kdiether can finish what he has been working on. Let me know if you don't see yourself finishing it up any time soon, and I'll put something together in that other PR. |
@kdiether did you do a pull-request for this? |
I didn't yet. Sorry, I'm really hammered by a project. |
do you have a branch that is pushed (even if not-working/incomplete)? |
I don't have a pushed branch. The only thing I go to was that little code snippet above. |
If you are OK with it I'll grab your code and throw it into #7572. |
Yes, please do. |
BUG: Error in rolling_var if window is larger than array, fixes #7297
* commit 'v0.14.0-345-g8cd3dd6': (73 commits) PERF: allow slice indexers to be computed faster PERF: allow dst transition computations to be handled much faster if the end-points are ok (GH7633) Revert "Merge pull request pandas-dev#7591 from mcwitt/parse-index-cols-c" TST: fixes for 2.6 comparisons BUG: Error in rolling_var if window is larger than array, fixes pandas-dev#7297 REGR: Add back #N/A N/A as a default NA value (regresion from 0.12) (GH5521) BUG: xlim on plots with shared axes (GH2960, GH3490) BUG: Bug in Series.get with a boolean accessor (GH7407) DOC: add v0.15.0.txt template DOC: small doc build fixes DOC: v0.14.1 edits BUG: doc example in groupby.rst (GH7559 / GH7628) PERF: optimize MultiIndex.from_product for large iterables ENH: change BlockManager pickle format to work with dup items BUG: {expanding,rolling}_{cov,corr} don't handle arguments with different index sets properly CLN/DEPR: Fix instances of 'U'/'rU' in open(...) CLN: Fix typo TST: fix groupby test on windows (related GH7580) COMPAT: make numpy NaT comparison use a view to avoid implicit conversions BUG: Bug in to_timedelta that accepted invalid units and misinterpreted m/h (GH7611, GH6423) ...
In 0.13 I could pass a window length greater than the length of the
Series
passed torolling_var
(or, of course,rolling_std
). In 0.14 that raises an error. Behavior is unchanged from 0.13 for other rolling functions:Those work, but not
rolling_var
:If this is the new desired default behavior for the rolling functions, I can work around it. I do like the behavior of
rolling_skew
androlling_mean
better. It was nice default behavior for me when I was doing rolling standard deviations for reasonably large financial data panels.It looks to me like the issue is caused by the fact that the 0.14 algo for rolling variance is implemented such that the initial loop (
roll_var
(algos.pyx)) is the following:So it loops to
win
even whenwin > N
.It looks like to me that the other rolling functions try to implement their algos in such a way that the first loop counts over the following:
Karl D.
The text was updated successfully, but these errors were encountered: