ENH: Use Kahan summation and Welfords method to calculate rolling var and std #37055
great thanks @phofl
Unfortunately I chose a bad example, which missed one bug. Fixed it now and added the corresponding test.
xref #6817 i guess this was there a long time ago, but i don't think it had enough tests to lock it down.
and maybe some examples from here: #6929 (though that's obviously a separate issue)
Yes, I planned to look into this in the future. Maybe we can improve this in a similar way. `delta**2` was the problem with the modified version. Switching to regular Welford fixes this.
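For readers following along, here is a minimal pure-Python sketch of the regular Welford update applied to a sliding window. It is not the Cython code from this PR: the class `WelfordWindow` and the helper `rolling_var` are hypothetical names chosen for illustration, and the Kahan compensation terms are left out to keep the update step visible.

```python
import math
import statistics


class WelfordWindow:
    """Running mean and sum of squared deviations (ssqdm) for a window."""

    def __init__(self):
        self.nobs = 0
        self.mean = 0.0
        self.ssqdm = 0.0

    def add(self, x):
        # Regular Welford step: uses (x - old mean) * (x - new mean)
        # rather than a formula built on delta**2.
        self.nobs += 1
        prev_mean = self.mean
        self.mean += (x - prev_mean) / self.nobs
        self.ssqdm += (x - prev_mean) * (x - self.mean)

    def remove(self, x):
        # Exact algebraic inverse of add().
        self.nobs -= 1
        if self.nobs == 0:
            self.mean = 0.0
            self.ssqdm = 0.0
            return
        prev_mean = self.mean
        self.mean -= (x - prev_mean) / self.nobs
        self.ssqdm -= (x - prev_mean) * (x - self.mean)

    def var(self, ddof=1):
        if self.nobs <= ddof:
            return float("nan")
        # Guard against a tiny negative ssqdm caused by rounding.
        return max(self.ssqdm, 0.0) / (self.nobs - ddof)


def rolling_var(values, window):
    """Variance over a trailing window, updated one value at a time."""
    state = WelfordWindow()
    out = []
    for i, x in enumerate(values):
        state.add(x)
        if i >= window:
            state.remove(values[i - window])
        out.append(state.var())
    return out


result = rolling_var([1.0, 2.0, 4.0, 7.0, 11.0], window=3)
print(result[-1], statistics.variance([4.0, 7.0, 11.0]))
```

The last printed pair should agree up to rounding, since the final window holds `4.0, 7.0, 11.0`.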
Looks like this will only help with large numbers.
Interestingly, BENCHMARKS NOT SIGNIFICANTLY CHANGED.
can you merge master, ping on green
Conflicts: doc/source/whatsnew/v1.2.0.rst, pandas/tests/window/test_rolling.py
@jreback green
@@ -353,7 +362,8 @@ def roll_var(ndarray[float64_t] values, ndarray[int64_t] start,
     Numerically stable implementation using Welford's method.
     """
     cdef:
-        float64_t mean_x = 0, ssqdm_x = 0, nobs = 0,
+        float64_t mean_x = 0, ssqdm_x = 0, nobs = 0, compensation_add = 0,
+        float64_t compensation_remove = 0,
isn't there a line in this func that you need to remove, e.g. the `< 1e-14`?
See below
@phofl this PR doesn't close the issue? can u show an example of when
Yeah thought so too initially, because I was not able to construct a counter example. But our docstrings do the job:
Returns
Seems like Kahan summation and Welford's method only help for large numbers. Issues with numbers like
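To illustrate the large-number case only, here is a small sketch comparing the textbook sum-of-squares formula with a one-pass Welford update. The function names `naive_var` and `welford_var` and the sample values are assumptions made for this example, not code or data from the PR.

```python
import math


def naive_var(values):
    """Sample variance via sum(x^2) - sum(x)^2 / n; cancels badly for large x."""
    n = len(values)
    total = 0.0
    total_sq = 0.0
    for x in values:
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def welford_var(values):
    """Sample variance via a one-pass Welford update."""
    nobs = 0
    mean = 0.0
    ssqdm = 0.0
    for x in values:
        nobs += 1
        prev_mean = mean
        mean += (x - prev_mean) / nobs
        ssqdm += (x - prev_mean) * (x - mean)
    return ssqdm / (nobs - 1)


# Small spread on top of a large offset; the exact sample variance is 30.
values = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive  :", naive_var(values))
print("welford:", welford_var(values))
```

With squares near `1e18`, the float64 spacing is already larger than the quantity being computed, so the naive formula cannot recover the variance, while the Welford update only ever works with small differences from the running mean.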
ok it's prob worth adding an xfail test for that one. (follow-on ok)
@@ -192,6 +192,7 @@ Other enhancements
 - Added methods :meth:`IntegerArray.prod`, :meth:`IntegerArray.min`, and :meth:`IntegerArray.max` (:issue:`33790`)
 - Where possible :meth:`RangeIndex.difference` and :meth:`RangeIndex.symmetric_difference` will return :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`36564`)
 - Added :meth:`Rolling.sem()` and :meth:`Expanding.sem()` to compute the standard error of mean (:issue:`26476`).
+- :meth:`Rolling.var()` and :meth:`Rolling.std()` use Kahan summation and Welfords Method to avoid numerical issues (:issue:`37051`)
this is not fully true, but it's better so ok.
thanks @phofl
@jreback with the line `< 1e-14` this test would not fail. I could add a test which passes now, but would fail if somebody removes the line without fixing the underlying problem?
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
As suggested by @mroeschke, Kahan summation fixes the numerical problems. Additionally I used Welford's Method to calculate `ssqdm`, because previously the tests I have added would return … for `var()`. I am running the asv and will post the results when available.
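As background on the technique named above, here is a minimal pure-Python sketch of Kahan (compensated) summation. It is a generic illustration rather than the Cython implementation in this PR; `kahan_sum` and `naive_sum` are hypothetical helper names, and `math.fsum` is used only as an accurate reference.

```python
import math


def naive_sum(values):
    """Plain left-to-right float accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Sum floats while carrying a correction for lost low-order bits."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error so far
    for x in values:
        y = x - compensation            # apply the correction from the last step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


values = [0.1] * 1000
reference = math.fsum(values)
print("naive error:", abs(naive_sum(values) - reference))
print("kahan error:", abs(kahan_sum(values) - reference))
```

The same idea can be applied to any running accumulator by keeping one extra compensation variable next to it, which is what names such as `compensation_add` and `compensation_remove` in the diff suggest.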