
DOC/PERF: Decide how to handle floating point artifacts during rolling calculations #37051

Closed
mroeschke opened this issue Oct 11, 2020 · 16 comments · Fixed by #40505
Labels: Docs, Needs Discussion, Window (rolling, ewma, expanding)
@mroeschke (Member)

Currently we have a check here that artificially handles a numerical precision issue in rolling.var and rolling.std where our rolling variance calculation is carrying forward floating point artifacts. Ideally we should be using a more numerically stable algorithm (maybe Kahan summation) so this check isn't so arbitrary.

if result < 1e-15:
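
For reference, this is the kind of compensated (Kahan) summation being suggested. A generic sketch for illustration only, not the pandas implementation:

def kahan_sum(values):
    total = 0.0
    comp = 0.0                    # running compensation for lost low-order bits
    for x in values:
        y = x - comp              # apply the correction from the previous step
        t = total + y             # low-order bits of y are lost here...
        comp = (t - total) - y    # ...and recovered into comp
        total = t
    return total

vals = [1.0] + [1e-16] * 1_000_000
print(sum(vals))        # 1.0 -- every tiny term is rounded away
print(kahan_sum(vals))  # ~1.0000000001 -- close to the true sum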

@mroeschke added the Performance and Window labels on Oct 11, 2020
@ukarroum (Contributor)

Would like to try working on that if possible.

@ukarroum (Contributor)

take

@ukarroum (Contributor)

@phofl: It looks like you have a working PR: #37055, so I'm going to unassign myself.

@ukarroum removed their assignment on Oct 12, 2020
@phofl (Member) commented Oct 12, 2020

@ukarroum Not really, my PR fixes problems with large numbers but not the problem mentioned above

@ukarroum (Contributor)

Oh my bad.

Gonna retake it then.

Thanks

@ukarroum (Contributor)

take

@ukarroum (Contributor)

It looks like (from PR #37055) using Kahan summation doesn't solve the issue.
I couldn't find another way, so I'm just going to unassign myself.

@ukarroum removed their assignment on Oct 25, 2020
@jreback modified the milestones: 1.2 → Contributions Welcome on Nov 19, 2020
@mroeschke changed the title from "ENH: Implement a more numerically stable algorithm for rolling var" to "ENH: Implement a more numerically stable algorithm for rolling var for small values" on Jan 2, 2021
@mroeschke modified the milestones: Contributions Welcome → 1.2.1 on Jan 2, 2021
@phofl (Member) commented Jan 4, 2021

To summarize the current situation:

Theoretically our implementation is stable for small numbers.

Our implementation is not stable for cases like:

s = pd.Series([7, 5, 5, 5])
print(s.rolling(3).var())

The following explains why:

We are using Welford's method (https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance) with Kahan summation. In the third add pass we have the following values (the ssqdm going in is 2.0):

prev_mean = 6.0
new_mean = 5 + 2/3
val = 5.0
ssqdm = 2 + 2/3

so ssqdm is now 2.666666666666667.

The next pass removes the leading 7:

prev_mean = 5.666666666666667
new_mean = 5.0
val = 7.0
ssqdm = 2.666666666666667 - 2 - 2/3

Theoretically this should lead to 0, but because of floating point artifacts it comes out as 8.881784197001252e-16. So without the line quoted in the OP we would not return 0 here; that is why the check is needed.
The implementation can be found at

cdef inline void add_var(float64_t val, float64_t *nobs, float64_t *mean_x,

result:

0         NaN
1         NaN
2    1.333333
3    0.000000
dtype: float64
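
A pure-Python sketch of the add/remove steps traced above (simplified: no NaN handling and no Kahan compensation, so not the actual pandas code) reproduces the residual:

def add_var(val, nobs, mean_x, ssqdm_x):
    nobs += 1
    delta = val - mean_x
    mean_x += delta / nobs
    ssqdm_x += ((nobs - 1) * delta ** 2) / nobs
    return nobs, mean_x, ssqdm_x

def remove_var(val, nobs, mean_x, ssqdm_x):
    nobs -= 1
    delta = val - mean_x
    mean_x -= delta / nobs
    ssqdm_x -= ((nobs + 1) * delta ** 2) / nobs
    return nobs, mean_x, ssqdm_x

nobs, mean_x, ssqdm_x = 0, 0.0, 0.0
for val in (7.0, 5.0, 5.0):                  # first window [7, 5, 5]
    nobs, mean_x, ssqdm_x = add_var(val, nobs, mean_x, ssqdm_x)
print(ssqdm_x / (nobs - 1))                  # 1.3333333333333333

# slide to [5, 5, 5]: remove the leading 7, then add the trailing 5
nobs, mean_x, ssqdm_x = remove_var(7.0, nobs, mean_x, ssqdm_x)
nobs, mean_x, ssqdm_x = add_var(5.0, nobs, mean_x, ssqdm_x)
print(ssqdm_x)                               # ~8.881784197001252e-16 on a typical IEEE-754 setup, not 0.0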

@jreback modified the milestones: 1.2.1 → 1.3 on Jan 4, 2021
@mroeschke (Member, Author)

Thanks for the clear explanation @phofl.

Since these floating point artifacts are unavoidable, we can either:

  1. Just document in user_guide/window.rst that we round values less than 1e-15 to 0 due to floating point artifacts.
  2. Actually remove our artificial if result < 1e-15: check, let floating point artifacts be a part of our implementation, and document that.

@phofl (Member) commented Jan 4, 2021

First one? Don't know. Both have their disadvantages unfortunately...

@mroeschke (Member, Author)

Yeah, I can see that.

I am also entertaining the second option: pushing the responsibility of handling floating point artifacts to the user (in the final result, though unfortunately not during the rolling calculation itself).
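
To illustrate, a hypothetical user-side workaround if the internal cutoff were removed (not an agreed API, just post-processing of the final result):

import pandas as pd

s = pd.Series([7, 5, 5, 5])
var = s.rolling(3).var()
# a user who wants exact zeros would clip the floating point residue themselves
cleaned = var.mask(var.abs() < 1e-15, 0.0)
print(cleaned)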

@mroeschke removed the Performance label on Jan 4, 2021
@mroeschke added the Docs and Needs Discussion labels on Jan 4, 2021
@mroeschke changed the title from "ENH: Implement a more numerically stable algorithm for rolling var for small values" to "DOC/PERF: Decide how to handle floating point artifacts during rolling calculations" on Jan 4, 2021
@phofl (Member) commented Jan 4, 2021

It is pick your poison. In case of the second alternative, we would have to adjust the docstrings that currently contain

>>> s = pd.Series([5, 5, 6, 7, 5, 5, 5])

My example was based on that; this would cause the doctests to fail otherwise.

@xmatthias commented Feb 25, 2021

Crossposting from #39872 (comment) as I'm not sure that issue is followed any longer.

We are encountering a problem while calculating the mean (and std) of crypto asset prices, which can be very small numbers (around 1e-7).

The release notes for pandas 1.2 mention this change, but there is no mention of this side effect for small values.

I don't think the example below should be impacted by this, as the expected results are around 1e-9, so nowhere near the mentioned threshold of 1e-15.

A very simple example:

import pandas as pd

print(pd.__version__)

df = pd.DataFrame(data = {'data':
    [
        0.00000054,
        0.00000053,
        0.00000054,
     ]}
    )

df['mean'] = df['data'].rolling(2).mean()
df['std'] = df['data'].rolling(2).std()
print(df)

with pandas < 1.2.0, the return is as follows:

1.1.5
           data          mean           std
0  5.400000e-07           NaN           NaN
1  5.300000e-07  5.350000e-07  7.071068e-09
2  5.400000e-07  5.350000e-07  7.071068e-09

while 1.2.0 returns:

1.2.0
           data          mean  std
0  5.400000e-07           NaN  NaN
1  5.300000e-07  5.350000e-07  0.0
2  5.400000e-07  5.350000e-07  0.0

The values are nowhere near the mentioned threshold of 1e-15.

@phofl (Member) commented Feb 25, 2021

The relevant result is the variance, which is used to calculate the std. The variance here is on the order of 1e-17, so the threshold kicks in.

Edit: This is also explained here: #39872 (comment)
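
To make that concrete with the data from the report above (printed values are approximate):

import pandas as pd

df = pd.DataFrame({'data': [0.00000054, 0.00000053, 0.00000054]})

# std is derived from the rolling variance; the true variance of each
# two-value window is 2 * (5e-9) ** 2 / 1 = 5e-17, below the 1e-15 cutoff
print(df['data'].rolling(2).var())
# ~5e-17 on pandas 1.1.x (no cutoff); 0.0 on 1.2.0, so the std becomes 0.0 as well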

@xmatthias commented Feb 25, 2021

You're right, it's the variance that is that low (I missed that part). However, the relevant part from a user perspective is the end result, which is std in this case, so the final error I receive from pandas is 5e-07, not the variance, even though the intermediate result is off by only 1e-17.

I do still see this as a regression / bug in pandas, as the version update from 1.1.5 to 1.2.0 broke the result of a calculation that was correct beforehand.

@bashtage (Contributor)

> I do still see this as a regression / bug in pandas, as the version update from 1.1.5 to 1.2.0 broke the result of a calculation that was correct beforehand.

A hard threshold definitely seems like a bug. It seems that it has to be the case that df.rolling(3).var() is the same as (10**10) ** 2 * (df / 10**10).rolling(3).var() up to some rounding. The threshold should be relative to the previous value, I would think (or no threshold at all, which is what NumPy does).
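
A quick way to see the scale dependence, reusing the data from the report above (a sketch; printed values are approximate):

import numpy as np
import pandas as pd

s = pd.Series([0.00000054, 0.00000053, 0.00000054])

# under the absolute 1e-15 cutoff the small-scale variance is forced to 0
print(s.rolling(2).var())
# scaling the data up and rescaling the result keeps the ~5e-17 variance
print((s * 10**10).rolling(2).var() / (10**10) ** 2)
# NumPy applies no cutoff at all
print(np.var([0.00000054, 0.00000053], ddof=1))   # ~5e-17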
