BUG: Remove artificial precision limit in rolling var & std #40505

Merged (2 commits) on Mar 21, 2021
18 changes: 18 additions & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -299,6 +299,24 @@ cast to ``dtype=object`` (:issue:`38709`)
     ser
     ser2
 
+
+.. _whatsnew_130.notable_bug_fixes.rolling_var_precision:
+
+Removed artificial truncation in rolling variance and standard deviation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:meth:`core.window.Rolling.std` and :meth:`core.window.Rolling.var` will no longer
+artificially truncate results that are less than ``~1e-8`` and ``~1e-15`` respectively to
+zero (:issue:`37051`, :issue:`40448`, :issue:`39872`).
+
+However, floating point artifacts may now exist in the results when rolling over larger values.
+
+.. ipython:: python
+
+    s = pd.Series([7, 5, 5, 5])
+    s.rolling(3).var()
+
+
 .. _whatsnew_130.api_breaking.deps:
 
 Increased minimum versions for dependencies
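Code that depended on the old zeroing can reproduce it as a post-processing step. The sketch below is a minimal pure-Python illustration, assuming the 1e-14 variance cutoff from the Cython branch this PR deletes from `calc_var`; the helpers `truncate_like_before` and `sample_var` are hypothetical and not part of pandas. It also shows what the cutoff cost: a genuine variance of roughly 5e-17, from the two-value window used in `test_rolling_std_small_values`, is erased along with the rounding residue.

```python
import math

# Cutoff copied from the branch this PR deletes from calc_var.
OLD_VAR_CUTOFF = 1e-14


def truncate_like_before(var, cutoff=OLD_VAR_CUTOFF):
    """Hypothetical helper: zero out any variance below `cutoff`.

    NaN passes through unchanged because `nan < cutoff` is False.
    """
    return 0.0 if var < cutoff else var


def sample_var(xs):
    """Hypothetical helper: two-pass sample variance (ddof=1) of one window."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


# A residue of the size shown in the new var docstring is cleared ...
cleared = truncate_like_before(6.661338e-16)  # 0.0

# ... but so is a genuine, tiny variance computed from very small inputs.
genuine = sample_var([0.00000054, 0.00000053])  # roughly 5e-17
std_before = math.sqrt(truncate_like_before(genuine))  # 0.0
std_after = math.sqrt(genuine)  # roughly 7.07e-9
```

A fixed absolute cutoff cannot tell a rounding residue from a small but real result, which is the trade-off the PR resolves in favor of returning the computed value.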
4 changes: 0 additions & 4 deletions pandas/_libs/window/aggregations.pyx
@@ -283,10 +283,6 @@ cdef inline float64_t calc_var(int64_t minp, int ddof, float64_t nobs,
             result = 0
         else:
             result = ssqdm_x / (nobs - <float64_t>ddof)
-            # Fix for numerical imprecision.
-            # Can be result < 0 once Kahan Summation is implemented
-            if result < 1e-14:
-                result = 0
     else:
         result = NaN
 

Review thread on the removed line "# Can be result < 0 once Kahan Summation is implemented":

  Member: Is there any risk of getting negative variance?
  Member Author: I don't believe so, no.
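`calc_var` only finalizes the result. The `ssqdm_x` it divides is, as its name suggests, a running sum of squared differences from the mean that is kept up to date as values enter and leave the window. The pure-Python sketch below illustrates why such running state can end up slightly off from the exact answer once the cutoff is gone. It is a simplified stand-in, not pandas' Cython code: it uses Welford-style add/remove updates, treats `min_periods` as equal to `window` for brevity, and the name `rolling_var` is hypothetical.

```python
import math


def rolling_var(values, window, ddof=1):
    """Hypothetical helper: rolling variance from running (nobs, mean, ssqdm) state.

    Adds the newest value and removes the one that left the window with
    Welford-style updates. Windows with fewer than `window` observations,
    or with nobs <= ddof, yield NaN.
    """
    nobs = 0
    mean = 0.0
    ssqdm = 0.0  # running sum of squared differences from the mean
    out = []
    for i, x in enumerate(values):
        # Add the incoming observation.
        nobs += 1
        delta = x - mean
        mean += delta / nobs
        ssqdm += delta * (x - mean)

        # Remove the observation that just slid out of the window.
        if i >= window:
            old = values[i - window]
            nobs -= 1
            delta = old - mean
            mean -= delta / nobs
            ssqdm -= delta * (old - mean)

        if nobs < window or nobs <= ddof:
            out.append(math.nan)
        else:
            # Same division as calc_var, without the removed 1e-14 cutoff.
            out.append(ssqdm / (nobs - ddof))
    return out


result = rolling_var([7, 5, 5, 5], window=3)
# result[2] is close to 4/3. result[3] is close to, but not guaranteed to be
# exactly, 0.0 because the removal step subtracts nearly equal quantities.
```

Because the removal step undoes earlier additions by subtraction, a window of identical values can be left with a residue of a few units in the last place instead of an exact zero; the old cutoff hid that residue, and this PR lets it through.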
38 changes: 22 additions & 16 deletions pandas/core/window/rolling.py
@@ -1882,21 +1882,24 @@ def median(
         The default ``ddof`` of 1 used in :meth:`Series.std` is different
         than the default ``ddof`` of 0 in :func:`numpy.std`.
 
-        A minimum of one period is required for the rolling calculation.\n
+        A minimum of one period is required for the rolling calculation.
+
+        The implementation is susceptible to floating point imprecision as
+        shown in the example below.\n
         """
         ).replace("\n", "", 1),
         create_section_header("Examples"),
         dedent(
             """
         >>> s = pd.Series([5, 5, 6, 7, 5, 5, 5])
         >>> s.rolling(3).std()
-        0         NaN
-        1         NaN
-        2    0.577350
-        3    1.000000
-        4    1.000000
-        5    1.154701
-        6    0.000000
+        0             NaN
+        1             NaN
+        2    5.773503e-01
+        3    1.000000e+00
+        4    1.000000e+00
+        5    1.154701e+00
+        6    2.580957e-08
         dtype: float64
         """
         ).replace("\n", "", 1),
@@ -1931,21 +1934,24 @@ def std(self, ddof: int = 1, *args, **kwargs):
         The default ``ddof`` of 1 used in :meth:`Series.var` is different
         than the default ``ddof`` of 0 in :func:`numpy.var`.
 
-        A minimum of one period is required for the rolling calculation.\n
+        A minimum of one period is required for the rolling calculation.
+
+        The implementation is susceptible to floating point imprecision as
+        shown in the example below.\n
         """
         ).replace("\n", "", 1),
         create_section_header("Examples"),
         dedent(
             """
         >>> s = pd.Series([5, 5, 6, 7, 5, 5, 5])
         >>> s.rolling(3).var()
-        0         NaN
-        1         NaN
-        2    0.333333
-        3    1.000000
-        4    1.000000
-        5    1.333333
-        6    0.000000
+        0             NaN
+        1             NaN
+        2    3.333333e-01
+        3    1.000000e+00
+        4    1.000000e+00
+        5    1.333333e+00
+        6    6.661338e-16
         dtype: float64
         """
         ).replace("\n", "", 1),
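The two new docstring outputs are consistent with each other. The residue in the last `var` window matches, to the printed precision, three times double-precision machine epsilon, and the `std` residue is its square root. Taking a square root maps a residue of order 1e-16 to one of order 1e-8, which is why the same artifact looks far larger in `std` than in `var`. A quick check in plain Python:

```python
import math
import sys

eps = sys.float_info.epsilon          # 2**-52, about 2.220446e-16

var_residue = 3 * eps                 # matches the var docstring's last value
std_residue = math.sqrt(var_residue)  # matches the std docstring's last value

# Formatted with the same 7 significant digits pandas prints above.
print(f"{var_residue:.6e}")  # 6.661338e-16
print(f"{std_residue:.6e}")  # 2.580957e-08
```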
22 changes: 22 additions & 0 deletions pandas/tests/window/test_rolling.py
@@ -1150,3 +1150,25 @@ def test_rolling_descending_date_order_with_offset(window, frame_or_series):
     idx = date_range(start="2020-01-03", end="2020-01-01", freq="-1d")
     expected = frame_or_series([np.nan, 3, 2], index=idx)
     tm.assert_equal(result, expected)
+
+
+def test_rolling_var_floating_artifact_precision():
+    # GH 37051
+    s = Series([7, 5, 5, 5])
+    result = s.rolling(3).var()
+    expected = Series([np.nan, np.nan, 4 / 3, 0])
+    tm.assert_series_equal(result, expected, atol=1.0e-15, rtol=1.0e-15)
+
+
+def test_rolling_std_small_values():
+    # GH 37051
+    s = Series(
+        [
+            0.00000054,
+            0.00000053,
+            0.00000054,
+        ]
+    )
+    result = s.rolling(2).std()
+    expected = Series([np.nan, 7.071068e-9, 7.071068e-9])
+    tm.assert_series_equal(result, expected, atol=1.0e-15, rtol=1.0e-15)
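Both new tests compare against values at or near zero, which is why they pass an absolute tolerance: when the expected value is exactly 0, a purely relative check accepts only an exact match. The sketch below uses Python's `math.isclose` as an analogy for the `rtol`/`atol` pair given to `tm.assert_series_equal`. The pandas comparison may differ in detail, and the residue size is an assumption borrowed from the var docstring.

```python
import math

RTOL = ATOL = 1.0e-15  # tolerances passed to tm.assert_series_equal in both tests

# Last window of test_rolling_var_floating_artifact_precision: expected 0.
# Assumption: the actual value carries a residue like the 6.661338e-16
# shown in the var docstring.
residue = 6.661338e-16
relative_only = math.isclose(residue, 0.0, rel_tol=RTOL)                # False
with_absolute = math.isclose(residue, 0.0, rel_tol=RTOL, abs_tol=ATOL)  # True

# test_rolling_std_small_values: for a two-value window [a, b] with ddof=1,
# the sample std is |a - b| / sqrt(2).
exact_std = abs(0.00000054 - 0.00000053) / math.sqrt(2)
std_matches = math.isclose(exact_std, 7.071068e-9, rel_tol=RTOL, abs_tol=ATOL)  # True
```

The absolute tolerance of 1e-15 is loose enough to absorb a residue of a few machine epsilons but far tighter than the 7e-9 results the second test checks, so a regression to zeroing would still be caught.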