rolling skew and rolling kurt has wrong answer under all eq values #18044

zartbot · 2017-10-31T02:31:55Z

Code Sample, a copy-pastable example if possible

pd.Series([1.1]*15).rolling(10).skew()
pd.Series([1.1]*15).rolling(10).kurt()

Problem description

rolling skew() wrong result:

pd.Series([1.1]*15).rolling(10).skew()

0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 -1.056765e+08
10 -1.056765e+08
11 -1.056765e+08
12 -1.056765e+08
13 -1.056765e+08
14 -1.056765e+08
dtype: float64

rolling kurt() wrong result:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 1.201335e+16
10 1.201335e+16
11 1.201335e+16
12 1.201335e+16
13 1.201335e+16
14 1.201335e+16
dtype: float64

But try pd.Series([1.1]*15).tail(10).skew() and pd.Series([1.1]*15).tail(10).kurt() could get the right result.
and also a reference workaround by using apply to call scipy:

pd.Series([1.1]*15).rolling(10).apply(lambda x: scipy.stats.skew(x))
pd.Series([1.1]*15).rolling(10).apply(lambda x: scipy.stats.kurt(x))

seems it has bug in rolling_skew/rolling_kurt

Expected Output

Output of `pd.show_versions()`

[paste the output of `pd.show_versions()` here below this line]
INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-93-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 36.6.0
Cython: 0.24
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.5.2
feather: None
matplotlib: 2.0.0
openpyxl: 2.2.0-b1
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: 3.5.0
bs4: None
html5lib: 1.0b10
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

gfyoung · 2017-10-31T04:51:25Z

@ogotaiking : Thanks for reporting this! For reference, do you mind posting the expected results too?

selik · 2017-10-31T20:29:32Z

@gfyoung The skew of a uniform distribution is zero. The kurtosis of the uniform distribution could be -6/5, depending on how you define it. While @ogotaiking says the kurtosis is calculated correctly in the non-rolling case, it's returning zero for me, so that might not be what's desired.

zartbot · 2017-11-01T03:02:03Z

The issue would be variance numerically unstable in "calc_skew" , the answer will divide by B , but the code only check B<=0, during rolling, add_skew/remove_skew may cause B in range(0,1e-9), so after that uniform distribution will get such large number in skew/kurt.

#6817 from jaimefrio/rolling_var-welford resolve the roll_var stability , but seems it will also import to skew/kurt.

# Rolling skewness

cdef inline double calc_skew(int64_t minp, int64_t nobs, double x, double xx,
                             double xxx) nogil:
    cdef double result, dnobs
    cdef double A, B, C, R

    if nobs >= minp:
        dnobs = <double>nobs
        A = x / dnobs
        B = xx / dnobs - A * A
        C = xxx / dnobs - A * A * A - 3 * A * B
        if B <= 0 or nobs < 3: <-- B's precision has some issue cause uniform distribution not be ZERO
            result = NaN
        else:
            R = sqrt(B)
            result = ((sqrt(dnobs * (dnobs - 1.)) * C) /
                      ((dnobs - 2) * R * R * R))  <-- here will divde 1e-N
    else:
        result = NaN

    return result

zartbot · 2017-11-01T06:50:15Z

in pandas/_libs/window.pyx
RCA(Root Cause Analysis):
for uniform distribution, all values are eq.
function roll_skew will continues call add_skew / remove_skew.

xx  += val * val
xx -= prev* prev

but during rolling, it will cause numerically unstable in xx,
previously suggest is a bad fix

gfyoung · 2017-11-01T08:30:23Z

@ogotaiking : Cool! Feel free to propose changes in a PR and add a test or two to confirm. We can look at it in more detail there.

zartbot · 2017-11-01T15:44:18Z

@gfyoung

just did more test, the previous fix does not work in some particular numbers, and check core/nanops.py,

during calculate m2 and m3 , it has the following function to validate the float issue.

_zero_out_fperr(arg):
    if isinstance(arg, np.ndarray):
        with np.errstate(invalid='ignore'):
            return np.where(np.abs(arg) < 1e-14, 0, arg)
    else:
        return arg.dtype.type(0) if np.abs(arg) < 1e-14 else arg

but in window.pyx it's a cython code , i am not familiar with it to modify or do such validation.

I just did some simple change

diff --git a/pandas/_libs/window.pyx b/pandas/_libs/window.pyx
index b6bd6f9..64a1a40 100644
--- a/pandas/_libs/window.pyx
+++ b/pandas/_libs/window.pyx
@@ -788,7 +788,7 @@ cdef inline double calc_skew(int64_t minp, int64_t nobs, double x, double xx,
         A = x / dnobs
         B = xx / dnobs - A * A
         C = xxx / dnobs - A * A * A - 3 * A * B
-        if B <= 0 or nobs < 3:
+        if B <= 1e-14 or nobs < 3:
             result = NaN
         else:
             R = sqrt(B)
@@ -915,7 +915,7 @@ cdef inline double calc_kurt(int64_t minp, int64_t nobs, double x, double xx,
         R = R * A
         D = xxxx / dnobs - R - 6 * B * A * A - 4 * C * A
 
-        if B == 0 or nobs < 4:
+        if B <= 1e-14 or nobs < 4:
             result = NaN
         else:
             K = (dnobs * dnobs - 1.) * D / (B * B) - 3 * ((dnobs - 1.) ** 2)

and I did more test on random numbers with uniform distribution

c=0.0
for i in range(0,10000):
    a = pd.Series([np.random.rand(1)]*30)
    b=max(a.rolling(10).skew().fillna(0))
    if b > c :
        c = b

the result looks good , after try 10000 random numbers , c still eq zero.
and also check random distribution skew:

tc=0.0
for i in range(0,10000):
    test=pd.Series(np.random.rand(500))
    aa = test.rolling(10).skew().fillna(0)
    bb = test.rolling(10).apply(lambda x: pd.Series(x).skew()).fillna(0)
    cc = max(aa - bb)
    if cc > tc:
        tc = cc
tc

the max diff between rolling_skew and nanops.nanskew is : 1.0452083643031074e-11

enricogandini · 2022-01-13T09:12:10Z

I encountered this bug again.
When applying the Pandas kurt and skew to a rolling window of a Series with all identical items (in this case, all items are zero),
the results are Series containing all NaN values. When applying the kurtosis and skew functions from scipy.stats, I get the expected results: Series with NaN values only in the first window_size - 1 elements.

import numpy as np
import pandas as pd
from scipy import stats

array_size = 20
zeros = pd.Series(np.zeros(array_size))

window_size = 5
rolling = zeros.rolling(window=window_size)

results_pandas = rolling.aggregate(['kurt', 'skew'])
assert results_pandas.isna().all().all()

results_scipy = rolling.aggregate([stats.kurtosis, stats.skew])
assert results_scipy.isna().sum().eq(window_size - 1).all()

System information:

I am using Python 3.8.9, Pandas 1.1.4, SciPy 1.4.1, and NumPy 1.20.2.

alex7088 · 2022-06-20T17:46:27Z

I have same proble. Rolling and tail skew() gives different result.

baileymeta · 2025-02-01T12:24:34Z

https://pythoner.space/questions/what-is-the-method-for-calculating-kurtosis-in-the-scipy-library

Answer complete

zartbot mentioned this issue Nov 1, 2017

BUG:fix rolling skew kurt floating issue #18065

Merged

4 tasks

TomAugspurger added this to the Next Major Release milestone Nov 1, 2017

TomAugspurger added the Numeric Operations Arithmetic, Comparison, and Logical operations label Nov 1, 2017

jreback modified the milestones: Next Major Release, 0.21.1 Nov 4, 2017

jreback added the Bug label Nov 4, 2017

jreback closed this as completed in #18065 Nov 8, 2017

jreback mentioned this issue Nov 22, 2017

BUG: rolling.corr() produces wrong result with equal values #18430

Closed

kgryte mentioned this issue Apr 9, 2020

Incremental skewness and other higher moments when provided equal values do not yield expected results stdlib-js/stdlib#287

Open

2 tasks

j7168908jx mentioned this issue Mar 23, 2024

BUG: Wrong kurtosis outcome due to inadequate fix to previous issues #57972

Open

3 tasks

Liam3851 mentioned this issue Aug 21, 2024

BUG: Wrong result of Kurtosis #59572

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rolling skew and rolling kurt has wrong answer under all eq values #18044

rolling skew and rolling kurt has wrong answer under all eq values #18044

zartbot commented Oct 31, 2017 •

edited

Loading

[paste the output of `pd.show_versions()` here below this line]
INSTALLED VERSIONS

gfyoung commented Oct 31, 2017 •

edited

Loading

selik commented Oct 31, 2017 •

edited

Loading

zartbot commented Nov 1, 2017

zartbot commented Nov 1, 2017 •

edited

Loading

gfyoung commented Nov 1, 2017

zartbot commented Nov 1, 2017 •

edited

Loading

enricogandini commented Jan 13, 2022

alex7088 commented Jun 20, 2022 •

edited

Loading

baileymeta commented Feb 1, 2025

rolling skew and rolling kurt has wrong answer under all eq values #18044

rolling skew and rolling kurt has wrong answer under all eq values #18044

Comments

zartbot commented Oct 31, 2017 • edited Loading

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line] INSTALLED VERSIONS

gfyoung commented Oct 31, 2017 • edited Loading

selik commented Oct 31, 2017 • edited Loading

zartbot commented Nov 1, 2017

zartbot commented Nov 1, 2017 • edited Loading

gfyoung commented Nov 1, 2017

zartbot commented Nov 1, 2017 • edited Loading

enricogandini commented Jan 13, 2022

System information:

alex7088 commented Jun 20, 2022 • edited Loading

baileymeta commented Feb 1, 2025

zartbot commented Oct 31, 2017 •

edited

Loading

Output of `pd.show_versions()`

[paste the output of `pd.show_versions()` here below this line]
INSTALLED VERSIONS

gfyoung commented Oct 31, 2017 •

edited

Loading

selik commented Oct 31, 2017 •

edited

Loading

zartbot commented Nov 1, 2017 •

edited

Loading

zartbot commented Nov 1, 2017 •

edited

Loading

alex7088 commented Jun 20, 2022 •

edited

Loading