
pandas.rolling_std() first value is nan #1884

Closed
erg opened this issue Sep 10, 2012 · 5 comments

Comments

@erg
Contributor

erg commented Sep 10, 2012

The window is 3, but with min_periods=1 we want a std from the very first observation. The one-period standard deviation is trivially 0.

In [28]: pandas.rolling_std(np.array([1,2,3,4,5], dtype='double'), 3, min_periods=1)
Out[28]: array([        nan,  0.70710678,  1.        ,  1.        ,  1.        ])

The pathological case:

In [29]: pandas.rolling_std(np.array([1,2,3,4,5], dtype='double'), 1, min_periods=1)
Out[29]: array([ nan,  nan,  nan,  nan,  nan])

Maybe it's because pandas takes the unbiased std, which divides by N-1, and with N = 1 that is a division by zero?
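The divide-by-zero guess can be checked with Python's standard library, which keeps the two divisors separate: `statistics.pstdev` divides by N (ddof=0) and `statistics.stdev` divides by N - 1 (ddof=1). The helper `std_of_single_value` below is hypothetical and only for illustration; mapping the stdlib error to `nan` is an assumption meant to mimic what a float-returning rolling function reports.

```python
import math
import statistics


def std_of_single_value(x: float, ddof: int) -> float:
    """Standard deviation of the one-element sample [x].

    Hypothetical helper (not a pandas or numpy API). ddof=0 divides by N,
    ddof=1 divides by N - 1, which is 0 when N == 1.
    """
    if ddof == 0:
        # Population std: divisor N = 1, so a lone value has zero spread.
        return statistics.pstdev([x])
    try:
        # Sample std: divisor N - 1 = 0; the stdlib raises instead of dividing.
        return statistics.stdev([x])
    except statistics.StatisticsError:
        # Assumption: report nan, like a float-returning rolling function would.
        return math.nan


print(std_of_single_value(3.0, ddof=0))  # 0.0
print(std_of_single_value(3.0, ddof=1))  # nan
```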

@erg
Contributor Author

erg commented Sep 10, 2012


In [32]: import bottleneck as bn

In [33]: bn.move_std(np.array([1,2,3], dtype='double'), 1)
Out[33]: array([ 0.,  0.,  0.])

Bottleneck has a parameter you can use to set the degrees of freedom. Maybe that's a feature worth implementing?
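A minimal pure-Python sketch of what a configurable ddof could look like for a trailing window. `rolling_std` here is a hypothetical helper, not the pandas or Bottleneck code; its `min_periods=1` default and the rule that `N - ddof <= 0` yields `nan` are assumptions chosen to match the outputs shown above.

```python
import math
from typing import List, Sequence


def rolling_std(values: Sequence[float], window: int,
                min_periods: int = 1, ddof: int = 1) -> List[float]:
    """Trailing-window standard deviation with a configurable ddof.

    Hypothetical helper for illustration only. A window with fewer than
    min_periods observations, or with N - ddof <= 0, yields nan.
    """
    out: List[float] = []
    for i in range(len(values)):
        # Up to `window` trailing values ending at position i.
        win = values[max(0, i - window + 1): i + 1]
        n = len(win)
        if n < min_periods or n - ddof <= 0:
            out.append(math.nan)
            continue
        mean = sum(win) / n
        ssd = sum((x - mean) ** 2 for x in win)  # sum of squared deviations
        out.append(math.sqrt(ssd / (n - ddof)))  # divisor N - ddof
    return out


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rolling_std(data, 3, min_periods=1, ddof=1))  # nan, about 0.7071, then 1.0 three times
print(rolling_std(data, 1, min_periods=1, ddof=0))  # all zeros
```

With `ddof=1` the sketch reproduces `Out[28]` and `Out[29]`; with `ddof=0` and `window=1` it gives the zeros Bottleneck returns in `Out[33]`.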

In [36]: ?bn.move_std
Type:       builtin_function_or_method
String Form:<built-in function move_std>
Docstring:
move_std(arr, int window, int axis=-1, int ddof=0)

Moving window standard deviation along the specified axis.

Unlike bn.nanstd, which uses a more robust two-pass algorithm, move_std
uses a faster one-pass algorithm.

An example of a one-pass algorithm:

    >>> np.sqrt((arr*arr).mean() - arr.mean()**2)

An example of a two-pass algorithm:    

    >>> np.sqrt(((arr - arr.mean())**2).mean())

Note in the two-pass algorithm the mean must be found (first pass) before
the squared deviation (second pass) can be found.

Parameters
----------
arr : ndarray
    Input array.
window : int
    The number of elements in the moving window.
axis : int, optional
    The axis over which to perform the moving standard deviation. By
    default the moving standard deviation is taken over the last axis
    (axis=-1). An axis of None is not allowed.
ddof : int, optional
    Means Delta Degrees of Freedom. The divisor used in calculations
    is ``N - ddof``, where ``N`` represents the number of elements.
    By default `ddof` is zero.

Returns
-------
y : ndarray
    The moving standard deviation of the input array along the specified
    axis. The output has the same shape as the input. 

Examples
--------
>>> arr = np.array([1.0, 2.0, 3.0, 4.0])
>>> bn.move_std(arr, window=2)
array([ nan,  0.5,  0.5,  0.5])
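The docstring's one-pass and two-pass formulas can be compared directly. The sketch below is plain Python (not Bottleneck's C implementation) and computes the population std (ddof=0) both ways. The offset of 1e9 in the test data is an arbitrary choice that makes the one-pass cancellation visible: E[x^2] and E[x]^2 are both around 1e18 while their difference is under 1.

```python
import math
from typing import Sequence


def one_pass_std(arr: Sequence[float]) -> float:
    """Population std from plain sums: sqrt(E[x^2] - E[x]^2)."""
    n = len(arr)
    mean = sum(arr) / n
    mean_sq = sum(x * x for x in arr) / n
    # Cancellation can push the difference slightly below zero; clamp it.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


def two_pass_std(arr: Sequence[float]) -> float:
    """Population std: find the mean first, then average squared deviations."""
    n = len(arr)
    mean = sum(arr) / n
    return math.sqrt(sum((x - mean) ** 2 for x in arr) / n)


# Same spread as [1, 2, 3], shifted by a large offset.
data = [1e9 + 1.0, 1e9 + 2.0, 1e9 + 3.0]
print(two_pass_std(data))  # about 0.8165, i.e. sqrt(2/3)
print(one_pass_std(data))  # far off: the ~1e18 squares swamp the small variance
```

This is the trade-off the docstring describes: a one-pass update only needs running sums, so it is cheaper for a moving window, but subtracting two large, nearly equal quantities can destroy the result when the data sit far from zero.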

@wesm
Member

wesm commented Sep 12, 2012

Having a configurable ddof would be nice. Leaving it for the next release, though.

@erg
Contributor Author

erg commented Sep 12, 2012

I think the first value being nan is a bug. Maybe this part could get fixed for the release?

@wesm
Member

wesm commented Sep 12, 2012

A fair point. I suppose the nobs == 1 case should always yield 0.
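The rule proposed here can be sketched as a special case placed in front of the usual N - 1 divisor. `rolling_std_nobs1_zero` is a hypothetical helper written for illustration, not the code from 8743be5; it assumes ddof is fixed at 1 and that `min_periods` is at least 1.

```python
import math
from typing import List, Sequence


def rolling_std_nobs1_zero(values: Sequence[float], window: int,
                           min_periods: int) -> List[float]:
    """Sample (ddof=1) rolling std where a lone observation yields 0.

    Hypothetical helper mimicking "the nobs == 1 case should always
    yield 0"; it is not the pandas source.
    """
    out: List[float] = []
    for i in range(len(values)):
        win = values[max(0, i - window + 1): i + 1]
        nobs = len(win)
        if nobs < min_periods:
            out.append(math.nan)  # not enough observations yet
        elif nobs == 1:
            out.append(0.0)  # special case: one point has no spread
        else:
            mean = sum(win) / nobs
            ssd = sum((x - mean) ** 2 for x in win)
            out.append(math.sqrt(ssd / (nobs - 1)))  # unbiased divisor N - 1
    return out


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rolling_std_nobs1_zero(data, 3, min_periods=1))  # 0.0, about 0.7071, then 1.0 three times
print(rolling_std_nobs1_zero(data, 1, min_periods=1))  # all zeros
```

Under this rule the first call from the report starts at 0.0 instead of `nan`, and the window-1 case returns zeros instead of all `nan`, while windows with two or more observations keep the N - 1 divisor.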

wesm closed this as completed in 8743be5 on Sep 13, 2012
@wesm
Member

wesm commented Sep 13, 2012

Fixed both issues

yarikoptic added a commit to neurodebian/pandas that referenced this issue Sep 27, 2012
* commit 'v0.8.1-203-g67121af': (193 commits)
  BUG: DataFrame column formatting issue in length-truncated column close pandas-dev#1906
  BUG: override min/max in DatetimeIndex to function as expected close pandas-dev#1895
  BUG: DataFrame mixed-type arithmetic column-wise, fix DataFrame.diff upcasting->object bug close pandas-dev#1896
  BUG: treat nobs=1 >= min_periods case in rolling_std/variance as 0 trivially. close pandas-dev#1884
  TST: skip to_file test if URLError occurs on some systems
  VB: resolve test name conflict and update make script
  DOC: minor change to build script to help auto build process
  DOC: fixed extlinks in sphinx conf
  TST: oops import in wrong place
  TST: skip test_console_encode if sys.stdin.encoding is None
  TST: unit test for pandas-dev#1902 and default to csv.QUOTE_MINIMAL
  Make it possible to set quoting for to_csv
  ENH: clean up pandas-dev#1691 changes, rls note
  ENH: add more possible bool values to read_csv pandas-dev#1295
  BUG: fix rolling_max/min for small inputs and large windows. Add a check that the min_period <= window size. Fixes pandas-dev#1897.
  Mention Ubuntu for NeuroDebian repository
  BUG: don't clobber color keyword in Series.plot, close pandas-dev#1890
  DOC: add intersphinx mapping for python library, close pandas-dev#1556
  BUG: fix mixed-integer .ix indexing bugs. close#1799
  BUG: unicode sheet name in to_excel pandas-dev#1828
  ...