Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR/CLN: Remove pd.rolling_*, pd.expanding* and pd.ewm* #18723

Merged
merged 3 commits into from
Feb 1, 2018

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Dec 11, 2017

This PR removes pd.rolling_*, pd.expanding_* pd.ewm* and the related pd.stats subpackage from the code base.

I have some doubts about some stuff:

  • there are modules doc.plots.stats.* that I'm not sure of if they should be changed or removed. Currently they're not touched.
  • In asv_bench.benchmarks.gil.py there are calls to pd.rolling_* etc. I've changed this to be method-based.
  • pandas.tests.test_windows.py I felt was a bit complex to clean up.

I would appreciate inputs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are removing a lot of tests that should remain

@@ -216,7 +216,9 @@ Removal of prior version deprecations/changes
- The ``pandas.io.wb`` and ``pandas.io.data`` stub modules have been removed (:issue:`13735`)
- ``Categorical.from_array`` has been removed (:issue:`13854`)
- The ``freq`` and ``how`` parameters have been removed from the ``rolling``/``expanding``/``ewm`` methods of DataFrame
and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:18601 & :issue:18668)
and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:`18601`)
- The top-level functions ``pd.rolling_*``, ``pd.expanding_*`` and ``pd.ewm*`` have been removed (deprecated since v0.18).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t remove any issue numbers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this and some other stuff that’s unrelated to the pd.rolling stuff seems to be a rebase issue. I’ll fix it tomorrow.

If you want, you can limit the review to only the latest commit, the other commits are because of the rebase issue.

@@ -331,5 +333,5 @@ Categorical
Other
^^^^^

- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t change other things

12.952, np.nan, np.nan])

with catch_warnings(record=True):
rs = mom.rolling_window(vals, 5, 'boxcar', center=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t remove tests just because we are catching the warning

you may need to rewrite them to use the new syntax
unless they duplicate current tests

@jreback jreback added the Deprecate Functionality to remove in pandas label Dec 11, 2017
@codecov
Copy link

codecov bot commented Dec 11, 2017

Codecov Report

Merging #18723 into master will increase coverage by 0.06%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18723      +/-   ##
==========================================
+ Coverage   91.61%   91.67%   +0.06%     
==========================================
  Files         153      151       -2     
  Lines       51339    51120     -219     
==========================================
- Hits        47034    46865     -169     
+ Misses       4305     4255      -50
Flag Coverage Δ
#multiple 89.53% <ø> (+0.07%) ⬆️
#single 40.69% <ø> (-0.17%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/indexes/category.py 97.2% <0%> (-0.31%) ⬇️
pandas/core/window.py 96.08% <0%> (-0.24%) ⬇️
pandas/util/testing.py 82.34% <0%> (-0.2%) ⬇️
pandas/io/formats/format.py 96.03% <0%> (-0.15%) ⬇️
pandas/core/dtypes/dtypes.py 95.14% <0%> (-0.14%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
pandas/core/indexes/timedeltas.py 91.21% <0%> (-0.06%) ⬇️
pandas/core/indexes/numeric.py 97.33% <0%> (-0.04%) ⬇️
pandas/core/indexes/period.py 92.9% <0%> (-0.04%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2aa4aa9...e311f50. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 11, 2017

Codecov Report

Merging #18723 into master will increase coverage by 0.07%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18723      +/-   ##
==========================================
+ Coverage   91.62%   91.69%   +0.07%     
==========================================
  Files         150      148       -2     
  Lines       48728    48542     -186     
==========================================
- Hits        44645    44513     -132     
+ Misses       4083     4029      -54
Flag Coverage Δ
#multiple 90.06% <ø> (+0.07%) ⬆️
#single 41.71% <ø> (-0.03%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca4ae4f...1443818. Read the comment docs.

@topper-123
Copy link
Contributor Author

The rebase issue has been solved now. I'll look at your comments now,. If you've got additional comments, I'd be thakful, and I'll look at them

@pep8speaks
Copy link

pep8speaks commented Dec 14, 2017

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 01, 2018 at 06:53 Hours UTC

@topper-123
Copy link
Contributor Author

topper-123 commented Dec 14, 2017

Hi @jreback. An issue is that dask apparently still uses pd.rolling_*. This means that pandas/tests/test_downstream.py currently fails. Should this PR be put on hold, or would notifying the dask guys about this removal be enough? This stuff has been deprecated since 0.18, so for quite a long time.

This PR doesn't touch too many files, so it would be ok to wait a bit, but given the simplification to the pandas API that this PR offers, it should IMO be included when 0.22 releases.

I've updated the PR, so be'd grateful for a review, so it'll be ready for an eventual pull.

@jreback
Copy link
Contributor

jreback commented Dec 14, 2017

@TomAugspurger
Copy link
Contributor

The dask versions are deprecated, just like pandas. I believe the best solution would be to

  1. Use this PR to update all the test to the new syntax
  2. Update dask to remove the dd.rolling, etc. methods (@topper-123 could I convince you to work on that? :) It'd be nice to have you as a contributor if you haven't already.)
  3. Remove them from pandas once dask has a release

Depending on the timing of pandas 0.22, dask may not have a release out.

@topper-123
Copy link
Contributor Author

@TomAugspurger , the test_dask function doesn't explicitly call pd.rolling_*, so changes must be made in the dask code base, correct?

I've never used dask, and have a feeling I'd have to learn some intricancies of that lib to make such a PR, and I could easily go down a rabbit's hole that I never intended to enter... Could someone familiar with dask not take it upon themselves to do those changes? I can sit on this PR until dask is updated, and rebase when this is ready to merge.

TomAugspurger added a commit to TomAugspurger/dask that referenced this pull request Dec 14, 2017
@TomAugspurger TomAugspurger mentioned this pull request Dec 14, 2017
3 tasks
@TomAugspurger
Copy link
Contributor

@topper-123 dask was importing them at the top level if dask.dataframe, which caused the failure. Next time dask wants to mirror a pandas deprecation, we'll want to put it in a try / except block.

dask/dask#2995

@topper-123
Copy link
Contributor Author

topper-123 commented Dec 15, 2017

@TomAugspurger , thanks for picking up on this, I will wait with my PR a while, until dask is updated.

My PR lowers the number of top-level functions of pandas by some 30, so it would IMO be very beneficial to pandas' users to have this merged by 0.22, as it would make pandas look much more focused. Could dask have a release before pandas 0.22, eventually just a minor release?

@gfyoung gfyoung added the Clean label Dec 16, 2017
jcrist pushed a commit to dask/dask that referenced this pull request Dec 21, 2017
* Remove rolling

* Remove deprecated rolling methods

xref pandas-dev/pandas#18723

* PEP8

* Added reference
@topper-123
Copy link
Contributor Author

topper-123 commented Jan 10, 2018

dask 0.16.1 has now been released, and all tests now pass locally with dask version 0.16.1. The travis failure has to do with IO/S3 which I didn't touch. That's the only failure in travis.

So, I think everything is ok with my PR. Is there anything I can do to make this pass that travis error, or is this travis failure unrelated to my PR?

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about one of the tests

assert result2.dtype == np.float_

def _check_ew_structures(self, func, name):
def _check_ew(self, name=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this function consisted of _check_ew_ndarray and _check_ew_structures subfunctions to test both interfaces. You removed _check_ew_ndarray and only kept the content of _check_ew_structures as this is testing the new interface.
But, as far as I can see, the actual testing was done in _check_ew_ndarray (_check_ew_structures is only testing the output class type). Shouldn't this testing be kept? (changed to the new interface)

(might good be I am missing something, or that it is covered in other tests)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right: So rather than test this function call, test the ewm method call on a series. I'll look into it tomorrow.

@topper-123
Copy link
Contributor Author

@jorisvandenbossche , I've now replicated the code from _check_ew_ndarray into check_ew. Could you review it?

@jorisvandenbossche
Copy link
Member

@topper-123 that specific one looks good now I think.
Did you double check the other tests to see there were not similar problems?

@topper-123
Copy link
Contributor Author

I've added test_expanding_func (parametrized, was called test_parametrized_max previously) and test_expanding_apply. Everything is now included AFAICT.

@topper-123 topper-123 force-pushed the remove_pd_stats branch 2 times, most recently from d30c033 to 1eca246 Compare January 17, 2018 06:34
@topper-123
Copy link
Contributor Author

The tests have been running for 9 hours now and the Travis-CI tests have not yet finished. Same thing happened yesterday where after 20 hours one travis test-suite failed without console output (my guess is it timed out). Is anything wrong with the travis suite?

Everything passes locally and the errors that travis reported two days ago have been fixed (but the fixes not confirmed by travis, because I can't get travis to go through them...).

What should I do? Could a core dev run the travis tests locally and confirm they're ok, and merge them?

@TomAugspurger
Copy link
Contributor

TravisCI had an outage yesterday, so all the builds were canceled and rescheduled. Looks like yours is 4th on deck at the moment: https://travis-ci.org/pandas-dev/pandas/pull_requests

@topper-123
Copy link
Contributor Author

Ok, thanks. I'll wait a bit more then.

@topper-123
Copy link
Contributor Author

All green now🤓

@jreback
Copy link
Contributor

jreback commented Jan 18, 2018

when @jorisvandenbossche approves can merge.

@topper-123
Copy link
Contributor Author

@jorisvandenbossche, I'd really appreciate if you could do this.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry @topper-123, one more question
(it's a very complex file / diff (not your fault!) to review)

result = get_result(arr, 50, min_periods=30)
tm.assert_almost_equal(result[-1], static_comp(arr[10:-10]))

# min_periods is working correctly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to move parts of _check_ndarray like this part on checking min_periods to _check_moments_func ?

From a superficial look, it seems that _check_ndarray did more than _check_structures (eg in _check_structures I don't see any assertion of NaNs when min_periods is specified like here)

I also don't know if test_stable and test_window make sense for the new interface

@topper-123
Copy link
Contributor Author

Changes suggested by @jorisvandenbossche implemented and all green.

r = obj.rolling(window=window, min_periods=min_periods,
center=center)
return getattr(r, name)(**kwargs)
self._check_series(name, static_comp=static_comp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure of the point of adding _check_series, it seems that you are already checking the series & frame results (line 1344), yes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there IS a difference, IOW you are checking something different, then certainly don't make a new function, just in-line it here.

@topper-123
Copy link
Contributor Author

topper-123 commented Jan 31, 2018

Hi @jreback. I uploaded a new PR 12 days ago.

There were a few issues that were checked in _check_ndarray but were not in _check_moments previously:

  • tests for whether Nans were excluded correctly
  • tests for whether center worked correctly
  • tests for what happens when the window is larger than the series

All of the above are inlined in _check_moments now (see 37abd25).

I've rebased a moments ago, without making code changes otherwise, so everything should turn up green & rebased when the tests are finished.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine. just that 1 question.

@@ -863,72 +848,55 @@ def test_centered_axis_validation(self):
.rolling(window=3, center=True, axis=2).mean())

def test_rolling_sum(self):
self._check_moment_func(mom.rolling_sum, np.nansum, name='sum',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, shouldn't this be nansum?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. fixed.

@topper-123
Copy link
Contributor Author

all green.

@jorisvandenbossche jorisvandenbossche merged commit 3597de0 into pandas-dev:master Feb 1, 2018
@jorisvandenbossche
Copy link
Member

@topper-123 Thanks a lot for this!

@topper-123 topper-123 deleted the remove_pd_stats branch February 1, 2018 11:33
harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
…18723)

* remove pd.running_*, pd.expanding_* and pd.ewm* and related code

* added test_expanding_func and test_expanding_apply

* recreate _check_ndarray inline in _check_moment_func
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants