Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate non-keyword arguments for drop_duplicates. #41500

Merged
merged 15 commits into from
May 25, 2021
Merged

Deprecate non-keyword arguments for drop_duplicates. #41500

merged 15 commits into from
May 25, 2021

Conversation

jmholzer
Copy link
Contributor

@pep8speaks
Copy link

pep8speaks commented May 16, 2021

Hello @jmholzer! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-24 20:25:12 UTC

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is off to a good start, thanks @jmholzer !



def test_drop_duplicates_pos_args_deprecation():
# test deprecation warning message for positional arguments GH#41485
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just GH#41485 should be fine here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in the latest commit.



def test_drop_duplicates_pos_args_deprecation():
# test deprecation warning message for positional arguments GH#41485
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in latest commit.

pandas/core/series.py Outdated Show resolved Hide resolved
# test deprecation warning message for positional arguments GH#41485
df = DataFrame({"a": [1, 1, 2], "b": [1, 1, 3], "c": [1, 1, 3]})
msg = (
r"Starting with Pandas version 2\.0 all arguments of drop_duplicates except for "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is likely too long, did you run the linting checks before submitting? If you enable pre-commit (pre-commit install) they'll be run for you, else you can run them manually with pre-commit run --files <any file you've modified>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it was too long. I changed it in my latest commit. I was using git outside of my development environment for convenience. In my latest commits, I made sure to have pre-commit enabled, thanks for the hint.

@@ -647,6 +647,7 @@ Deprecations
- Deprecated setting :attr:`Categorical._codes`, create a new :class:`Categorical` with the desired codes instead (:issue:`40606`)
- Deprecated behavior of :meth:`DatetimeIndex.union` with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`)
- Deprecated using ``usecols`` with out of bounds indices for ``read_csv`` with ``engine="c"`` (:issue:`25623`)
- Deprecated passing arguments as positional in :meth:`DataFrame.drop_duplicates` and :meth:`Series.drop_duplicates` (:issue:`41485`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps mention that subset is allowed (e.g. "except for subset"), other than that, if the tests all pass, this looks good to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, thanks, I made that change and committed again.

I have checked that all tests on the methods (not just mine) still pass without issue also.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks!

@simonjayhawkins simonjayhawkins added the Deprecate Functionality to remove in pandas label May 17, 2021
@@ -647,6 +647,7 @@ Deprecations
- Deprecated setting :attr:`Categorical._codes`, create a new :class:`Categorical` with the desired codes instead (:issue:`40606`)
- Deprecated behavior of :meth:`DatetimeIndex.union` with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`)
- Deprecated using ``usecols`` with out of bounds indices for ``read_csv`` with ``engine="c"`` (:issue:`25623`)
- Deprecated passing arguments as positional in :meth:`DataFrame.drop_duplicates` (except for ``subset``) and :meth:`Series.drop_duplicates` (:issue:`41485`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should do the same for Index at the same time for consistency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will get error: Signature of "drop_duplicates" incompatible with supertype "IndexOpsMixin" [override] otherwise

Copy link
Contributor Author

@jmholzer jmholzer May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I add the warning decorator to Index.drop_duplicates + test in a commit on this branch / pull request?

cc @MarcoGorelli

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be great, thanks @jmholzer !

Copy link
Contributor Author

@jmholzer jmholzer May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made these changes in the commit 064268f.

However, it looks like a lot of commits that I pulled from the master branch ended up in this pull request also, I don't have enough knowledge of git to understand why this happened (I used rebase, but didn't think the commits from master would show up in the PR). Is this a problem? If so, can I fix it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you'll need to rebase (see https://youtu.be/hv8dhOEzQcM), currently this is showing lots of unrelated changes. Something like

git fetch upstream
git rebase -i upstream/master

and then in the interactive window choose which commits to keep/drop/fixup/edit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the guidance, I appreciate it.

I did the following to fix things:

  1. Reverted the changes on my local branch to before the rebase.
  2. Cherry-picked my commit with the changes for Index from the remote.
  3. Pulled remote master changes to my local master branch.
  4. Merged my local feature branch with my local master branch.
  5. Force-pushed my local feature branch to the remote.

That has gotten rid of the spurious commits. I tried using a rebase instead of a merge in step 3, as you suggested, but I couldn't do it cleanly.

Are steps 3-4 acceptable for working on pandas, or is a rebase generally preferred to a merge?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's fine, commits will get squashed anyway

@jmholzer
Copy link
Contributor Author

@MarcoGorelli I made the changes to drop_duplicates that we discussed in the comments of #41551, they are in the latest commit.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks - couple of minor things, then this looks good to me

Comment on lines 314 to 317
msg = (
"In a future version of pandas all arguments of "
"Index.drop_duplicates will be keyword-only"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message shows Index - to get it to show MultiIndex, you'll need to define drop_duplicates in the MultiIndex class and then call super().drop_duplicates inside it - see interpolate for an example of this

Copy link
Contributor Author

@jmholzer jmholzer May 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this change in 2cb482f.

I will have to do the same for #41551, as the test for MultiIndex currently has the same problem.

@@ -1738,3 +1738,20 @@ def test_construct_from_memoryview(klass, extra_kwargs):
result = klass(memoryview(np.arange(2000, 2005)), **extra_kwargs)
expected = klass(range(2000, 2005), **extra_kwargs)
tm.assert_index_equal(result, expected)


def test_drop_duplicates_pos_args_deprecation():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also put the issue number here?

Copy link
Contributor Author

@jmholzer jmholzer May 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this change in 2cb482f.

@jmholzer
Copy link
Contributor Author

@MarcoGorelli regarding the warning message generation for MultiIndex, I think we need to remove the @final decorator from Index.drop_duplicates, is this acceptable?

https://docs.python.org/3.8/library/typing.html#typing.final

@simonjayhawkins
Copy link
Member

@MarcoGorelli regarding the warning message generation for MultiIndex, I think we need to remove the @final decorator from Index.drop_duplicates, is this acceptable?

https://docs.python.org/3.8/library/typing.html#typing.final

sure. we tend to use @Final to identify methods that aren't overloaded rather than methods that shouldn't be overloaded.

@MarcoGorelli MarcoGorelli self-requested a review May 23, 2021 17:42
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there 💪

Comment on lines 3799 to 3801
@deprecate_nonkeyword_arguments(version=None, allowed_args=["self"])
def drop_duplicates(self: _IndexT, keep: str_t | bool = "first") -> _IndexT:
return super(Index, self).drop_duplicates(keep=keep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary to annotate self as _IndexT? If think you should be able to type the return type as MultiIndex and leave self untyped, see MultiIndex.dropna

Also, does super().drop_duplicates(keep=keep) not work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, str should be fine in this fine in this file, I think it's only used in pandas/core/indexes/base.py because there there's str = CachedAccessor("str", StringMethods) (note to self: there should probably be a pre-commit rule for this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made all the changes suggested in both comments, they are in fbf70a2.

Comment on lines 476 to 491
def test_drop_duplicates_pos_args_deprecation():
# GH#41485
df = DataFrame({"a": [1, 1, 2], "b": [1, 1, 3], "c": [1, 1, 3]})

msg = (
"In a future version of pandas all arguments of "
"DataFrame.drop_duplicates except for the argument 'subset' "
"will be keyword-only"
)

with tm.assert_produces_warning(FutureWarning, match=msg):
result = df.drop_duplicates(["b", "c"], "last")

expected = DataFrame({"a": [1, 2], "b": [1, 3], "c": [1, 3]}, index=[1, 2])

tm.assert_frame_equal(expected, result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove all the extra newlines here, this test could read as a single paragraph

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change to all my tests in fbf70a2, they now all read as one paragraph.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thanks

@MarcoGorelli MarcoGorelli merged commit 5d474cc into pandas-dev:master May 25, 2021
TLouf pushed a commit to TLouf/pandas that referenced this pull request Jun 1, 2021
* ENH: Deprecate non-keyword arguments for drop_duplicates.

* leave newline

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* remove redundant line

* ENH: Deprecate non-keyword arguments for drop_duplicates.

Co-authored-by: Marco Gorelli <marcogorelli@protonmail.com>
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
* ENH: Deprecate non-keyword arguments for drop_duplicates.

* leave newline

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* ENH: Deprecate non-keyword arguments for drop_duplicates.

* remove redundant line

* ENH: Deprecate non-keyword arguments for drop_duplicates.

Co-authored-by: Marco Gorelli <marcogorelli@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants