Deprecate non-keyword arguments for drop_duplicates. #41500

jmholzer · 2021-05-16T01:11:37Z

xref Deprecate non-keyword arguments for methods with inplace #41485
tests added / passed
Ensure all linting tests pass
whatsnew entry

pep8speaks · 2021-05-16T01:11:41Z

Hello @jmholzer! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-24 20:25:12 UTC

MarcoGorelli

This is off to a good start, thanks @jmholzer !

MarcoGorelli · 2021-05-16T09:05:56Z

pandas/tests/frame/methods/test_drop_duplicates.py

+
+
+def test_drop_duplicates_pos_args_deprecation():
+    # test deprecation warning message for positional arguments GH#41485


Just GH#41485 should be fine here

Changed in the latest commit.

MarcoGorelli · 2021-05-16T09:06:05Z

pandas/tests/series/methods/test_drop_duplicates.py

+
+
+def test_drop_duplicates_pos_args_deprecation():
+    # test deprecation warning message for positional arguments GH#41485


Changed in latest commit.

pandas/core/series.py

MarcoGorelli · 2021-05-16T09:08:07Z

pandas/tests/frame/methods/test_drop_duplicates.py

+    # test deprecation warning message for positional arguments GH#41485
+    df = DataFrame({"a": [1, 1, 2], "b": [1, 1, 3], "c": [1, 1, 3]})
+    msg = (
+        r"Starting with Pandas version 2\.0 all arguments of drop_duplicates except for "


This line is likely too long. Did you run the linting checks before submitting? If you enable pre-commit (pre-commit install), they'll be run for you; otherwise you can run them manually with pre-commit run --files <any file you've modified>

You're right, it was too long. I changed it in my latest commit. I was using git outside of my development environment for convenience. In my latest commits, I made sure to have pre-commit enabled, thanks for the hint.

MarcoGorelli · 2021-05-16T15:02:47Z

doc/source/whatsnew/v1.3.0.rst

@@ -647,6 +647,7 @@ Deprecations
 - Deprecated setting :attr:`Categorical._codes`, create a new :class:`Categorical` with the desired codes instead (:issue:`40606`)
 - Deprecated behavior of :meth:`DatetimeIndex.union` with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`)
 - Deprecated using ``usecols`` with out of bounds indices for ``read_csv`` with ``engine="c"`` (:issue:`25623`)
+- Deprecated passing arguments as positional in :meth:`DataFrame.drop_duplicates` and :meth:`Series.drop_duplicates` (:issue:`41485`)


perhaps mention that subset is allowed (e.g. "except for subset"), other than that, if the tests all pass, this looks good to me

Alright, thanks, I made that change and committed again.

I have checked that all tests on the methods (not just mine) still pass without issue also.
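For readers unfamiliar with the pattern under review, here is a minimal, standard-library-only sketch of a decorator that keeps positional calls working but warns about them. The names warn_on_positional and Frame are hypothetical stand-ins, not the pandas decorator or DataFrame, and the message wording only approximates the one quoted in the tests.

```python
import functools
import warnings


def warn_on_positional(allowed_args):
    """Hypothetical decorator: positional arguments beyond ``allowed_args``
    still work, but emit a FutureWarning."""

    def decorate(func):
        n_allowed = len(allowed_args)
        named = [a for a in allowed_args if a != "self"]
        except_clause = f" except for {', '.join(map(repr, named))}" if named else ""
        msg = (
            f"In a future version all arguments of {func.__qualname__}"
            f"{except_clause} will be keyword-only"
        )

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # args includes self, so compare against the full allowed list.
            if len(args) > n_allowed:
                warnings.warn(msg, FutureWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorate


class Frame:
    """Toy container of row dicts; an assumption for illustration only."""

    def __init__(self, rows):
        self.rows = rows

    @warn_on_positional(allowed_args=["self", "subset"])
    def drop_duplicates(self, subset=None, keep="first"):
        rows = self.rows if keep == "first" else list(reversed(self.rows))
        seen, kept = set(), []
        for row in rows:
            key = tuple(row[k] for k in (subset or sorted(row)))
            if key not in seen:
                seen.add(key)
                kept.append(row)
        return kept if keep == "first" else list(reversed(kept))
```

With this sketch, Frame(rows).drop_duplicates(["b"], "last") warns because keep is passed positionally, while Frame(rows).drop_duplicates(["b"], keep="last") does not.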

MarcoGorelli

Looks good to me, thanks!

simonjayhawkins · 2021-05-17T11:52:14Z

doc/source/whatsnew/v1.3.0.rst

@@ -647,6 +647,7 @@ Deprecations
 - Deprecated setting :attr:`Categorical._codes`, create a new :class:`Categorical` with the desired codes instead (:issue:`40606`)
 - Deprecated behavior of :meth:`DatetimeIndex.union` with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`)
 - Deprecated using ``usecols`` with out of bounds indices for ``read_csv`` with ``engine="c"`` (:issue:`25623`)
+- Deprecated passing arguments as positional in :meth:`DataFrame.drop_duplicates` (except for ``subset``) and :meth:`Series.drop_duplicates` (:issue:`41485`)


maybe we should do the same for Index at the same time for consistency.

we will get error: Signature of "drop_duplicates" incompatible with supertype "IndexOpsMixin" [override] otherwise

Should I add the warning decorator to Index.drop_duplicates + test in a commit on this branch / pull request?

cc @MarcoGorelli

Yes, that would be great, thanks @jmholzer !

I made these changes in the commit 064268f.

However, it looks like a lot of commits that I pulled from the master branch ended up in this pull request as well. I don't know git well enough to understand why this happened (I used rebase, but didn't expect the commits from master to show up in the PR). Is this a problem? If so, can I fix it?

Yeah you'll need to rebase (see https://youtu.be/hv8dhOEzQcM), currently this is showing lots of unrelated changes. Something like

git fetch upstream
git rebase -i upstream/master

and then in the interactive window choose which commits to keep/drop/fixup/edit

Thanks for the guidance, I appreciate it.

I did the following to fix things:

1. Reverted the changes on my local branch to before the rebase.
2. Cherry-picked my commit with the changes for Index from the remote.
3. Pulled remote master changes to my local master branch.
4. Merged my local feature branch with my local master branch.
5. Force-pushed my local feature branch to the remote.

That has gotten rid of the spurious commits. I tried using a rebase instead of a merge in step 3, as you suggested, but I couldn't do it cleanly.

Are steps 3-4 acceptable for working on pandas, or is a rebase generally preferred to a merge?

yeah that's fine, commits will get squashed anyway

jmholzer · 2021-05-20T18:41:16Z

@MarcoGorelli I made the changes to drop_duplicates that we discussed in the comments of #41551, they are in the latest commit.

MarcoGorelli

Awesome, thanks - couple of minor things, then this looks good to me

MarcoGorelli · 2021-05-22T19:39:18Z

pandas/tests/indexes/multi/test_duplicates.py

+    msg = (
+        "In a future version of pandas all arguments of "
+        "Index.drop_duplicates will be keyword-only"
+    )


The error message shows Index - to get it to show MultiIndex, you'll need to define drop_duplicates in the MultiIndex class and then call super().drop_duplicates inside it - see interpolate for an example of this

Added this change in 2cb482f.

I will have to do the same for #41551, as the test for MultiIndex currently has the same problem.
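The reason an inherited method reports the parent's name can be sketched as follows. This assumes the warning text is built once from the decorated function's qualified name, which is how the toy decorator below works; keyword_only_soon and the three classes are hypothetical and are not the pandas implementation.

```python
import functools
import warnings


def keyword_only_soon(func):
    # The message is fixed at decoration time, from the function being wrapped.
    msg = f"In a future version all arguments of {func.__qualname__} will be keyword-only"

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            warnings.warn(msg, FutureWarning, stacklevel=2)
        return func(self, *args, **kwargs)

    return wrapper


class BaseIndex:
    def __init__(self, values):
        self.values = list(values)

    @keyword_only_soon
    def drop_duplicates(self, keep="first"):
        # Toy behaviour: keep the first occurrence, ignore ``keep``.
        return type(self)(dict.fromkeys(self.values))


class InheritingIndex(BaseIndex):
    pass  # reuses the parent's wrapper, so warnings name BaseIndex


class OverridingIndex(BaseIndex):
    @keyword_only_soon
    def drop_duplicates(self, keep="first"):
        # Delegate by keyword so the parent's wrapper stays silent.
        return super().drop_duplicates(keep=keep)


def warning_text(index):
    """Call drop_duplicates positionally and return the first warning message."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        index.drop_duplicates("first")
    return str(caught[0].message)
```

Here warning_text(InheritingIndex([1, 1, 2])) mentions BaseIndex.drop_duplicates, while warning_text(OverridingIndex([1, 1, 2])) mentions OverridingIndex.drop_duplicates, which mirrors the Index versus MultiIndex situation described above.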

MarcoGorelli · 2021-05-22T19:39:52Z

pandas/tests/indexes/test_base.py

@@ -1738,3 +1738,20 @@ def test_construct_from_memoryview(klass, extra_kwargs):
    result = klass(memoryview(np.arange(2000, 2005)), **extra_kwargs)
    expected = klass(range(2000, 2005), **extra_kwargs)
    tm.assert_index_equal(result, expected)
+
+
+def test_drop_duplicates_pos_args_deprecation():


can you also put the issue number here?

Added this change in 2cb482f.

jmholzer · 2021-05-23T12:52:00Z

@MarcoGorelli regarding the warning message generation for MultiIndex, I think we need to remove the @final decorator from Index.drop_duplicates, is this acceptable?

https://docs.python.org/3.8/library/typing.html#typing.final

simonjayhawkins · 2021-05-23T13:03:50Z

@MarcoGorelli regarding the warning message generation for MultiIndex, I think we need to remove the @final decorator from Index.drop_duplicates, is this acceptable?

https://docs.python.org/3.8/library/typing.html#typing.final

Sure. We tend to use @final to identify methods that aren't overridden rather than methods that shouldn't be overridden.
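As a small illustration of what removing the decorator changes: typing.final is a signal to static type checkers and does not stop the interpreter from accepting an override. The Base and Child classes below are toy names, not pandas classes.

```python
from typing import final


class Base:
    @final
    def drop_duplicates(self):
        return "base"


class Child(Base):
    # A static type checker such as mypy reports an error on this override
    # because the parent method is marked @final; at runtime it simply works.
    def drop_duplicates(self):
        return "child"
```

So the decorator has to come off the parent method for the override to pass type checking, even though nothing would fail at runtime.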

MarcoGorelli

Almost there 💪

MarcoGorelli · 2021-05-24T17:14:08Z

pandas/core/indexes/multi.py

+    @deprecate_nonkeyword_arguments(version=None, allowed_args=["self"])
+    def drop_duplicates(self: _IndexT, keep: str_t | bool = "first") -> _IndexT:
+        return super(Index, self).drop_duplicates(keep=keep)


Is it necessary to annotate self as _IndexT? I think you should be able to type the return type as MultiIndex and leave self untyped, see MultiIndex.dropna

Also, does super().drop_duplicates(keep=keep) not work?

Finally, str should be fine in this file, I think it's only used in pandas/core/indexes/base.py because there there's str = CachedAccessor("str", StringMethods) (note to self: there should probably be a pre-commit rule for this)

I made all the changes suggested in both comments, they are in fbf70a2.
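The difference between the two super() spellings raised in the review can be seen in a toy hierarchy. The class names below echo the ones in the thread but are plain stand-ins with no pandas behaviour; whether the original line misbehaved in pandas itself depends on internals not shown here.

```python
class IndexOpsMixin:
    def drop_duplicates(self):
        return "IndexOpsMixin"


class Index(IndexOpsMixin):
    def drop_duplicates(self):
        return "Index"


class MultiIndex(Index):
    def via_zero_arg_super(self):
        # Starts the lookup just after MultiIndex in the MRO, so Index is found.
        return super().drop_duplicates()

    def via_explicit_parent(self):
        # Starts the lookup just after Index, so Index's own method is skipped.
        return super(Index, self).drop_duplicates()
```

In this sketch via_zero_arg_super() returns "Index" and via_explicit_parent() returns "IndexOpsMixin", so the zero-argument form is the one that delegates to the immediate parent.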

MarcoGorelli · 2021-05-24T17:18:06Z

pandas/tests/frame/methods/test_drop_duplicates.py

+def test_drop_duplicates_pos_args_deprecation():
+    # GH#41485
+    df = DataFrame({"a": [1, 1, 2], "b": [1, 1, 3], "c": [1, 1, 3]})
+
+    msg = (
+        "In a future version of pandas all arguments of "
+        "DataFrame.drop_duplicates except for the argument 'subset' "
+        "will be keyword-only"
+    )
+
+    with tm.assert_produces_warning(FutureWarning, match=msg):
+        result = df.drop_duplicates(["b", "c"], "last")
+
+    expected = DataFrame({"a": [1, 2], "b": [1, 3], "c": [1, 3]}, index=[1, 2])
+
+    tm.assert_frame_equal(expected, result)


I think we can remove all the extra newlines here, this test could read as a single paragraph

I made this change to all my tests in fbf70a2, they now all read as one paragraph.
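The tests in this PR check both the warning category and a message pattern. A rough standard-library analogue of that check is sketched below; expect_warning and legacy_call are hypothetical names, and this is not how pandas' own test helper is implemented.

```python
import re
import warnings
from contextlib import contextmanager


@contextmanager
def expect_warning(category, match):
    """Hypothetical helper: fail unless the block emits a warning of
    ``category`` whose message matches the regular expression ``match``."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield caught
    hits = [
        w
        for w in caught
        if issubclass(w.category, category) and re.search(match, str(w.message))
    ]
    if not hits:
        raise AssertionError(f"no {category.__name__} matching {match!r} was raised")


def legacy_call():
    # Stand-in for a positional call to a deprecated signature.
    warnings.warn(
        "Starting with version 2.0 all arguments will be keyword-only",
        FutureWarning,
    )


# The pattern is a regular expression, so the dot in "2.0" is escaped.
with expect_warning(FutureWarning, match=r"version 2\.0 all arguments"):
    legacy_call()
```

Because the pattern is treated as a regular expression, literal dots and parentheses in an expected message need escaping, as in the earlier version of the test message quoted in this thread.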

MarcoGorelli

Awesome thanks

* ENH: Deprecate non-keyword arguments for drop_duplicates.
* leave newline
* ENH: Deprecate non-keyword arguments for drop_duplicates.
* ENH: Deprecate non-keyword arguments for drop_duplicates.
* ENH: Deprecate non-keyword arguments for drop_duplicates.
* ENH: Deprecate non-keyword arguments for drop_duplicates.
* ENH: Deprecate non-keyword arguments for drop_duplicates.
* ENH: Deprecate non-keyword arguments for drop_duplicates.
* ENH: Deprecate non-keyword arguments for drop_duplicates.
* remove redundant line
* ENH: Deprecate non-keyword arguments for drop_duplicates.

Co-authored-by: Marco Gorelli <marcogorelli@protonmail.com>

jmholzer added 2 commits May 16, 2021 02:32

ENH: Deprecate non-keyword arguments for drop_duplicates.

ebd40aa

leave newline

8cb7645

jmholzer mentioned this pull request May 16, 2021

Deprecate non-keyword arguments for methods with inplace #41485

Closed

31 tasks

MarcoGorelli self-requested a review May 16, 2021 09:02

MarcoGorelli requested changes May 16, 2021

jmholzer added 2 commits May 16, 2021 14:28

ENH: Deprecate non-keyword arguments for drop_duplicates.

fa6574c

ENH: Deprecate non-keyword arguments for drop_duplicates.

d7c341a

MarcoGorelli reviewed May 16, 2021

ENH: Deprecate non-keyword arguments for drop_duplicates.

19aa589

MarcoGorelli approved these changes May 16, 2021

simonjayhawkins added the Deprecate Functionality to remove in pandas label May 17, 2021

simonjayhawkins reviewed May 17, 2021

simonjayhawkins added this to the 1.3 milestone May 17, 2021

jmholzer mentioned this pull request May 19, 2021

ENH: Deprecate non-keyword arguments for Index.set_names. #41551

Merged

4 tasks

merge

448e8a1

jmholzer added 3 commits May 22, 2021 14:49

ENH: Deprecate non-keyword arguments for drop_duplicates.

0d54ca7

ENH: Deprecate non-keyword arguments for drop_duplicates.

463c37a

merge

c0d3d34

MarcoGorelli requested changes May 22, 2021

ENH: Deprecate non-keyword arguments for drop_duplicates.

2cb482f

ENH: Deprecate non-keyword arguments for drop_duplicates.

09fe413

MarcoGorelli self-requested a review May 23, 2021 17:42

MarcoGorelli added 2 commits May 23, 2021 18:45

remove redundant line

03d0330

Merge remote-tracking branch 'upstream/master' into pr/jmholzer/41500

be8393d

MarcoGorelli requested changes May 24, 2021

ENH: Deprecate non-keyword arguments for drop_duplicates.

fbf70a2

Merge branch 'deprecate-nonkeyword-args-drop_duplicates' of https://g…

937d9e2

…ithub.com/jmholzer/pandas into deprecate-nonkeyword-args-drop_duplicates

jmholzer requested a review from MarcoGorelli May 25, 2021 17:32

MarcoGorelli approved these changes May 25, 2021

MarcoGorelli merged commit 5d474cc into pandas-dev:master May 25, 2021
