ENH: Add ignore_index for df.drop_duplicates #30405

charlesdong1991 · 2019-12-22T15:51:01Z

xref API: add ignore_index keyword to .sort_* & .drop_duplicates #30114
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

gfyoung · 2019-12-22T19:24:43Z

pandas/core/frame.py

@@ -4606,6 +4607,8 @@ def drop_duplicates(
            - False : Drop all duplicates.
        inplace : bool, default False
            Whether to drop duplicates in place or to return a copy.
+        ignore_index : bool, default False
+            If True, the resulting axis will be labeled 0, …, n - 1.


Suggested change

-            If True, the resulting axis will be labeled 0, …, n - 1.
+            If True, the resulting axis will be labeled 0, 1, …, n - 1.

thanks, changed!
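A small example of the behaviour this docstring line describes, using the public API. It assumes pandas 1.0 or later, where `DataFrame.drop_duplicates` accepts `ignore_index`; the data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 3, 3], "b": ["x", "x", "y", "z", "z"]})

# Default: surviving rows keep their original labels.
kept = df.drop_duplicates()
print(list(kept.index))  # [0, 2, 3]

# ignore_index=True: the same rows, relabeled 0, 1, ..., n - 1.
relabeled = df.drop_duplicates(ignore_index=True)
print(list(relabeled.index))  # [0, 1, 2]
```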

WillAyd · 2019-12-23T08:33:08Z

pandas/core/frame.py

            self._update_inplace(new_data)
        else:
+            if ignore_index:
+                idx = ibase.default_index(len(self[-duplicated]))


I think this block could be more succinctly written as:

    result = self[~duplicated]
    if ignore_index:
        result = result.reset_index(drop=True)
    return result

Or something similar. FWIW, I think the current approach of evaluating self[~duplicated] twice can be costly for larger frames.
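The suggestion above, written out as a stand-alone function so it runs outside `DataFrame`. `drop_duplicates_reset` is a hypothetical name, not a pandas API, and the sketch omits the `subset`, `keep` and `inplace` handling of the real method; it only shows the "select once, then relabel" shape using the public `DataFrame.duplicated` and `reset_index(drop=True)`.

```python
import pandas as pd


def drop_duplicates_reset(df: pd.DataFrame, ignore_index: bool = False) -> pd.DataFrame:
    # Hypothetical helper mirroring the suggested non-inplace branch.
    duplicated = df.duplicated()
    # Evaluate the boolean selection once and reuse the result.
    result = df[~duplicated]
    if ignore_index:
        # drop=True discards the old labels instead of inserting them as a column.
        result = result.reset_index(drop=True)
    return result


df = pd.DataFrame({"a": [1, 1, 2]})
print(drop_duplicates_reset(df, ignore_index=True))
```

Later in this review the `reset_index` call was replaced by assigning a default index directly, to avoid an extra copy.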

ahh, good call! @WillAyd

Will change! I think idx = ibase.default_index(sum(-duplicated)) should also be faster than using self[-duplicated].
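The number of surviving rows can be read straight off the boolean mask, without building the filtered frame. A minimal sketch on made-up data; it uses `~`, the usual element-wise inversion for a boolean Series, and `Series.sum()` rather than the builtin `sum` from the comment. No timing claim is made here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 3]})
duplicated = df.duplicated()

# Count of rows that survive, computed from the mask alone.
n_kept = int((~duplicated).sum())

# The same count obtained by materializing the filtered frame first.
n_kept_via_frame = len(df[~duplicated])

print(n_kept, n_kept_via_frame)  # 3 3
```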

jreback · 2019-12-24T16:40:57Z

doc/source/whatsnew/v1.0.0.rst

@@ -474,7 +474,7 @@ Other API changes
  Supplying anything else than ``how`` to ``**kwargs`` raised a ``TypeError`` previously (:issue:`29388`)
 - When testing pandas, the new minimum required version of pytest is 5.0.1 (:issue:`29664`)
 - :meth:`Series.str.__iter__` was deprecated and will be removed in future releases (:issue:`28277`).
-
+- Add ``ignore_index`` to :meth:`DataFrame.drop_duplicates` to reset index (:issue:`30114`)


reverse the ordering here, e.g. ".drop_duplicates has gained the ignore_index keyword".

move to other enhancements

moved and rephrased!

jreback · 2019-12-24T16:42:13Z

pandas/core/frame.py

@@ -4621,9 +4624,15 @@ def drop_duplicates(
        if inplace:
            (inds,) = (-duplicated)._ndarray_values.nonzero()
            new_data = self._data.take(inds)
+
+            if ignore_index:
+                new_data = new_data.reset_index(drop=True)


Don't use reset_index; simply use what you did in the other PR (default_index). .reset_index() causes another copy here.

yeah, changed!
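The inplace branch in this hunk works on positions: it finds the integer locations of the rows to keep and takes them from the internal block manager (`self._data.take`), then relabels with the internal `ibase.default_index`. Neither internal is public, so this sketch uses `DataFrame.take` and `pd.RangeIndex` as stand-ins, assuming `default_index(n)` yields the same 0..n-1 labels as `RangeIndex(n)`.

```python
import pandas as pd

df = pd.DataFrame({"a": [5, 5, 6, 7, 7]}, index=[10, 11, 12, 13, 14])
duplicated = df.duplicated()

# Integer positions of the rows to keep (first occurrence of each row).
(inds,) = (~duplicated).to_numpy().nonzero()

# Positional selection, standing in for taking rows from the block manager.
taken = df.take(inds)

# Relabel 0..n-1 by assigning a new index rather than calling reset_index().
taken.index = pd.RangeIndex(len(inds))
print(taken)
```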

jreback · 2019-12-24T16:42:51Z

pandas/core/frame.py

-            return self[-duplicated]
+            result = self[-duplicated]
+            if ignore_index:
+                return result.reset_index(drop=True)


here you have to copy before you update the index

I did not use copy, but .index assignment, and I think it is also correct.
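A quick check of the point being argued: boolean indexing already returns a new DataFrame, so assigning to the result's `.index` leaves the caller's frame untouched. The example uses the public `pd.RangeIndex` in place of the internal `ibase.default_index`, assuming both produce 0..n-1 labels.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2]}, index=["x", "y", "z"])
duplicated = df.duplicated()

# Boolean indexing builds a new DataFrame ...
result = df[~duplicated]
# ... so assigning its index relabels only `result`, not `df`.
result.index = pd.RangeIndex(len(result))

print(list(df.index))      # ['x', 'y', 'z']
print(list(result.index))  # [0, 1]
```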

jreback · 2019-12-26T00:40:31Z

pandas/core/frame.py

+            result = self[-duplicated]
+
+            if ignore_index:
+                result.index = ibase.default_index(sum(-duplicated))


result.index = ibase.default_index(len(result))
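With that suggestion the length comes from the frame that was already built. Below is a hypothetical re-creation of the non-inplace branch (`drop_duplicates_relabel` is an illustrative name; `pd.RangeIndex` stands in for the internal `default_index`), checked against the built-in method for each `keep` option. It is a sketch for comparison, not the merged code.

```python
import pandas as pd


def drop_duplicates_relabel(df, keep="first", ignore_index=False):
    # Hypothetical stand-alone version of the non-inplace branch.
    duplicated = df.duplicated(keep=keep)
    result = df[~duplicated]
    if ignore_index:
        # n is taken from the already-filtered frame, as suggested in review.
        result.index = pd.RangeIndex(len(result))
    return result


df = pd.DataFrame({"a": [1, 1, 2, 3, 3, 4]})
for keep in ("first", "last", False):
    pd.testing.assert_frame_equal(
        drop_duplicates_relabel(df, keep=keep, ignore_index=True),
        df.drop_duplicates(keep=keep, ignore_index=True),
    )
```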

jreback · 2019-12-26T00:44:05Z

pandas/tests/frame/test_duplicates.py

+    tm.assert_frame_equal(df, DataFrame(origin_dict))
+
+
+@pytest.mark.parametrize(


Can you incorporate these into the above test as well?

Can you also assert, when inplace == True, that the input is unchanged (copy it before and check equality)?
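One way to structure the requested checks, written as a plain helper so it runs without pytest. `check_drop_duplicates_ignore_index` is a hypothetical name and this is not the test from the PR; it assumes a pandas version where `drop_duplicates` still accepts `inplace`. The inplace call runs on a copy so the original input can be compared afterwards.

```python
import pandas as pd


def check_drop_duplicates_ignore_index(origin_dict, expected_dict, ignore_index, expected_index):
    df = pd.DataFrame(origin_dict)
    expected = pd.DataFrame(expected_dict, index=expected_index)

    # inplace=False: check the result, then check the input was not mutated.
    result = df.drop_duplicates(ignore_index=ignore_index)
    pd.testing.assert_frame_equal(result, expected)
    pd.testing.assert_frame_equal(df, pd.DataFrame(origin_dict))

    # inplace=True: operate on a copy, then confirm the original is unchanged.
    copied = df.copy()
    copied.drop_duplicates(ignore_index=ignore_index, inplace=True)
    pd.testing.assert_frame_equal(copied, expected)
    pd.testing.assert_frame_equal(df, pd.DataFrame(origin_dict))


check_drop_duplicates_ignore_index({"A": [1, 2, 2, 3]}, {"A": [1, 2, 3]}, True, [0, 1, 2])
check_drop_duplicates_ignore_index({"A": [1, 2, 2, 3]}, {"A": [1, 2, 3]}, False, [0, 1, 3])
```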

incorporated, and added test

thanks @jreback

WillAyd

lgtm

jreback · 2019-12-27T16:33:15Z

thanks @charlesdong1991

…ndexing-1row-df
* upstream/master: (333 commits)
  CI: troubleshoot Web_and_Docs failing (pandas-dev#30534)
  WARN: Ignore NumbaPerformanceWarning in test suite (pandas-dev#30525)
  DEPR: camelCase in offsets, get_offset (pandas-dev#30340)
  PERF: implement scalar ops blockwise (pandas-dev#29853)
  DEPR: Remove Series.compress (pandas-dev#30514)
  ENH: Add numba engine for rolling apply (pandas-dev#30151)
  [ENH] Add to_markdown method (pandas-dev#30350)
  DEPR: Deprecate pandas.np module (pandas-dev#30386)
  ENH: Add ignore_index for df.drop_duplicates (pandas-dev#30405)
  BUG: The setting xrot=0 in DataFrame.hist() doesn't work with by and subplots pandas-dev#30288 (pandas-dev#30491)
  CI: Fix GBQ Tests (pandas-dev#30478)
  Bug groupby quantile listlike q and int columns (pandas-dev#30485)
  ENH: Add ignore_index for df.sort_values and series.sort_values (pandas-dev#30402)
  TYP: Typing hints in pandas/io/formats/{css,csvs}.py (pandas-dev#30398)
  BUG: raise on non-hashable Index name, closes pandas-dev#29069 (pandas-dev#30335)
  Replace "foo!r" to "repr(foo)" syntax pandas-dev#29886 (pandas-dev#30502)
  BUG: preserve EA dtype in transpose (pandas-dev#30091)
  BLD: add check to prevent tempita name error, clsoes pandas-dev#28836 (pandas-dev#30498)
  REF/TST: method-specific files for test_append (pandas-dev#30503)
  marked unused parameters (pandas-dev#30504)
  ...

charlesdong1991 added 5 commits December 3, 2018 17:43

remove \n from docstring

7e461a1

fix conflicts

1314059

Merge remote-tracking branch 'upstream/master'

8bcb313

Merge remote-tracking branch 'upstream/master' into fix_issue_30114_d…

3faf5cc

…uplicates

Add ignore_index for drop duplicates

bdda8ca

gfyoung added the API Design and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Dec 22, 2019

gfyoung reviewed Dec 22, 2019

pandas/core/frame.py (resolved)

add forgotten test and code change based on review

6e76e56

WillAyd requested changes Dec 23, 2019

code change on WA review

c12beb6

WillAyd reviewed Dec 23, 2019

pandas/core/frame.py (resolved)

keep consistency

1b6dc51

WillAyd reviewed Dec 23, 2019

pandas/core/frame.py (resolved)

jreback requested changes Dec 24, 2019

charlesdong1991 added 4 commits December 24, 2019 20:11

code change based on JR review

17dbcb0

add test

a173eea

restore wrong deleted code

4a37e8f

remove

79a49e1

jreback requested changes Dec 26, 2019

charlesdong1991 added 3 commits December 26, 2019 08:52

Merge remote-tracking branch 'upstream/master' into fix_issue_30114_d…

5fb6e54

…uplicates

code change based on JR review

a8552d8

simplify code

6eaff2e

WillAyd approved these changes Dec 26, 2019

jreback added this to the 1.0 milestone Dec 27, 2019

jreback approved these changes Dec 27, 2019

jreback merged commit 7025c59 into pandas-dev:master Dec 27, 2019

AlexKirko pushed a commit to AlexKirko/pandas that referenced this pull request Dec 29, 2019

ENH: Add ignore_index for df.drop_duplicates (pandas-dev#30405)

b35a5f4

charlesdong1991 mentioned this pull request Jan 3, 2020

CLN: Clean tests for *.sort_index, *.sort_values and df.drop_duplicates #30651

Merged

1 task

jreback mentioned this pull request Mar 11, 2020

Allow to select index in drop_duplicates and duplicated #9708

Closed

This was referenced Jun 22, 2020

ENH: add ignore_index argument to DataFrame.explode / Series.explode #34932

Closed

ENH: add ignore_index option in DataFrame.explode #34933

Merged

mroeschke mentioned this pull request Aug 29, 2022

ENH: Add ignore_index to Series.drop_duplicates #48304

Closed

3 tasks
