BUG: drop_duplicates not raising KeyError on missing key #19730

NoahTheDuke · 2018-02-16T18:14:12Z

Fix #17879 introduced an error by iterating over the columns in the dataframe, not the columns in the subset. This meant that passing in a column name missing from the dataframe would no longer raise a KeyError like it had previously.

This fix checks the subset first before pulling necessary columns from the dataframe, and raises the necessary KeyError when a given column doesn't exist.
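To illustrate the regression described above, here is a minimal pure-Python sketch. The functions `select_buggy` and `select_fixed` are hypothetical stand-ins, not the pandas source; they only model the difference between iterating over the frame's columns and validating the subset first.

```python
# Simplified stand-ins for illustration only; these are NOT the pandas internals.

def select_buggy(columns, subset):
    # Iterates over the frame's columns, so a name in `subset` that is
    # absent from `columns` is silently skipped instead of raising.
    return [col for col in columns if col in subset]


def select_fixed(columns, subset):
    # Validate the subset first: any name that is not a column is an error.
    missing = [name for name in subset if name not in columns]
    if missing:
        raise KeyError(missing)
    return [col for col in columns if col in subset]


columns = ["a", "b"]
print(select_buggy(columns, ["a", "c"]))  # the misspelled "c" goes unnoticed

try:
    select_fixed(columns, ["a", "c"])
except KeyError as err:
    print("KeyError:", err)
```

The buggy variant returns a result based only on the columns it recognizes, which is why a misspelled name could go unreported.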

closes #19726 (Pandas 0.22.0 does not raise KeyError for misspelled column with .drop_duplicates())
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry


codecov · 2018-02-16T19:56:18Z

Codecov Report

Merging #19730 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #19730      +/-   ##
==========================================
+ Coverage   91.58%   91.58%   +<.01%     
==========================================
  Files         150      150              
  Lines       48867    48890      +23     
==========================================
+ Hits        44755    44777      +22     
- Misses       4112     4113       +1

Flag	Coverage Δ
#multiple	`89.96% <100%> (ø)`	⬆️
#single	`41.79% <0%> (+0.04%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.23% <100%> (+0.07%)`	⬆️
pandas/core/series.py	`94.46% <0%> (-0.11%)`	⬇️
pandas/core/ops.py	`96.74% <0%> (-0.09%)`	⬇️
pandas/core/indexes/base.py	`96.45% <0%> (-0.02%)`	⬇️
pandas/plotting/_converter.py	`65.22% <0%> (ø)`	⬆️
pandas/core/panel.py	`97.3% <0%> (ø)`	⬆️
pandas/core/indexes/api.py	`98.78% <0%> (ø)`	⬆️
pandas/core/arrays/categorical.py	`94.9% <0%> (+0.01%)`	⬆️
pandas/core/indexes/category.py	`97.31% <0%> (+0.03%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2fdf1e2...61481a4.

jreback · 2018-02-18T16:17:47Z

pandas/core/frame.py

@@ -3655,6 +3655,10 @@ def f(vals):
              isinstance(subset, tuple) and subset in self.columns):
            subset = subset,

+        for name in subset:


can you add a comment here on what you are checking

you can do this

diff = pd.Index(subset).difference(self.columns)
if len(diff):
    raise KeyError(diff)
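The suggestion above can be tried outside the method body with a small sketch. The helper name `check_subset` is hypothetical and takes the columns as an argument rather than reading `self.columns`; it uses `Index.empty` for the emptiness check, in the spirit of the later "Use Index's built-in empty check" commit.

```python
import pandas as pd


def check_subset(columns, subset):
    """Hypothetical helper: raise KeyError if any label in `subset` is not a column."""
    # Labels present in `subset` but absent from `columns`.
    diff = pd.Index(subset).difference(columns)
    if not diff.empty:
        raise KeyError(diff)


df = pd.DataFrame({"a": [1, 1], "b": [2, 2]})

check_subset(df.columns, ["a", "b"])  # all labels exist, nothing is raised

try:
    check_subset(df.columns, ["a", "c"])
except KeyError as err:
    print("missing labels:", err)
```

Using a set-style difference reports every missing label at once rather than stopping at the first one.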

jreback · 2018-02-18T16:20:41Z

can you also check .duplicated() (and add a test)
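A rough sketch of what such a check could look like, covering both methods. This is not the test added in the PR; the helper `raises_key_error` and the column names are made up for illustration, and it assumes a pandas version that includes this fix (0.23.0 or later).

```python
import pandas as pd


def raises_key_error(method, subset):
    """Hypothetical helper: True if calling `method(subset=subset)` raises KeyError."""
    try:
        method(subset=subset)
    except KeyError:
        return True
    return False


df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 3, 4]})

# Check both methods with a subset containing a column that does not exist.
results = {
    name: raises_key_error(getattr(df, name), ["a", "misspelled"])
    for name in ("drop_duplicates", "duplicated")
}
print(results)
```

In the pandas test suite this would more naturally be written with `pytest.raises(KeyError)`; the plain `try`/`except` form is used here only to keep the sketch dependency-free beyond pandas.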

NoahTheDuke · 2018-02-21T02:03:51Z

Updated!

jreback · 2018-02-21T11:40:24Z

thanks!


NoahTheDuke added 3 commits February 16, 2018 13:01

Adds test

7f8ee79

Adds changes to whatsnew

df31f09

jreback requested changes Feb 18, 2018


jreback added the Indexing and Error Reporting labels Feb 18, 2018

NoahTheDuke added 2 commits February 20, 2018 14:08

Adds tests, fixes diff check.

69e3fd8

Use Index's built-in empty check

61481a4

jreback added this to the 0.23.0 milestone Feb 21, 2018

jreback approved these changes Feb 21, 2018


jreback merged commit dbc601e into pandas-dev:master Feb 21, 2018

NoahTheDuke deleted the bugfix-drop_duplicates-when-column-name-misspelled branch February 21, 2018 14:16

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018

BUG: drop_duplicates not raising KeyError on missing key (pandas-dev#…

440fc8d

…19730)

TomAugspurger mentioned this pull request Apr 30, 2018

drop_duplicates on non-existent column should raise warning #20887

Closed
