BUG: drop_duplicates not raising KeyError on missing key #19730
Fix #17879 introduced an error by iterating over the columns in the dataframe, not the columns in the subset. This meant that passing in a column name missing from the dataframe would no longer raise a `KeyError` like it had previously. This fix checks the subset first before pulling necessary columns from the dataframe, and raises the necessary `KeyError` when a given column doesn't exist. Fixes #19726
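The difference between the two iteration orders can be sketched with a toy model. This is not the pandas source: a plain dict stands in for the dataframe, and `select_buggy` and `select_fixed` are hypothetical names used only for illustration.

```python
def select_buggy(frame, subset):
    """Iterate over the frame's columns: a name in subset that the frame
    lacks is never visited, so no error is raised."""
    return [frame[name] for name in frame if name in subset]


def select_fixed(frame, subset):
    """Iterate over subset: a name the frame lacks fails the lookup
    and surfaces as KeyError."""
    return [frame[name] for name in subset]


# Toy stand-in for a DataFrame with columns "A" and "B" (assumption).
frame = {"A": [1, 1, 2], "B": ["x", "x", "y"]}

# "C" is not a column. The buggy version silently returns only column "A".
buggy = select_buggy(frame, ["A", "C"])

# The fixed version raises KeyError for the missing name.
try:
    select_fixed(frame, ["A", "C"])
    raised = False
except KeyError:
    raised = True
```

In the toy model the missing name is simply filtered out by the buggy loop, which mirrors the regression described above: the caller gets a result instead of an error.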
Codecov Report
@@            Coverage Diff             @@
##           master   #19730     +/-    ##
==========================================
+ Coverage   91.58%   91.58%   +<.01%
==========================================
  Files         150      150
  Lines       48867    48890      +23
==========================================
+ Hits        44755    44777      +22
- Misses       4112     4113       +1
pandas/core/frame.py
Outdated
@@ -3655,6 +3655,10 @@ def f(vals):
                isinstance(subset, tuple) and subset in self.columns):
            subset = subset,

        for name in subset:
Can you add a comment here explaining what you are checking?
You can do this:
diff = pd.Index(subset).difference(self.columns)
if len(diff):
    raise KeyError(diff)
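A minimal standalone sketch of the suggested check, assuming pandas is installed. The helper name `check_subset` and the example frame are hypothetical. In the PR the check lives inside the `drop_duplicates` machinery in `pandas/core/frame.py` and uses `self.columns`.

```python
import pandas as pd


def check_subset(columns, subset):
    """Hypothetical helper: raise KeyError listing every name in subset
    that is not present in columns."""
    # Index.difference returns the labels in subset that columns lacks.
    diff = pd.Index(subset).difference(columns)
    if len(diff):
        raise KeyError(diff)


# Example frame (assumption, for illustration only).
df = pd.DataFrame({"A": [1, 1, 2], "B": ["x", "x", "y"]})

# Every name exists, so the check passes silently.
check_subset(df.columns, ["A", "B"])

# "C" is missing, so the check raises KeyError carrying the missing labels.
try:
    check_subset(df.columns, ["A", "C"])
    missing = []
except KeyError as err:
    missing = list(err.args[0])
```

Compared with a per-name loop, the set difference collects all missing labels at once, so the error can name every bad column in a single message.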
can you also check
Updated!
thanks!