## DO NOT MERGE. BUG: Fix .dropna() functionality for categorical indices #25091
Rebase to master
Merge commit
Reverse merge-commit
Merge commit
Codecov Report

```diff
@@             Coverage Diff             @@
##           master   #25091       +/-   ##
===========================================
- Coverage   92.37%   42.89%   -49.49%
===========================================
  Files         166      166
  Lines       52420    52420
===========================================
- Hits        48423    22483    -25940
- Misses       3997    29937    +25940
```

Continue to review full report at Codecov.
Codecov Report

```diff
@@           Coverage Diff           @@
##           master   #25091   +/-   ##
=======================================
  Coverage   92.37%   92.37%
=======================================
  Files         166      166
  Lines       52420    52420
=======================================
  Hits        48423    48423
  Misses       3997     3997
```

Continue to review full report at Codecov.
Related unexpected behavior:

```python
import pandas as pd

# TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'
'foo' in pd.Categorical(pd.interval_range(0.1, 0.2))
```

Works as expected for ints:

```python
'foo' in pd.Categorical(pd.interval_range(42, 43))
```
```diff
@@ -4615,7 +4615,7 @@ def dropna(self, axis=0, how='any', thresh=None, subset=None,
            else:
                raise TypeError('must specify how or thresh')

-        result = self.loc(axis=axis)[mask]
+        result = self.loc(axis=axis)[mask.to_numpy()]
```
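For context, here is a minimal sketch of how a row-wise `dropna` mask can be built, modeled on the `how`/`thresh` logic of `DataFrame.dropna`. `dropna_rows_sketch` is a hypothetical helper, not the pandas implementation; it assumes `axis=0` and no `subset`. Because `mask` is computed from `df` itself, it shares `df.index`, so selecting with `mask.to_numpy()` should pick the same rows as selecting with the Series.

```python
import numpy as np
import pandas as pd


def dropna_rows_sketch(df: pd.DataFrame, how: str = "any", thresh=None) -> pd.DataFrame:
    """Hypothetical simplified row-wise dropna (axis=0, no subset)."""
    count = df.count(axis=1)  # non-null values per row, indexed like df.index
    if thresh is not None:
        mask = count >= thresh
    elif how == "any":
        mask = count == len(df.columns)  # keep rows with no missing values
    elif how == "all":
        mask = count > 0  # keep rows with at least one value
    else:
        raise TypeError("must specify how or thresh")
    # mask shares df.index, so positional selection picks the same rows
    return df.loc[mask.to_numpy()]


# Usage: a frame indexed by a CategoricalIndex
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [1.0, 2.0, np.nan]},
    index=pd.CategoricalIndex(["x", "y", "z"]),
)
print(dropna_rows_sketch(df))
```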
Do we not need the alignment from the index anymore?
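The alignment question hinges on how `.loc` treats a boolean Series versus a plain array. A small illustration, assuming current pandas semantics: a boolean Series is aligned to the frame's index by label, while a NumPy array is applied by position. The two only differ when the mask's index is ordered differently from the frame's index (the reordered mask below is constructed purely for illustration).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 2.0, 3.0]}, index=["x", "y", "z"])
mask = df["a"].notna()  # x: False, y: True, z: True

# Same labels, different order: z, y, x -> True, True, False
reordered = mask.reindex(["z", "y", "x"])

by_label = df.loc[reordered]                  # aligned on labels -> rows y, z
by_position = df.loc[reordered.to_numpy()]    # applied positionally -> rows x, y

print(by_label.index.tolist(), by_position.index.tolist())
```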
@TomAugspurger It turned out to be a broader issue. Kudos to @jorisvandenbossche for an insightful discussion. The behavior of `an_index[mask]`, `'foo' in an_index`, and `an_index.get_loc(mask)` is currently inconsistent across various index types, in particular `Categorical(pd.interval_range(0.1, 3.14))` and `Categorical(pd.interval_range(1, 2))`.
Does this require a separate issue, or could I just post repro snippets and outcomes as of 0.23.4 and 0.24.0 into #25087?
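One way to collect those repro outcomes per pandas version is a small probe that runs each operation and records either the result or the exception. `outcome`, `probe`, and `cases` are hypothetical names; the probe uses the string key `'foo'` for `get_loc`, which is an assumption about the intended lookup, and it simply reports whatever the installed version does rather than asserting any particular behavior.

```python
import numpy as np
import pandas as pd


def outcome(fn):
    """Run fn and describe either its result or the exception it raised."""
    try:
        return f"ok: {fn()!r}"
    except Exception as exc:  # record any failure instead of stopping the probe
        return f"{type(exc).__name__}: {exc}"


def probe(idx, key="foo"):
    """Run the three operations discussed above against one index."""
    mask = np.ones(len(idx), dtype=bool)  # select every element
    return {
        "idx[mask]": outcome(lambda: idx[mask]),
        f"{key!r} in idx": outcome(lambda: key in idx),
        f"idx.get_loc({key!r})": outcome(lambda: idx.get_loc(key)),
    }


cases = {
    "CategoricalIndex(interval_range(0.1, 3.14))": pd.CategoricalIndex(pd.interval_range(0.1, 3.14)),
    "CategoricalIndex(interval_range(1, 2))": pd.CategoricalIndex(pd.interval_range(1, 2)),
}

if __name__ == "__main__":
    print("pandas", pd.__version__)
    for name, idx in cases.items():
        for op, result in probe(idx).items():
            print(f"{name} | {op} | {result}")
```

Running the same script under each version gives outcomes that can be pasted side by side into the issue.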
Closing for now, given this hasn't been updated in a few weeks, but ping if you'd like to reopen. The first and most important part of any PR is tests, so make sure you have that squared away first. Also make sure the linting check passes:

```
git diff upstream/master -u -- "*.py" | flake8 --diff
```

The latest whatsnew is https://github.com/pandas-dev/pandas/blob/master/doc/source/whatsnew/v0.24.0.rst
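A possible shape for such a regression test, assuming the failing case is `DataFrame.dropna` on a frame indexed by a `CategoricalIndex` of intervals, as discussed in this thread. The test name and data are hypothetical; the expected frame is built with `iloc` so it keeps the same categorical index.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_dropna_categorical_interval_index():
    # Hypothetical data: three float intervals (0, 1], (1, 2], (2, 3] as categories
    idx = pd.CategoricalIndex(pd.interval_range(0.0, 3.0))
    df = pd.DataFrame({"a": [1.0, np.nan, 3.0]}, index=idx)

    result = df.dropna()
    expected = df.iloc[[0, 2]]  # the row with NaN is dropped
    tm.assert_frame_equal(result, expected)
```

Under pytest the function is collected automatically.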