Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

## DO NOT MERGE. BUG: Fix .dropna() functionality for categorical indices #25091

Closed
wants to merge 7 commits into from

Conversation

rs2
Copy link
Contributor

@rs2 rs2 commented Feb 2, 2019

The last whatsnew is https://github.com/pandas-dev/pandas/blob/master/doc/source/whatsnew/v0.24.0.rst

@codecov
Copy link

codecov bot commented Feb 2, 2019

Codecov Report

Merging #25091 into master will decrease coverage by 49.48%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #25091       +/-   ##
===========================================
- Coverage   92.37%   42.89%   -49.49%     
===========================================
  Files         166      166               
  Lines       52420    52420               
===========================================
- Hits        48423    22483    -25940     
- Misses       3997    29937    +25940
Flag Coverage Δ
#multiple ?
#single 42.89% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 35.73% <100%> (-61.09%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.35%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.17%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.15%) ⬇️
... and 124 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb43726...e0c720d. Read the comment docs.

@codecov
Copy link

codecov bot commented Feb 2, 2019

Codecov Report

Merging #25091 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #25091   +/-   ##
=======================================
  Coverage   92.37%   92.37%           
=======================================
  Files         166      166           
  Lines       52420    52420           
=======================================
  Hits        48423    48423           
  Misses       3997     3997
Flag Coverage Δ
#multiple 90.79% <100%> (ø) ⬆️
#single 42.89% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 96.82% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb43726...e0c720d. Read the comment docs.

@rs2
Copy link
Contributor Author

rs2 commented Feb 2, 2019

Related unexpected behavior:

# TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'
'foo' in pd.Categorical(pd.interval_range(0.1, 0.2))

Works as expected for ints:

'foo' in pd.Categorical(pd.interval_range(42, 43))

.get_loc() shuld not be raisin TypeError.

@rs2 rs2 changed the title #25087: Fix .dropna() functionality for categorical indices BUG: Fix .dropna() functionality for categorical indices Feb 2, 2019
@@ -4615,7 +4615,7 @@ def dropna(self, axis=0, how='any', thresh=None, subset=None,
else:
raise TypeError('must specify how or thresh')

result = self.loc(axis=axis)[mask]
result = self.loc(axis=axis)[mask.to_numpy()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need the alignment from the index anymore?

Copy link
Contributor Author

@rs2 rs2 Feb 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger It turned out to be a bit wider issue. Kudos to @jorisvandenbossche for an insightful discussion. The behavior of an_index[mask], 'foo' in an_index, an_index.get_loc(mask) is currently inconsistent across various index types, in particular, Categorical(pd.interval_range(0.1, 3.14)), Categorical(pd.interval_range(1, 2)).

Does this require a separate issue, or I could just post repro snippets and outcomes as of 0.23.4 and 0.24.0 into #25087?

@rs2 rs2 changed the title BUG: Fix .dropna() functionality for categorical indices ## DO NOT MERGE BUG: Fix .dropna() functionality for categorical indices Feb 2, 2019
@rs2 rs2 changed the title ## DO NOT MERGE BUG: Fix .dropna() functionality for categorical indices ## DO NOT MERGE. BUG: Fix .dropna() functionality for categorical indices Feb 2, 2019
@WillAyd
Copy link
Member

WillAyd commented Feb 28, 2019

Closing for now given this hasn't been updated in a few weeks but ping if you'd like to reopen. The first and most important part of any PR is tests, so make sure you have that squared away first

@WillAyd WillAyd closed this Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression in 0.24: TypeError exception when using dropna on dataframe with categorical index
3 participants