GH24241 make Categorical.map transform nans #24275
Hello @JustinZhengBC! Thanks for submitting the PR.
Codecov Report
@@ Coverage Diff @@
## master #24275 +/- ##
==========================================
- Coverage 92.22% 92.22% -0.01%
==========================================
Files 162 162
Lines 51787 51798 +11
==========================================
+ Hits 47761 47771 +10
- Misses 4026 4027 +1
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #24275 +/- ##
==========================================
+ Coverage 92.29% 92.29% +<.01%
==========================================
Files 162 162
Lines 51808 51834 +26
==========================================
+ Hits 47814 47840 +26
Misses 3994 3994
Continue to review full report at Codecov.
pandas/core/arrays/categorical.py
Outdated
    return self.from_codes(self._codes.copy(),
                           categories=new_categories,
                           ordered=self.ordered)
if isinstance(mapper, (dict, ABCSeries)):
why is this a special case? use is_dict_like
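For context, a minimal sketch of the kind of check the reviewer is asking for. It uses the public `pandas.api.types.is_dict_like`; where the PR would import the helper from is an assumption, and the mappers below are hypothetical. The point is that one duck-typed check covers both `dict` and `Series` without naming either type.

```python
import pandas as pd
from pandas.api.types import is_dict_like

# Hypothetical mappers, used only for illustration.
dict_mapper = {1: "odd", 2: "even"}
series_mapper = pd.Series(["odd", "even"], index=[1, 2])


def func_mapper(x):
    return x


# is_dict_like looks for mapping-style access rather than concrete types.
checks = {
    "dict": is_dict_like(dict_mapper),
    "Series": is_dict_like(series_mapper),
    "function": is_dict_like(func_mapper),
}
print(checks)
```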
pandas/core/arrays/categorical.py
Outdated
except (AttributeError, KeyError, TypeError, ValueError):
    new_value = np.nan

try:
why are you try/except here?
how / why can this fail
AttributeError: if mapper calls a method of the element (e.g. lambda x: x.lower())
KeyError: if mapper is a dict without a key for NaN
TypeError: if mapper expects some type other than a float
ValueError: if mapper tries converting float values to ints (e.g. lambda x: int(x))
If you mean the try/except below that, that was already there. from_codes raises a ValueError if the mapping isn't one-to-one.
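The four failure modes listed above can be reproduced without pandas. This is only a sketch: the mappers are hypothetical stand-ins for what a user might pass, each applied directly to a NaN float.

```python
nan = float("nan")

# Hypothetical mappers, one per failure mode listed above.
cases = {
    "AttributeError": lambda x: x.lower(),   # float has no .lower()
    "KeyError": lambda x: {1: False}[x],     # dict has no key for NaN
    "TypeError": lambda x: x + "suffix",     # float + str is unsupported
    "ValueError": lambda x: int(x),          # NaN cannot be converted to int
}

raised = {}
for expected, mapper in cases.items():
    try:
        mapper(nan)
    except Exception as exc:
        # Record which exception type each mapper actually raised.
        raised[expected] = type(exc).__name__

print(raised)
```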
@@ -311,6 +311,37 @@ def test_map_with_categorical_series(self):
        exp = pd.Index(["odd", "even", "odd", np.nan])
        tm.assert_index_equal(a.map(c), exp)

    @pytest.mark.parametrize('data, f', [[[1, 1, np.nan], pd.isna],
write this as
@pytest.mark.parametrize(
    'data, f',
    [
        ......
    ])
.....
        expected = pd.Index([False, False, True])
        tm.assert_index_equal(result, expected)

    @pytest.mark.parametrize('data, f', [[[1, 1, np.nan], {1: False}],
same
Can you document the handling of missing values in the docstring?
pandas/core/arrays/categorical.py
Outdated
        new_value = mapper[np.nan]
    else:
        new_value = mapper(np.nan)
except (AttributeError, KeyError, TypeError, ValueError):
I'm not really comfortable with this. mapper is a user-defined function. Consider
def f(x):
    if isnan(x):
        raise TypeError
    ...
that TypeError would be swallowed by pandas.
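A small, pandas-free sketch of the concern raised here. The helper names `map_nan_swallowing` and `map_nan_propagating` are hypothetical; the first mirrors the broad `except` clause in the diff under review, the second simply lets the user's exception through.

```python
import math


def user_mapper(x):
    # Hypothetical user function that deliberately rejects NaN.
    if math.isnan(x):
        raise TypeError("NaN is not allowed here")
    return x * 2


def map_nan_swallowing(mapper):
    # Mirrors the broad except clause from the diff: any of these
    # exceptions is replaced by NaN, including ones raised on purpose.
    try:
        return mapper(float("nan"))
    except (AttributeError, KeyError, TypeError, ValueError):
        return float("nan")


def map_nan_propagating(mapper):
    # Alternative: let the user's exception surface unchanged.
    return mapper(float("nan"))


swallowed = map_nan_swallowing(user_mapper)
print(math.isnan(swallowed))

try:
    map_nan_propagating(user_mapper)
except TypeError as exc:
    print("propagated:", exc)
```

With the swallowing variant the caller gets NaN back and never learns that `user_mapper` objected to the input.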
Force-pushed from 590238d to 2011996.
I decided to go with @TomAugspurger's suggestion to just change the documentation. I did leave in one change in the function, because previously calling
lgtm. just a small comment. ping on green.
@jreback green
pandas/core/arrays/categorical.py
Outdated
@@ -1234,6 +1234,11 @@ def map(self, mapper):
                                   categories=new_categories,
                                   ordered=self.ordered)
        except ValueError:
            # NA values are represented in self._codes with -1
            # np.take causes NA values to take final element in new_categories
            if any(self._codes == -1):
Should this use np.any? I think any will short-circuit, but np.any will likely be faster.
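To make the two code comments in the diff and this suggestion concrete, here is a NumPy-only sketch. The `codes` and `new_categories` arrays are assumed example data, shaped like what a Categorical stores internally, not values taken from the PR.

```python
import numpy as np

# Assumed example data: codes as a Categorical would store them, -1 marking NA.
codes = np.array([0, 1, 0, -1], dtype=np.int8)
new_categories = np.array(["odd", "even"])

# np.take treats -1 as "last element", so the NA slot silently becomes "even".
taken = np.take(new_categories, codes)
print(taken.tolist())

# Both checks agree. The builtin iterates over the boolean array element by
# element in Python, while np.any reduces the whole array in one call.
has_na_builtin = any(codes == -1)
has_na_numpy = bool(np.any(codes == -1))
print(has_na_builtin, has_na_numpy)
```

Either check is enough to detect the NA case before `np.take` is allowed to wrap around; which one is faster in practice depends on the array size and where the first -1 sits.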
Thanks @JustinZhengBC!
* BUG-24241 make Categorical.map transform nans
git diff upstream/master -u -- "*.py" | flake8 --diff
Alters Categorical.map so that the mapper function is also applied to NaN values. The mentioned bug report brings up the example of calling apply(pd.isna) on a categorical series, arguing that the values and dtype of the returned list should be consistent with those of a non-categorical series. This PR makes the values consistent. The dtype discrepancies are also present for categorical series without NaNs and are consistent with the documentation.
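A minimal way to compare the two cases side by side, using the `[1, 1, np.nan]` data from the tests in this PR. The bug report's example calls `apply`; `map` is used here because it is the method the PR targets. The categorical result depends on the installed pandas version, so it is printed without a stated expectation.

```python
import numpy as np
import pandas as pd

values = [1, 1, np.nan]  # example data taken from the tests in this PR

plain = pd.Series(values)
categorical = pd.Series(values, dtype="category")

# On a plain float Series, the mapper sees NaN like any other value.
plain_result = plain.map(pd.isna)
print(plain_result.tolist())

# Version-dependent: compare this output against the plain result above.
print(categorical.map(pd.isna).tolist())
```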