GH24241 make Categorical.map transform nans #24275

JustinZhengBC · 2018-12-13T21:27:36Z

- closes #24241 (Series.apply on categorical with NaN has wrong behavior)
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry

Alters Categorical.map so that the mapper function is also applied to NaN values. The linked bug report gives the example of calling apply(pd.isna) on a categorical Series and argues that both the values and the dtype of the result should match what a non-categorical Series returns. This PR makes the values consistent. The dtype discrepancies also occur for categorical Series without NaNs and are consistent with the documentation.
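The comparison from the bug report can be reproduced with a sketch along these lines. The sample data is illustrative and not taken from the PR. The categorical result is printed rather than asserted because whether NaN reaches the function depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

values = [1, 1, np.nan]  # illustrative data, not taken from the PR

plain = pd.Series(values)
categorical = pd.Series(values, dtype="category")

# On a plain Series the function is called on every element, NaN included.
plain_result = plain.apply(pd.isna)
print(plain_result.tolist())  # [False, False, True]

# On a categorical Series the outcome for the NaN entry (and the dtype)
# varies between pandas versions, so it is only printed here.
categorical_result = categorical.apply(pd.isna)
print(categorical_result.tolist(), categorical_result.dtype)
```

Running both calls side by side makes it easy to see whether a given pandas release treats the NaN entry the same way in the two cases.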

pep8speaks · 2018-12-13T21:27:39Z

Hello @JustinZhengBC! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/arrays/categorical.py !
There are no PEP8 issues in the file pandas/tests/indexes/test_category.py !

codecov · 2018-12-13T22:02:18Z

Codecov Report

Merging #24275 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24275      +/-   ##
==========================================
- Coverage   92.22%   92.22%   -0.01%     
==========================================
  Files         162      162              
  Lines       51787    51798      +11     
==========================================
+ Hits        47761    47771      +10     
- Misses       4026     4027       +1

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.62% <100%> (ø)` | ⬆️ |
| #single | `42.99% <0%> (-0.02%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/arrays/categorical.py | `95.37% <100%> (+0.06%)` | ⬆️ |
| pandas/io/json/json.py | `92.61% <0%> (-0.48%)` | ⬇️ |
| pandas/util/testing.py | `87.51% <0%> (+0.09%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31a3512...d765dc3. Read the comment docs.

codecov · 2018-12-13T22:02:28Z

Codecov Report

Merging #24275 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24275      +/-   ##
==========================================
+ Coverage   92.29%   92.29%   +<.01%     
==========================================
  Files         162      162              
  Lines       51808    51834      +26     
==========================================
+ Hits        47814    47840      +26     
  Misses       3994     3994

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.7% <100%> (ø)` | ⬆️ |
| #single | `42.98% <0%> (-0.01%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/arrays/categorical.py | `95.32% <100%> (+0.01%)` | ⬆️ |
| pandas/core/groupby/groupby.py | `96.65% <0%> (-0.03%)` | ⬇️ |
| pandas/util/testing.py | `87.57% <0%> (-0.01%)` | ⬇️ |
| pandas/core/generic.py | `96.66% <0%> (ø)` | ⬆️ |
| pandas/core/frame.py | `96.91% <0%> (ø)` | ⬆️ |
| pandas/core/series.py | `93.71% <0%> (ø)` | ⬆️ |
| pandas/core/reshape/merge.py | `94.3% <0%> (+0.01%)` | ⬆️ |
| pandas/core/base.py | `97.66% <0%> (+0.01%)` | ⬆️ |
| pandas/core/window.py | `96.41% <0%> (+0.01%)` | ⬆️ |
| pandas/core/groupby/generic.py | `87.15% <0%> (+0.03%)` | ⬆️ |

... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6111f64...82859d9. Read the comment docs.

jreback · 2018-12-14T13:45:21Z

pandas/core/arrays/categorical.py

-            return self.from_codes(self._codes.copy(),
-                                   categories=new_categories,
-                                   ordered=self.ordered)
+            if isinstance(mapper, (dict, ABCSeries)):


why is this a special case? use is_dict_like
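For context, `is_dict_like` is available from `pandas.api.types` and accepts both plain dicts and Series, which is presumably why it is suggested in place of the explicit `isinstance` check. A minimal sketch follows; `describe_mapper` is a hypothetical helper that exists only for this illustration.

```python
import pandas as pd
from pandas.api.types import is_dict_like


def describe_mapper(mapper):
    """Hypothetical helper: classify a mapper with a single duck-typed check."""
    return "dict-like" if is_dict_like(mapper) else "callable"


print(describe_mapper({1: False}))                      # dict-like
print(describe_mapper(pd.Series([False], index=[1])))   # dict-like
print(describe_mapper(pd.isna))                         # callable
```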

jreback · 2018-12-14T13:45:38Z

pandas/core/arrays/categorical.py

+        except (AttributeError, KeyError, TypeError, ValueError):
+            new_value = np.nan
+
+        try:


Why the try/except here?

How / why can this fail?

JustinZhengBC · 2018-12-15

- AttributeError: if mapper calls a method of the element (e.g. lambda x: x.lower())
- KeyError: if mapper is a dict without a key for NaN
- TypeError: if mapper expects some type other than a float
- ValueError: if mapper tries converting float values to ints (e.g. lambda x: int(x))

If you mean the try/except below that, it was already there: from_codes raises a ValueError if the mapping isn't one-to-one.
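The four failure modes listed above can be checked in plain Python, without pandas. The mappers below are hypothetical examples chosen to trigger one exception each when handed NaN.

```python
nan = float("nan")

# Hypothetical mappers, one per failure mode.
mappers = {
    "AttributeError": lambda x: x.lower(),       # float has no .lower()
    "KeyError": lambda x: {1: False}[x],         # dict without a NaN key
    "TypeError": lambda x: x + "suffix",         # float + str is unsupported
    "ValueError": lambda x: int(x),              # NaN cannot become an int
}

raised = {}
for expected_name, mapper in mappers.items():
    try:
        mapper(nan)
    except Exception as exc:
        raised[expected_name] = type(exc).__name__

print(raised)
```

Each key of `raised` should map to an exception of the same name, which matches the list in the comment.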

jreback · 2018-12-14T13:46:20Z

pandas/tests/indexes/test_category.py

@@ -311,6 +311,37 @@ def test_map_with_categorical_series(self):
        exp = pd.Index(["odd", "even", "odd", np.nan])
        tm.assert_index_equal(a.map(c), exp)

+    @pytest.mark.parametrize('data, f', [[[1, 1, np.nan], pd.isna],


write this as

@pytest.mark.parametrize( 'data', 'f', [ ...... ]) .....
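As a runnable illustration of the parametrized style, here is a small sketch. It is not the test from the PR: the test name, the second case, and the `expected` column are assumptions made for the example. Note that pytest takes the argument names either as one comma-separated string or as a list/tuple of names, followed by the list of value combinations.

```python
import numpy as np
import pandas as pd
import pytest


# argnames may be a single comma-separated string or a list/tuple of names;
# each entry of argvalues supplies one (data, f, expected) combination.
@pytest.mark.parametrize(
    ("data", "f", "expected"),
    [
        ([1, 1, np.nan], pd.isna, [False, False, True]),
        ([1, 2, np.nan], pd.notna, [True, True, False]),
    ],
)
def test_map_sees_nan(data, f, expected):
    # A plain Index is used so the outcome does not depend on
    # version-specific categorical behaviour.
    result = pd.Index(data).map(f)
    assert list(result) == expected
```

Keeping one case per entry in the list makes it straightforward to add further mappers without duplicating the test body.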

jreback · 2018-12-14T13:46:27Z

pandas/tests/indexes/test_category.py

+            expected = pd.Index([False, False, True])
+            tm.assert_index_equal(result, expected)
+
+    @pytest.mark.parametrize('data, f', [[[1, 1, np.nan], {1: False}],


TomAugspurger

Can you document the handling of missing values in the docstring?

TomAugspurger · 2018-12-17T02:40:03Z

pandas/core/arrays/categorical.py

+                new_value = mapper[np.nan]
+            else:
+                new_value = mapper(np.nan)
+        except (AttributeError, KeyError, TypeError, ValueError):


I'm not really comfortable with this. mapper is a user-defined function. Consider

    def f(x):
        if isnan(x):
            raise TypeError
        ...

that TypeError would be swallowed by pandas.
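The concern can be made concrete with a small sketch. `map_nan_swallowing` is a hypothetical stand-in for the try/except under review, not the actual pandas code, and `strict_mapper` is an invented user function that rejects NaN on purpose.

```python
import math


def strict_mapper(x):
    """User-defined mapper that deliberately rejects NaN."""
    if math.isnan(x):
        raise TypeError("NaN is not allowed here")
    return x * 2


def map_nan_swallowing(mapper):
    """Hypothetical stand-in for the try/except under review."""
    try:
        return mapper(math.nan)
    except (AttributeError, KeyError, TypeError, ValueError):
        return math.nan  # the user's error disappears silently


def map_nan_propagating(mapper):
    """Alternative: let whatever the mapper raises reach the caller."""
    return mapper(math.nan)


print(map_nan_swallowing(strict_mapper))  # nan, the TypeError is hidden

try:
    map_nan_propagating(strict_mapper)
except TypeError as exc:
    print("raised:", exc)
```

With the broad `except`, the caller cannot tell a deliberate rejection apart from a mapper that simply does not handle floats.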

JustinZhengBC · 2018-12-19T05:41:09Z

I decided to go with @TomAugspurger's suggestion to just change the documentation. I did leave in one change in the function, because previously calling np.take would cause NaN values (represented by -1 in self._codes) to take the last element of new_categories.
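The np.take behaviour mentioned here follows from NumPy's negative indexing, where -1 means "last element". The sketch below uses stand-in arrays named after the ones in the comment. The remedy shown, appending a NaN slot so that -1 lands on it, is one possible approach assumed for illustration and is not quoted from the diff.

```python
import numpy as np

# Stand-ins for the arrays discussed above; the values are illustrative.
new_categories = np.array(["odd", "even", "other"], dtype=object)
codes = np.array([0, 1, -1, 0])  # -1 marks a missing value

# Naive lookup: -1 is read as "last element", so the NA row becomes "other".
naive = np.take(new_categories, codes)
print(naive.tolist())  # ['odd', 'even', 'other', 'odd']

# Possible remedy (an assumption): add a trailing NaN so -1 selects it.
padded = np.append(new_categories, np.nan)
fixed = np.take(padded, codes)
print(fixed.tolist())
```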

jreback

lgtm. just a small comment. ping on green.

JustinZhengBC · 2018-12-20T01:27:42Z

@jreback green

TomAugspurger · 2018-12-20T02:50:17Z

pandas/core/arrays/categorical.py

@@ -1234,6 +1234,11 @@ def map(self, mapper):
                                   categories=new_categories,
                                   ordered=self.ordered)
        except ValueError:
+            # NA values are represented in self._codes with -1
+            # np.take causes NA values to take final element in new_categories
+            if any(self._codes == -1):


Should this use np.any? I think any will short-circuit, but np.any will likely be faster.
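Both spellings give the same answer on a boolean mask, so the choice is about how the reduction is carried out. A minimal sketch with illustrative codes follows. Timing is deliberately not measured, since the relative speed depends on the array size and on where the first -1 sits.

```python
import numpy as np

codes = np.array([0, 1, -1, 0], dtype="int8")  # illustrative codes, -1 marks NA
mask = codes == -1

builtin_result = any(mask)    # Python-level iteration over NumPy scalars
numpy_result = np.any(mask)   # single vectorised reduction inside NumPy

print(bool(builtin_result), bool(numpy_result))  # True True
```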

TomAugspurger · 2018-12-20T14:24:58Z

Thanks @JustinZhengBC!

* BUG-24241 make Categorical.map transform nans

BUG-24241 make Categorical.map transform nans

d765dc3

jreback requested changes Dec 14, 2018

jreback added the Missing-data and Categorical labels Dec 14, 2018

BUG-24241 make requested changes

628bfac

TomAugspurger reviewed Dec 17, 2018

TomAugspurger requested changes Dec 17, 2018

BUG-24241 update documentation instead

2011996

JustinZhengBC force-pushed the BUG-24241 branch from 590238d to 2011996 Compare December 19, 2018 05:40

TomAugspurger approved these changes Dec 19, 2018

jreback requested changes Dec 19, 2018

jreback added this to the 0.24.0 milestone Dec 19, 2018

BUG-24241 add comment

e5b5415

TomAugspurger reviewed Dec 20, 2018

BUG-24241 use np.any instead of any

82859d9

TomAugspurger merged commit ff69f45 into pandas-dev:master Dec 20, 2018

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

GH24241 make Categorical.map transform nans (pandas-dev#24275)

8510256

* BUG-24241 make Categorical.map transform nans

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

GH24241 make Categorical.map transform nans (pandas-dev#24275)

c9dfab3

* BUG-24241 make Categorical.map transform nans
