BUG: Fix passing of numeric_only argument for categorical reduce #25304

Merged
merged 4 commits into from
Feb 16, 2019
Changes from 3 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.2.rst
@@ -25,6 +25,7 @@ Fixed Regressions
- Fixed regression in :meth:`DataFrame.apply` causing ``RecursionError`` when ``dict``-like classes were passed as argument. (:issue:`25196`)

- Fixed regression in :meth:`DataFrame.duplicated()`, where empty dataframe was not returning a boolean dtyped Series. (:issue:`25184`)
- Fixed regression in :meth:`Series.min` and :meth:`Series.max` where ``numeric_only=True`` was ignored when the ``Series`` contained ``Categorical`` data (:issue:`25299`)

.. _whatsnew_0242.enhancements:

2 changes: 1 addition & 1 deletion pandas/core/arrays/categorical.py
@@ -2172,7 +2172,7 @@ def _reverse_indexer(self):
return result

# reduction ops #
-def _reduce(self, name, axis=0, skipna=True, **kwargs):
+def _reduce(self, name, axis=0, **kwargs):
func = getattr(self, name, None)
if func is None:
msg = 'Categorical cannot perform the operation {op}'
8 changes: 6 additions & 2 deletions pandas/core/series.py
@@ -3678,8 +3678,12 @@ def _reduce(self, op, name, axis=0, skipna=True, numeric_only=None,
if axis is not None:
self._get_axis_number(axis)

-# dispatch to ExtensionArray interface
-if isinstance(delegate, ExtensionArray):
+if isinstance(delegate, Categorical):
Contributor

Why are you adding a code path here? The original is much more generic; we need to avoid special cases like this.
If you need to handle this specially, then the place to do it is in the Categorical itself.

Member

This is because Categorical deviates here from the standard ExtensionArray (see #25303 for the issue about that).
I personally find it clearer with this special case, since it makes explicit that Categorical has a different signature. And after the deprecation period, we can remove this special case.

If you feel strongly about it, it can indeed be handled inside Categorical. But that means that the _reduce methods of all the other arrays need to be updated as well to handle (= ignore) numeric_only, which is also not clean (and the special case here is only temporary anyway).

Contributor

This is not clearer and leads to future issues.
Please move it to _reduce.

Contributor

Not sure why the others need to change at all; your logic is circular.

Contributor Author

This would cause more changes, since the actual problem is that the numeric_only argument is currently not passed at all. If we added it to every ExtensionArray call, we would get problems with the other reduction methods. For instance here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/arrays/numpy_.py#L322 (they don't have numeric_only or **kwargs in the method definition).

So we could change the call for every ExtensionArray to:

return delegate._reduce(name, skipna=skipna, numeric_only=numeric_only, **kwds)

but then we would need to make sure every child of ExtensionArray supports this, and this is currently not the case.
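
As a rough illustration of the concern above, the following plain-Python sketch (not pandas code) shows why forwarding numeric_only to every array would fail. `StrictArray`, `LenientArray` and `forward_everything` are hypothetical stand-ins: the first mimics a `_reduce` with neither numeric_only nor `**kwargs` in its signature, the second mimics one that accepts extra keywords, and the function mimics the generic call proposed above.

```python
class StrictArray:
    """Hypothetical stand-in: _reduce has a fixed signature, no **kwargs."""

    def _reduce(self, name, skipna=True):
        return (name, skipna)


class LenientArray:
    """Hypothetical stand-in: _reduce accepts (and may ignore) extra keywords."""

    def _reduce(self, name, skipna=True, **kwargs):
        return (name, skipna, kwargs)


def forward_everything(delegate, name, skipna=True, numeric_only=None):
    # Mimics a generic dispatch that always passes numeric_only along.
    return delegate._reduce(name, skipna=skipna, numeric_only=numeric_only)


lenient_result = forward_everything(LenientArray(), "min")

try:
    forward_everything(StrictArray(), "min")
    strict_failed = False
except TypeError:
    # The fixed signature rejects the unexpected keyword argument.
    strict_failed = True
```

Under these assumptions, only arrays whose `_reduce` accepts arbitrary keywords survive the generic call, which is why every ExtensionArray subclass would have to be updated first.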

Member

@arnov explained it well. I think we don't want to change the EA interface (_reduce is an official part of it) just for this back-compat special case that we are going to deprecate. Hence, Categorical needs to be handled separately here (but again, this is only temporary).

Contributor

Disagree. numeric_only is likely not going away anytime soon, and even so,
the EAs simply need to accept it (they can ignore it).

Member

@jreback this is not about the numeric_only in DataFrame/Series reductions that determines for which columns the reduction is calculated. This is a different meaning of the keyword, specific to Categorical, that determines whether NaNs should be skipped or not. Please read #25303.

So we are not talking about removing that general use case of numeric_only, but only the one in Categorical.min/max.
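
To make the Categorical-specific meaning concrete, here is a minimal sketch in plain Python. `toy_categorical_min` is a hypothetical re-implementation written for illustration only, not the pandas method; it assumes missing values are represented by the code -1 and reuses the values from the test added in this PR.

```python
import math


def toy_categorical_min(values, categories, numeric_only=None):
    """Hypothetical illustration of numeric_only acting like skipna."""
    # Position in `categories` defines the order; missing values get code -1.
    codes = [categories.index(v) if isinstance(v, str) else -1 for v in values]
    if numeric_only:
        # Drop the missing entries before reducing, as skipna=True would.
        codes = [c for c in codes if c != -1]
    smallest = min(codes)
    return math.nan if smallest == -1 else categories[smallest]


values = ["a", "b", math.nan, "a"]
categories = ["b", "a"]  # ordered: "b" < "a"

skipped = toy_categorical_min(values, categories, numeric_only=True)
not_skipped = toy_categorical_min(values, categories, numeric_only=False)
```

In this sketch numeric_only never selects columns; it only decides whether the missing entry takes part in the reduction, which is the role skipna plays for the other arrays.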

Contributor

I'll look closer, but I am still -1 on any special-case handling in the Series call.
The point is to pass on the kwargs; the EA can ignore them or not as required.

Member

OK, to take a step back: @jreback, do you agree that in the long term we can deprecate this numeric_only keyword for Categorical.min/max?
(It is currently the only EA that uses it, while the others all use skipna for the same thing.)
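
One possible shape for such a deprecation, purely as a sketch and not necessarily what pandas would do: keep accepting numeric_only for a while, emit a warning, and translate it into skipna. `resolve_skipna` is a hypothetical helper name introduced here for illustration.

```python
import warnings


def resolve_skipna(skipna=True, numeric_only=None):
    """Hypothetical helper: translate the legacy keyword into skipna."""
    if numeric_only is None:
        # Legacy keyword not used; honour skipna as given.
        return skipna
    warnings.warn(
        "numeric_only is deprecated here; use skipna instead",
        FutureWarning,
        stacklevel=2,
    )
    return bool(numeric_only)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = resolve_skipna(numeric_only=False)
    modern = resolve_skipna(skipna=True)
```

With a shim like this, callers using the old keyword keep working during the deprecation period, while new code relies on skipna alone.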

+# TODO deprecate numeric_only argument for Categorical and use
+# skipna as well, see GH25303
+return delegate._reduce(name, numeric_only=numeric_only, **kwds)
+elif isinstance(delegate, ExtensionArray):
+# dispatch to ExtensionArray interface
return delegate._reduce(name, skipna=skipna, **kwds)
elif is_datetime64_dtype(delegate):
# use DatetimeIndex implementation to handle skipna correctly
12 changes: 12 additions & 0 deletions pandas/tests/reductions/test_reductions.py
@@ -960,6 +960,18 @@ def test_min_max(self):
assert np.isnan(_min)
assert _max == 1

cat = Series(Categorical(
["a", "b", np.nan, "a"], categories=['b', 'a'], ordered=True))
_min = cat.min(numeric_only=True)
_max = cat.max(numeric_only=True)
assert _min == "b"
assert _max == "a"

_min = cat.min(numeric_only=False)
_max = cat.max(numeric_only=False)
assert np.isnan(_min)
assert _max == "a"


class TestSeriesMode(object):
# Note: the name TestSeriesMode indicates these tests