Categorical Arithmetic/Comparison Rules #19513

jbrockmendel · 2018-02-02T17:44:31Z

Series[not-categorical] > CategoricalIndex is inconsistent with the reversed operation. Which one is canonical?

ser = pd.Series([1, 2, 3])
idx = pd.CategoricalIndex(['A', 'B', 'A'])

>>> ser > idx
0    False
1    False
2    False

>>> idx < ser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/indexes/category.py", line 752, in _evaluate_compare
    return getattr(self.values, opname)(other)
  File "pandas/core/arrays/categorical.py", line 56, in f
    raise TypeError("Unordered Categoricals can only compare "
TypeError: Unordered Categoricals can only compare equality or not

>>> ser > idx.values
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 819, in wrapper
    .format(op=op, typ=self.dtype))
TypeError: Cannot compare a Categorical for op <built-in function gt> with Series of dtype int64.
If you want to compare values, use 'series <op> np.asarray(other)'.

I'm guessing the right thing to do is to a) have Series[categorical].__op__ wrap CategoricalIndex.__op__, and b) have Series[non-categorical] to dispatch to reversed-op for is_categorical_dtype(other), want to confirm before making a PR.

The text was updated successfully, but these errors were encountered:

jbrockmendel · 2018-02-03T01:18:08Z

A case derived from tests.categorical.test_operators.TestCategoricalOps.test_comparisons, where it has the comment

# categorical cannot be compared to Series or numpy array, and also
# not the other way around

cat = Series(Categorical([1, 2, 3], ordered=True))
s = Series([2, 2, 2])

>>> s < cat
Traceback (most recent call last):
[...]
TypeError: Cannot compare a Categorical for op __lt__ with type <type 'numpy.ndarray'>.
If you want to compare values, use 'np.asarray(cat) <op> other'.

>>> s < pd.CategoricalIndex(cat)
0    False
1    False
2     True
dtype: bool

jreback · 2018-02-06T11:14:03Z

there are some issues about this already. see if you can find them.

jbrockmendel · 2018-02-06T16:41:19Z

Closest issues with categorical label are #18050, #8995, neither of which seem to differentiate between the behavior of Categorical.__op__ vs CategoricalIndex.__op__ vs Series[category].__op__

@TomAugspurger any thoughts on the One True Implementation for these operations?

TomAugspurger · 2018-02-06T16:58:40Z

No strong thoughts, but I suppose a decent heuristic is "what would the result be if NumPy had a categorical dtype?" In that case, let's just do the same as Series[int].op(Index[int]) and Series[int].op(ndarray[int]), etc. So

self	other	return
series	ndarray	series
series	index	series
index	series	ndarray / extension array
index	series	ndarray / extension array
ndarray	series	series
ndarray	index	ndarray / extension array

Does that make sense?

jbrockmendel · 2018-02-06T18:12:07Z

I should clarify. The question isn't what the return type should be, but which operations are allowed at all.

The easy (i.e. pretty obviously wrong ATM) part is cases where x < y is allowed but y > x raises:

cat = pd.Categorical([1, 2, 3], ordered=True)
idx = pd.CategoricalIndex(cat)
ser = pd.Series(cat)

>>> cat < ser   # raises TypeError
>>> ser > cat   # returns Series
>>> cat < idx   # raises TypeError
>>> idx > cat   # returns array
>>> idx < ser   # raises TypeError
>>> ser > idx   # raises TypeError

The less obvious part is what casting rules are allowed. ATM CategoricalIndex.__op__ is much more forgiving about input dtypes and will cast other before dispatching to Categorical.__op__.

>>> ser < ser.astype('i8')   # raises TypeError
>>> idx < idx.astype('i8')   # returns array
>>> cat < cat.astype('i8')   # raises TypeError

>>> ser == ser.astype('i8')  # returns Series
>>> idx == idx.astype('i8')   # returns array
>>> cat == cat.astype('i8')  # returns array

The goal here is to have one implementation of each of these methods (ideally in Categorical) and then have the other classes wrap that instead of re-implementing and risking having the logic diverge.

jreback · 2018-02-10T17:43:21Z

Categorical is prob a bit strict about these checks ATM. Categorial and CI should behave exactly the same.

These should all (Series, Cat, CI) should all respect ordered (IOW they have to be the same). for comparison purposes.

@jschendel

jbrockmendel · 2018-02-20T21:04:16Z

It looks like Series.__cmp__(CategoricalIndex) effectively calls other = np.asarray(other) on the CategoricalIndex, which I expect isn't the desired behavior.

jbrockmendel · 2018-03-10T00:50:53Z

One more:

What should (DatetimeIndex|TimedeltaIndex|PeriodIndex).__(add|sub)__(Categorical|CategoricalIndex) do? I think right now they return NotImplemented which seems reasonable, but they do it in a catch-all block instead of intentionally and I don't think the case is tested.

@jschendel are you the appropriate person to ping on this?

jschendel · 2018-03-10T20:52:26Z

@jschendel are you the appropriate person to ping on this?

I don't think I have any type of definitive say on this; mostly worked on categorical vs. categorical comparisons.

I agree that this probably shouldn't be supported. I suppose you could perform the operation on the underlying category values, but it seems like you'd run into ambiguous corner cases pretty quickly. For example, would you actually want to allow add/sub if the categories were ordered with a non-standard ordering, e.g. Timestamp('2017-01-01') > Timestamp('2018-01-01') ? Not sure why anyone would ever do that, but could conceivably happen.

Also doesn't look like add/sub at the scalar level is implemented for Timestamp|Timedelta|Period vs. Categorical|CategoricalIndex, which I would expect to be implemented if we were to support the same operations on the index equivalent.

mroeschke · 2021-06-18T05:00:54Z

The comparison in the first comment all raise the same error now. Could use a test if there are none.

In [3]: ser > idx
TypeError: Unordered Categoricals can only compare equality or not

In [4]: idx < ser
TypeError: Unordered Categoricals can only compare equality or not

In [5]: ser > idx.values
TypeError: Unordered Categoricals can only compare equality or not

jbrockmendel · 2021-12-21T21:00:13Z

Closing as fixed.

This was referenced Feb 3, 2018

Dispatch Series comparison ops to DatetimeIndex and TimedeltaIndex #19524

Closed

Such thing as a scalar Categorical? #19523

Closed

This was referenced Feb 7, 2018

DataFrame vs Series vs Index arithmetic Roundup #18824

Closed

Dispatch categorical Series ops to Categorical #19582

Merged

jreback added API Design Categorical Categorical Data Type labels Feb 10, 2018

jbrockmendel mentioned this issue May 23, 2018

ENH: Support operators for ExtensionArray #20889

Closed

4 tasks

jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Jul 23, 2019

jbrockmendel mentioned this issue Aug 6, 2019

REF: Make CategoricalIndex comparison defer to Categorical comparison #27769

Merged

jschendel mentioned this issue Jan 4, 2020

BUG: Datetimelike equality comparisons with Categorical #30699

Closed

mroeschke added the Bug label Jun 28, 2020

mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed API Design Bug Categorical Categorical Data Type Numeric Operations Arithmetic, Comparison, and Logical operations labels Jun 18, 2021

jbrockmendel closed this as completed Dec 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Categorical Arithmetic/Comparison Rules #19513

Categorical Arithmetic/Comparison Rules #19513

jbrockmendel commented Feb 2, 2018

jbrockmendel commented Feb 3, 2018

jreback commented Feb 6, 2018

jbrockmendel commented Feb 6, 2018

TomAugspurger commented Feb 6, 2018 •

edited

Loading

jbrockmendel commented Feb 6, 2018

jreback commented Feb 10, 2018

jbrockmendel commented Feb 20, 2018

jbrockmendel commented Mar 10, 2018

jschendel commented Mar 10, 2018

mroeschke commented Jun 18, 2021

jbrockmendel commented Dec 21, 2021

Categorical Arithmetic/Comparison Rules #19513

Categorical Arithmetic/Comparison Rules #19513

Comments

jbrockmendel commented Feb 2, 2018

jbrockmendel commented Feb 3, 2018

jreback commented Feb 6, 2018

jbrockmendel commented Feb 6, 2018

TomAugspurger commented Feb 6, 2018 • edited Loading

jbrockmendel commented Feb 6, 2018

jreback commented Feb 10, 2018

jbrockmendel commented Feb 20, 2018

jbrockmendel commented Mar 10, 2018

jschendel commented Mar 10, 2018

mroeschke commented Jun 18, 2021

jbrockmendel commented Dec 21, 2021

TomAugspurger commented Feb 6, 2018 •

edited

Loading