Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement arith ops on pd.Categorical #21213

Closed
wants to merge 1 commit into from

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Then in core.ops we dispatch to pd.Categorical instead of special-casing.

@@ -1203,6 +1219,24 @@ def map(self, mapper):
__le__ = _cat_compare_op('__le__')
__ge__ = _cat_compare_op('__ge__')

__add__ = _cat_arithmetic_op(operator.add)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so my ops PR does almost exactly this (for extension arrays). I think we should maybe put it in a mixin? so can capture ops definitions generically (but of course can do that later).

Copy link
Member Author

@jbrockmendel jbrockmendel May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. For the most part I'm happy being hands-off with Categorical since others are more familiar with it. I figured this would be an easy way to demonstrate what I have in mind for discussions in #21160, #20889.

@codecov
Copy link

codecov bot commented May 26, 2018

Codecov Report

Merging #21213 into master will decrease coverage by <.01%.
The diff coverage is 90%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21213      +/-   ##
==========================================
- Coverage   91.84%   91.83%   -0.01%     
==========================================
  Files         153      153              
  Lines       49505    49534      +29     
==========================================
+ Hits        45466    45492      +26     
- Misses       4039     4042       +3
Flag Coverage Δ
#multiple 90.23% <90%> (-0.01%) ⬇️
#single 41.9% <76.66%> (+0.02%) ⬆️
Impacted Files Coverage Δ
pandas/core/ops.py 96.21% <50%> (-0.15%) ⬇️
pandas/core/arrays/categorical.py 95.57% <92.85%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f6abb61...93395fb. Read the comment docs.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it actually needed to add those?
If those are not defined, Python gives already almost the same error automatically.

In [20]: pd.Categorical([1, 2, 3]) + 1
...
TypeError: unsupported operand type(s) for +: 'Categorical' and 'int'

@jbrockmendel
Copy link
Member Author

Why is it actually needed to add those?

At the moment the behavior is implicitly defined in core.ops, which is very much not the right place.

@jorisvandenbossche
Copy link
Member

At the moment the behavior is implicitly defined in core.ops, which is very much not the right place.

I am not speaking about the change in ops.py, that is good, but just asking about the reasoning to add those methods to Categorical. Without adding those, python will also raise a TypeError (unless the right object knows how to handle it)

@jbrockmendel
Copy link
Member Author

just asking about the reasoning to add those methods to Categorical.

a) precedent/demonstration of the Game Plan for e.g. #21160, #20889.
b) explicit instead of implicit

@jorisvandenbossche
Copy link
Member

a) precedent/demonstration of the Game Plan for e.g. #21160, #20889.

Again, that's for the change in ops.py? (which I completely agree with)

b) explicit instead of implicit

but on the other hand, not defining the method or returning NotImplemented is, although implicit, the standard python idiom to handle this.

This path is already taken when eg the categorical series it the right object.

To be clear, there might be good other reasons to do this, eg because it would give changes in behaviour compared to what we currently have.

@jbrockmendel
Copy link
Member Author

Again, that's for the change in ops.py?

No, its for the methods themselves on the EA subclasses. #21160 already does this, so the "precedent" is more for 20889.

To be clear, there might be good other reasons to do this, eg because it would give changes in behaviour compared to what we currently have.

The PR will keep the current exception messages unchanged. Not a big deal.

Really if 21160 goes through it will set the relevant precedent for 20889 and other authors, at which point I don't especially care about this.

@jreback
Copy link
Contributor

jreback commented May 29, 2018

yes I think this will be much simpler / better after #21160

first PR should factor out the ops to a Mixin (I may do that), then integerate.

@jbrockmendel
Copy link
Member Author

Sounds good, closing. If you get a chance to look at #19959 after it goes through we can add tests for IntEA\pm datetime64

@Dr-Irv
Copy link
Contributor

Dr-Irv commented May 29, 2018

@jreback wrote:

first PR should factor out the ops to a Mixin (I may do that), then integerate.

I was going to move what I was doing with the ops in #20889 into a mixin. Should I work on this or let you create the mixin as you said you may do above?

Ref @jorisvandenbossche discussion at #20889 (comment)

@jorisvandenbossche
Copy link
Member

IMO you can already do this. Some of the changes to ops.py will be duplicated (but you can actually check Jeff's PR how he did it, only a small change), but the mixin being talked about here will not have this "scalar-ops-fallback", which is the point of the Mixin you want to create, I think?

@jbrockmendel jbrockmendel deleted the cats branch April 5, 2020 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants