-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: divmod return type #22932
BUG: divmod return type #22932
Conversation
Hello @TomAugspurger! Thanks for submitting the PR.
|
pandas/tests/extension/test_ops.py
Outdated
@@ -0,0 +1,22 @@ | |||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see a good place for generic ops tests. Am I missing someplace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean one that is not supposed to be subclassed by the actual EA implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean things that are testing pandas dispatching, not something in pandas (so not in tests/arrays) and not something that's part of the EA interface (so not part of tests/extension/base).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not something that's part of the EA interface (so not part of tests/extension/base).
OK, but basically we have been testing the ScalarOpsMixin through its use in DecimalArray I think, so you could see this one also as such a test.
Codecov Report
@@ Coverage Diff @@
## master #22932 +/- ##
==========================================
- Coverage 92.2% 92.19% -0.01%
==========================================
Files 169 169
Lines 50822 50837 +15
==========================================
+ Hits 46858 46870 +12
- Misses 3964 3967 +3
Continue to review full report at Codecov.
|
FYI, this, and the following are holding up PeriodArray a bit, so if I could get a quick +/- 1 on
it'd be appreciated. (cc @jbrockmendel for this one in particular.) |
doc/source/extending.rst
Outdated
result of the element-wise operation. Whether or not that succeeds depends on | ||
whether the operation returns a result that's valid for the ``ExtensionArray``. | ||
If an ``ExtensionArray`` cannot be reconstructed, a list containing the scalars | ||
returned instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit strange we return a list and not an array ?
(but that's maybe off topic for this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I found that strange as well. An array would be better to return.
This is new in 0.24 right? If so, I'll just make the change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have another PR touching about the same place right now, so I'm going to hold off on changing that till later.
pandas/tests/extension/test_ops.py
Outdated
@@ -0,0 +1,22 @@ | |||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean one that is not supposed to be subclassed by the actual EA implementations?
Just a question, I would expect that Period does not need the ScalarOpsMixin? (it has its own vectorized implementations?) |
Hmm, yes you're right. I'll look on my other branch to see if / why I was hitting this... |
pandas/core/arrays/base.py
Outdated
pass | ||
if op.__name__ in {'divmod', 'rdivmod'}: | ||
try: | ||
a, b = zip(*res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the zip-star necessary? If we get here shouldn't res
just be a 2-tuple? so a, b = res
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now res
is a list of tuples, so I think we have to split each of those with a zip(*)
> /Users/taugspurger/sandbox/pandas/pandas/core/arrays/base.py(781)_binop()
779 try:
780 import pdb; pdb.set_trace()
--> 781 a, b = zip(*res)
782 res = (self._from_sequence(a),
783 self._from_sequence(b))
ipdb> res
[(Decimal('0'), Decimal('1')), (Decimal('1'), Decimal('0')), (Decimal('1'), Decimal('1')), (Decimal('2'), Decimal('0'))]
ipdb> n
> /Users/taugspurger/sandbox/pandas/pandas/core/arrays/base.py(782)_binop()
780 import pdb; pdb.set_trace()
781 a, b = zip(*res)
--> 782 res = (self._from_sequence(a),
783 self._from_sequence(b))
784 except TypeError:
ipdb> p a, b
((Decimal('0'), Decimal('1'), Decimal('1'), Decimal('2')), (Decimal('1'), Decimal('0'), Decimal('1'), Decimal('0')))
This may be over-optimizing for code-sharing, but it might be worth updating
Now that I look at it, it looks like |
Can you add a test that does Pending that: LGTM. |
I'll make those fixes to ops.dispatch_to_extension_op here. Will see what we can share. |
Added a test with
It seems correct on master, maybe someone (probably you) fixed it in the meantime?
Perhaps leave for #22974, as I think there are some details to work out. I haven't put any tests in place for |
Sounds good. |
Updated.
|
(True, [2, 1, 0, 0], [0, 0, 2, 2]), | ||
]) | ||
def test_divmod_array(self, reverse, expected_div, expected_mod): | ||
# https://github.com/pandas-dev/pandas/issues/22930 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we try to better indicate that this is an "extra" test (not overriding a base one)?
(not necessarily needs to be solved here, but question is relevant in general, as we sometimes add tests to eg decimal to test specific aspects not covered in the base tests, to clearly see which those tests are. Maybe we would put them outside the class?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having it off the class makes the most sense. Moved.
res = np.asarray(arr) | ||
return res | ||
|
||
if op.__name__ in {'divmod', 'rdivmod'}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel can't we use dispatch_to_extension_op here to avoid duplication of code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked something similar a few days ago. If Tom says it isn't feasible, I believe him.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are inside the extension array here, so it would also be strange to use (which doesn't prevent that both could share a helper function, if that would be appropriate).
But here we need to construct the divmod correctly, while dispatch_to_extension_op should assume this is already done correctly by the EA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which doesn't prevent that both could share a helper function,
Right. This is possible, if people want it. I'll push up a commit with some kind of do_extension_op
that both of these call to so people can take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now that I take a look it's not so straightforward. The two are similar but just slightly different in enough places that they wouldn't benefit from sharing code really.
- The unboxing of values.
dispatch_to_extension_op
knows that at least one of the two is a Series[EA]._binop
knows thatself
is an EA. - The op:
dispatch_to_extension_op
dispatches,_binop
is defining it in a list comprehension - The re-boxing:
_binop
has the whole maybe re-constructing_from_seqence
that thedispatch_to_extension_op
doesn't have to worry about at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do not do "conceptually exactly the same thing". Paraphrasing myself from above:
The dispatch function calls the EA to perform an operation, the above code is the EA doing the operation.
Why would those two different things necessarily need to live in the same place / code path?
Of course, we could still move the whole EA._create_method
to ops.py
(which would indeed be similar as functions like add_flex_arithmetic_methods
in ops.py that is used in series.py
to add methods to Series). But this is then not related to the change in this PR, and should be left for another issue/PR to discuss (personally I don't think that would be an improvement).
I would see if its is possible to integrate these rather than adding a bunch of new code.
Well, and both Tom and me who have looked into the code, say: we don't think it is possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger I saw your comment. I also @jorisvandenbossche comments. I have not looked at this in detail, nor do I have time to. My point is that this instantly creates technical debt no matter how you slice it.
It may require some reorganization to integrate this, and I appreciate that. So happy to defer this, maybe @jbrockmendel has more insight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be doing a sparse-de-duplication PR following #22880, can take a fresh look at this then. In the interim, I wouldn't let this issue hold up this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that this instantly creates technical debt no matter how you slice it.
It really doesn't. They're doing two different things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is fixing a bug, not doing a refactor of how the ops on EA's are implemented** . If somebody want to look into that, it should be done in a separate PR anyway. So merging.
** and I fully acknowledge that sometimes, to properly fix a bug, you also need to refactor otherwise you just keep adding hacks. However, I don't think that is the case here, see all the comments above.
Good to merge this then? |
Closes #22930