WIP [DOC/EA]: developer docs for implementing Series.round/sum/etc in EA #26918
Codecov Report
@@ Coverage Diff @@
## master #26918 +/- ##
===========================================
- Coverage 91.87% 41.09% -50.78%
===========================================
Files 180 180
Lines 50712 50712
===========================================
- Hits 46590 20841 -25749
- Misses 4122 29871 +25749
Codecov Report
@@ Coverage Diff @@
## master #26918 +/- ##
==========================================
- Coverage 91.87% 91.86% -0.01%
==========================================
Files 180 180
Lines 50712 50746 +34
==========================================
+ Hits 46590 46620 +30
- Misses 4122 4126 +4
I agree that, from the pandas side, all we should expect is that the EA is handled by a numpy func and the EA can determine itself in which way it wants to be handled (for the non-ufuncs: only But, that said, I think the exercise with
Thanks for the write up. Didn't comment in detail yet, but I think it is certainly valuable to include this in some form (although it could probably be made a bit less verbose I think)
Many typos, please review it yourself, and let us know when it's ready for someone else's review.
xref #26935, motivated by how messy this is.
Thanks for the review, @datapythonista. I think I fixed most of the remaining typos too. Let me know if you have more suggestions.
@pilkibun in this case, I find it perfectly reasonable to have such a draft PR to get early feedback.
doc/source/development/extending.rst
def round(self, decimals=0, **kwds):
    pass

An alternative approach to implementing individual functions, is to override
I wouldn't recommend this.
Thanks for the detailed review @TomAugspurger, very helpful.
Pandas is moving in a more specific direction with this, and I don't have a full picture of it. This already seems out of date. I'm closing, but I'll leave the branch in case someone wants to pick it up instead of starting from scratch.
This summarizes my current understanding of how the dispatch of Series methods should work for EA (related: #26913, #26730, and #26835). It also provides helpful warnings against the several pitfalls I fell into while figuring things out. There are a lot more details involved than I foresaw.
The document is sloppy pseudo-sphinx for now (I haven't even compiled it yet), just to see if you agree with the content before I spend the effort to polish it. I will polish it if/when the content represents a consensus instead of just my own take on the issue.
The description of how things work doesn't match the current state of the code, because there are a lot of little cleanups to be made in various functions: com.values_for, _values, and maybe even _data in some cases, np.asarray(self) or np.asarray(self._values), and such.
But I don't think there's actually any radical change implied. It's just about making methods consistent in how they delegate to numpy, so that EA developers can count on predictable behavior and reason about how pandas, numpy, and their own classes interact.
So I hope this will actually seem natural and undramatic, but I may have overlooked one or more vital issues.
One important realization I've come to is that the recent discussion regarding NEP18 was somewhat of a red herring and a rabbit hole. IMO, pandas should simply provide the explicit guarantee that some things go via _reduce (or a future __reduce_pandas__, #26915), while the others go to np.<func>(array), and that the return value type should be an ExtensionArray when applicable. How the EA author deals with numpy's dispatch logic is nothing we need to, or should want to, speak out on. That's between the EA author and numpy.
Avoiding NEP18 altogether means that #26817, the round() PR, can be simplified to just a one-line change to Series.round, plus a new 4-line round method on DecimalArray as an example.
This would be the typical change needed to support (and provide examples for) other Series methods which are currently out of alignment. So going down the NEP18 route (as suggested to me) seems like it was a misstep IMO, now that we know.
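The two-path guarantee described above can be sketched with a toy model. This is only an illustration under stated assumptions, not pandas code: ToySeries, ToyDecimalArray and array_round are hypothetical names, and array_round is a pure-Python stand-in for np.round(array) that simply defers to the array's own round method. The real EA _reduce signature and the real numpy dispatch have more to them than shown here.

```python
from decimal import Decimal


def array_round(values, decimals=0, **kwds):
    # Hypothetical stand-in for np.round(array): it defers to the
    # array's own round() method instead of converting the array.
    return values.round(decimals=decimals, **kwds)


class ToyDecimalArray:
    """Hypothetical, much-reduced stand-in for the DecimalArray example EA."""

    def __init__(self, values):
        self._data = [Decimal(v) for v in values]

    def _reduce(self, name, **kwds):
        # Reductions arrive here by name.
        if name == "sum":
            return sum(self._data, Decimal(0))
        raise TypeError(f"cannot perform {name} on {type(self).__name__}")

    def round(self, decimals=0, **kwds):
        # **kwds absorbs extra keywords (such as out=None) a caller may pass.
        return type(self)(round(x, decimals) for x in self._data)


class ToySeries:
    """Hypothetical stand-in for Series, holding one extension array."""

    def __init__(self, array):
        self.array = array

    def sum(self):
        # Path 1: reductions go through the array's _reduce.
        return self.array._reduce("sum")

    def round(self, decimals=0):
        # Path 2: everything else is handed to the numpy-style function,
        # and whatever array type comes back is wrapped unchanged.
        return ToySeries(array_round(self.array, decimals=decimals))


s = ToySeries(ToyDecimalArray(["1.2345", "2.3456"]))
rounded = s.round(2)
total = s.sum()
```

In this sketch the Series side of round is a single delegating line and the array side is a short method, which is the shape of change the paragraph above describes; the result of s.round(2) still holds a ToyDecimalArray rather than a plain list.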
Possible concerns for the future (not covered in the doc):
ceil (discussed in "Series doesn't implement floor/ceil ops (+EA support needed)", #26892). We may need to modify Series.apply with a special case for EA, so that s.apply(np.floor) works for EA. I'm not sure.
_reduce), and is not a convenience around some numpy function (so that the guarantee mentioned above doesn't hold), we will need a new mechanism to allow EAs to support it. If this is a real concern and there's no good answer, we might want to think about having a more general dispatching interface in place right now, instead of settling on the numpy dispatch guarantee mentioned above, so as to be able to handle both existing and future operations.
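To make the "more general dispatching interface" idea concrete, here is one possible shape it could take. This is purely a hypothetical sketch for discussion: nothing here is an existing or proposed pandas API, and the names ToySeries, _dispatch, double, PlainArray and CustomArray are all invented for illustration. The assumption is simply that a Series method first looks for a same-named method on the array and only falls back to a generic algorithm when the array does not provide one.

```python
def default_double(values):
    # Hypothetical generic algorithm, used when the array offers nothing.
    return [2 * v for v in values]


class PlainArray:
    """Hypothetical array that provides no implementation of its own."""

    def __init__(self, values):
        self._data = list(values)

    def __iter__(self):
        return iter(self._data)


class CustomArray(PlainArray):
    """Hypothetical array that supplies its own implementation."""

    def double(self):
        # The array decides how the operation works and what type it returns.
        return CustomArray(v + v for v in self._data)


class ToySeries:
    """Hypothetical stand-in for Series."""

    def __init__(self, array):
        self.array = array

    def _dispatch(self, name, default, *args, **kwargs):
        # Prefer an implementation supplied by the array itself.
        impl = getattr(self.array, name, None)
        if callable(impl):
            return impl(*args, **kwargs)
        # Otherwise fall back to the generic algorithm on the raw values.
        return default(list(self.array), *args, **kwargs)

    def double(self):
        return self._dispatch("double", default_double)


fallback_result = ToySeries(PlainArray([1, 2])).double()
custom_result = ToySeries(CustomArray([1, 2])).double()
```

With a hook of this kind, a newly added Series method would not need a corresponding numpy function for an EA author to override it; whether that is worth the extra surface area is exactly the open question.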
As an imaginary example, if next year we add the Series.fnorb method, which fnorbs a numeric array using a new, cutting-edge, cython-optimized fnorbing algorithm, how do we then allow EA authors to provide an implementation for their own int ndarray backed EA?
cc @jorisvandenbossche, @TomAugspurger.