WIP [DOC/EA]: developer docs for implementing Series.round/sum/etc in EA #26918

ghost · 2019-06-18T07:17:23Z

This summarizes my current understanding of how the dispatch of Series methods should work for EA (related: #26913, #26730, and #26835). It also warns against the several pitfalls I fell into while figuring things out. There are a lot more details involved than I foresaw.

The document is sloppy pseudo-sphinx for now (I haven't even compiled it yet), just to see if you agree with the content before spending the effort to polish it. I will, if/when the content represents a consensus instead of just my own take on the issue.

The description of how things work doesn't match the current state of the code right now, because there are a lot of little cleanups to be made in various functions:

Getting rid of com.values_for
Getting rid of _values and maybe even _data in some cases
Getting rid of redundant calls to np.asarray(self) or np.asarray(self._values) and such.

but I don't think there's actually any radical change implied. It's just about making methods consistent in how they delegate to numpy, so that EA developers can count on predictable behavior and reason about how pandas, numpy, and their own classes interact.

So I hope this will actually seem natural and undramatic, but I may have overlooked one or more vital issues.

One important realization I've come to is that the recent discussion regarding NEP18 was somewhat of a red herring rabbit-hole. IMO, pandas should simply provide the explicit guarantee that some things go via _reduce (or a future __reduce_pandas__, see #26915), while the others go to np.<func>(array), and that the return value type should be an ExtensionArray when applicable. How the EA author deals with numpy's dispatch logic is nothing we need to, or should want to, speak out on. That's between the EA author and numpy.
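
To make the reduction half of that guarantee concrete, here is a minimal pure-Python sketch. `ToyNullableArray` and `ToySeries` are hypothetical stand-ins, not pandas classes; the only thing taken from the discussion is the idea that every reduction funnels through a single `_reduce(name, skipna=..., **kwargs)` entry point on the array.

```python
class ToyNullableArray:
    """Hypothetical array type; not a real pandas ExtensionArray."""

    def __init__(self, values):
        self._data = list(values)  # None marks a missing value

    def _reduce(self, name, skipna=True, **kwargs):
        # Single entry point for every reduction, selected by name.
        data = [x for x in self._data if x is not None] if skipna else self._data
        if name == "sum":
            return sum(data)
        if name == "max":
            return max(data)
        raise TypeError(f"cannot perform {name!r} with this array type")


class ToySeries:
    """Hypothetical stand-in for Series, showing only the delegation."""

    def __init__(self, array):
        self.array = array

    def sum(self, skipna=True):
        return self.array._reduce("sum", skipna=skipna)

    def max(self, skipna=True):
        return self.array._reduce("max", skipna=skipna)


s = ToySeries(ToyNullableArray([1, None, 3]))
print(s.sum(), s.max())
```

The point of the sketch is that the array author only has to reason about one method for reductions, whatever the wrapper looks like.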

Avoiding NEP18 altogether means that #26817, the round() PR, can be simplified to a one-line change to Series.round plus a new 4-line round method on DecimalArray as an example. This would be the typical change needed to support (and provide examples for) other Series methods which are currently out of alignment. So going down the NEP18 route (as was suggested to me) seems to have been a misstep IMO, now that we know.
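
As a rough illustration of why a short `round` method can be enough: for an object that does not define `__array_function__`, `np.round(obj)` looks for a `round` method on the object and calls it with `decimals=` and `out=` as keywords. The class below is a hypothetical toy, not the DecimalArray from the pandas test suite, and it simply ignores `out`.

```python
import decimal

import numpy as np


class ToyDecimalArray:
    """Hypothetical Decimal-backed container; not pandas' DecimalArray."""

    def __init__(self, values):
        self._data = [decimal.Decimal(v) for v in values]

    def round(self, decimals=0, **kwds):
        # np.round passes out=None as a keyword; it is accepted and ignored here.
        return ToyDecimalArray(round(x, decimals) for x in self._data)


arr = ToyDecimalArray(["1.234", "2.718"])
rounded = np.round(arr, decimals=2)
print(type(rounded).__name__, rounded._data)
```

With this in place, a wrapper method that just calls `np.round(self.array, decimals)` gets back the array's own type rather than a plain ndarray.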

Possible Concerns for future (not covered in doc):

How should we support EA for np operations which Series does not provide, like ceil (discussed in Series doesn't implement floor/ceil ops (+EA support needed) #26892)? We may need to modify Series.apply with a special case for EA, so that s.apply(np.floor) works for EA. I'm not sure.
What if future operations are added to the Series namespace? If we ever add a new Series method which is not a reduction (going via _reduce) and is not a convenience around some numpy function (so that the guarantee mentioned above doesn't hold), we will need a new mechanism to allow EA to support it. If this is a real concern, and there's no good answer, we might want to think about having a more general dispatching interface in place right now, instead of settling on the numpy dispatch guarantee mentioned, so as to be able to handle both existing and future operations.

As an imaginary example, if next year we add the Series.fnorb method, which fnorbs a numeric array using a new, cutting-edge, cython-optimized fnorbing algorithm, how do we then allow EA authors to provide an implementation for their own int ndarray backed EA?
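
One possible shape for such a general dispatching interface, purely as a thought experiment: the wrapper asks the array whether it has its own implementation of the named operation and falls back to a default otherwise. Everything here is hypothetical, including `DispatchingSeries`, `_dispatch`, and the placeholder body of `default_fnorb` (which just reverses the values, since fnorbing is imaginary).

```python
def default_fnorb(values):
    # Placeholder for the imaginary algorithm: it only reverses the values.
    return list(reversed(list(values)))


class PlainArray:
    """Hypothetical array with no fnorb support of its own."""

    def __init__(self, values):
        self._data = list(values)

    def __iter__(self):
        return iter(self._data)


class FnorbAwareArray(PlainArray):
    """Hypothetical array that supplies its own fnorb implementation."""

    def fnorb(self):
        return FnorbAwareArray(sorted(self._data))


class DispatchingSeries:
    """Hypothetical wrapper; prefers the array's method when it exists."""

    def __init__(self, array):
        self.array = array

    def _dispatch(self, name, default, *args, **kwargs):
        impl = getattr(self.array, name, None)
        if callable(impl):
            return impl(*args, **kwargs)
        return default(self.array, *args, **kwargs)

    def fnorb(self):
        return self._dispatch("fnorb", default_fnorb)


print(DispatchingSeries(PlainArray([3, 1, 2])).fnorb())
print(DispatchingSeries(FnorbAwareArray([3, 1, 2])).fnorb()._data)
```

Whether name-based lookup like this is a good idea is exactly the open question; the sketch only shows that it would cover operations that have no numpy counterpart.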

cc @jorisvandenbossche, @TomAugspurger.

codecov · 2019-06-18T07:58:33Z

Codecov Report

Merging #26918 into master will decrease coverage by 50.77%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master   #26918       +/-   ##
===========================================
- Coverage   91.87%   41.09%   -50.78%     
===========================================
  Files         180      180               
  Lines       50712    50712               
===========================================
- Hits        46590    20841    -25749     
- Misses       4122    29871    +25749

Flag	Coverage Δ
#multiple	`?`
#single	`41.09% <ø> (-0.09%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/plotting/_matplotlib/__init__.py	`0% <0%> (-100%)`	⬇️
pandas/io/gcs.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/io/s3.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.37%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.16%)`	⬇️
pandas/io/sas/sas_xport.py	`0% <0%> (-90.1%)`	⬇️
... and 132 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update baa77c3...92d4b9b. Read the comment docs.

codecov · 2019-06-18T07:58:33Z

Codecov Report

Merging #26918 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26918      +/-   ##
==========================================
- Coverage   91.87%   91.86%   -0.01%     
==========================================
  Files         180      180              
  Lines       50712    50746      +34     
==========================================
+ Hits        46590    46620      +30     
- Misses       4122     4126       +4

Flag	Coverage Δ
#multiple	`90.46% <ø> (ø)`	⬆️
#single	`41.09% <ø> (-0.09%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/frame.py	`96.89% <0%> (-0.12%)`	⬇️
pandas/core/strings.py	`98.93% <0%> (ø)`	⬆️
pandas/core/indexes/interval.py	`96.44% <0%> (ø)`	⬆️
pandas/core/indexes/base.py	`96.71% <0%> (ø)`	⬆️
pandas/core/internals/managers.py	`95.21% <0%> (ø)`	⬆️
pandas/compat/numpy/__init__.py	`93.1% <0%> (ø)`	⬆️
pandas/core/generic.py	`94.2% <0%> (ø)`	⬆️
pandas/core/series.py	`93.66% <0%> (+0.01%)`	⬆️
pandas/core/arrays/sparse.py	`93.73% <0%> (+0.02%)`	⬆️
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update baa77c3...1f689c1. Read the comment docs.

jorisvandenbossche · 2019-06-18T19:26:16Z

> One important realization I've come to is that the recent discussion regarding NEP18 was somewhat of a red herring rabbit-hole. IMO, pandas should simply provide the explicit guarantee that some things go via _reduce (or a future __reduce_pandas__ #26915), while the others go to np.<func>(array), ... How the EA author deals with numpy's dispatch logic is nothing we need or should want to speak out on. That's between the EA author and numpy.

I agree that, from the pandas side, all we should expect is that the EA is handled by a numpy func, and the EA can determine for itself how it wants to be handled (for the non-ufuncs: only __array__, a similarly named method that numpy dispatches to, or __array_function__).

But, that said, I think the exercise with __array_function__ is still very helpful and interesting, and we still might want to include a test case with that, as it seems, at this moment at least, the "right thing to do" given the direction numpy is taking (instead of the method dispatch).
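
A small sketch of two of those options side by side, assuming a numpy version where `__array_function__` is enabled by default (1.17 or later). Both classes are hypothetical toys: one exposes only `__array__`, so `np.round` converts it and returns a plain ndarray; the other implements `__array_function__` and keeps its own type.

```python
import numpy as np


class OnlyArray:
    """Hypothetical container exposing only __array__."""

    def __init__(self, values):
        self._data = list(values)

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self._data, dtype=dtype)


class WithArrayFunction:
    """Hypothetical container that intercepts np.round via NEP 18."""

    def __init__(self, values):
        self._data = list(values)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.round:
            # decimals may arrive positionally or as a keyword
            decimals = args[1] if len(args) > 1 else kwargs.get("decimals", 0)
            return WithArrayFunction(round(x, decimals) for x in self._data)
        return NotImplemented  # every other numpy function is unsupported


plain = np.round(OnlyArray([1.26, 2.34]), decimals=1)
kept = np.round(WithArrayFunction([1.26, 2.34]), decimals=1)
print(type(plain).__name__, type(kept).__name__)
```

Either way, pandas itself only calls `np.round(array)`; which of the two behaviours applies is decided entirely by the array class.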

jorisvandenbossche

Thanks for the write up. Didn't comment in detail yet, but I think it is certainly valuable to include this in some form (although it could probably be made a bit less verbose I think)

doc/source/development/extending.rst

datapythonista

Many typos. Please review it yourself, and let us know when it's ready for someone else's review.

doc/source/development/extending.rst

ghost · 2019-06-19T02:26:48Z

xref #26935, motivated by how messy this is.

ghost · 2019-06-19T06:49:55Z

Thanks for the review, @datapythonista. I think I fixed most of the remaining typos too. Let me know if you have more suggestions.

jorisvandenbossche · 2019-06-19T08:07:43Z

> The document is sloppy pseudo-sphinx for now (I haven't even compiled it yet), just to see if you agree with the content before spending the effort to polish it.

@pilkibun in this case, I find it perfectly reasonable to have such a draft PR to get early feedback.
But just a tip to make this clearer to potential reviewers: you can put WIP: or [WIP] at the beginning of the title.
And GitHub now also has a feature to make "draft" PRs (but this can only be done at the moment of opening the PR, I think).

doc/source/development/extending.rst

TomAugspurger · 2019-06-19T11:56:09Z

doc/source/development/extending.rst

+    def round(self, decimals=0, **kwds):
+        pass
+
+An alternative approach to implementing individual functions, is to override


I wouldn't recommend this.

doc/source/development/extending.rst

ghost · 2019-06-19T18:04:54Z

Thanks for the detailed review @TomAugspurger, very helpful.

ghost · 2019-06-30T16:44:46Z

Pandas is moving in a more specific direction with this, and I don't have a full picture of it. This already seems out of date. I'm closing, but I'll leave the branch in case someone wants to pick it up instead of starting from scratch.

pilkibun added 3 commits June 18, 2019 09:36

DOC/EA: developer docs for implementing Series.round/sum/etc in EA

0ea3548

DOC: describe one more approach

d137a10

DOC: add note about incremental implementation

92d4b9b

jorisvandenbossche reviewed Jun 18, 2019

doc/source/development/extending.rst

datapythonista requested changes Jun 18, 2019

datapythonista added the Docs and ExtensionArray (Extending pandas with custom dtypes or arrays) labels Jun 18, 2019

ghost mentioned this pull request Jun 19, 2019

A unified method dispatch protocol for ExtensionArrays #26935

Closed

Reference the right class

d1bf105

This was referenced Jun 19, 2019

Tracking issue for EA Series Operations Support #26913

Closed

Series doesn't implement floor/ceil ops (+EA support needed) #26892

Closed

pilkibun added 5 commits June 19, 2019 09:42

Fix review comment

81cd3cf

Fix review comment

cb3fd56

Fix review comment

c90c597

Fix review comment

ee3ad20

Fix more typos

901579e

TomAugspurger reviewed Jun 19, 2019

ghost mentioned this pull request Jun 19, 2019

Should Pandas adopt a naming convention for its protocol methods #26915

Closed

pilkibun added 6 commits June 19, 2019 19:47

Remove redundant note

1e253c7

Remove redundant note

696b320

snip

e70dbb9

typo

ab418c1

snip

d5db7e6

rephrase

d7ebacf

pilkibun added 6 commits June 19, 2019 20:26

snip

0fec603

typo

54b7822

Rearrange

d70ffec

typo

1296267

Snip

b8187b4

US

387fd68

ghost changed the title ~~DOC/EA: developer docs for implementing Series.round/sum/etc in EA~~ WIP [DOC/EA]: developer docs for implementing Series.round/sum/etc in EA Jun 19, 2019

pilkibun added 2 commits June 19, 2019 20:48

Remove explicit list

b800b67

Move sentence to a note

be69f0f

pilkibun added 8 commits June 20, 2019 06:38

cleanups

4d948fa

Rewrites

eef58bc

Rewrite

068ff61

Rephrase

4098825

whitespace

2d94b82

whitespace

5a9125b

cleanup

6866b66

reword

1f689c1

ghost closed this Jun 30, 2019

ghost deleted the doc_implemnting_EA_ops branch July 31, 2019 16:20

This pull request was closed.
