BUG: Fix series.round() handling of EA #26817

Closed
wants to merge 9 commits into from

ghost

@ghost ghost commented Jun 12, 2019

Note that the change to a np.round(self.array) call was chosen so that

  • it works for numeric ndarrays
  • it works for EAs which implement __array__ and return a numeric ndarray
  • once numpy 1.17 is out and NEP-18 support is enabled by default, it will enable
    EA authors to implement round functionality via the __array_function__ protocol.
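
A minimal sketch of the second bullet, assuming a hypothetical `FloatBacked` container (not a pandas class) whose `__array__` returns a float64 ndarray. Because no `__array_function__` is defined, `np.round` should fall back to converting the object, so the result is a plain ndarray rather than the original type:

```python
import numpy as np


class FloatBacked:
    """Hypothetical array-like; stands in for an EA that is numeric underneath."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype="float64")

    def __array__(self, dtype=None, copy=None):
        # numpy calls this hook to obtain an ndarray for the object
        if dtype is None:
            return self._data
        return self._data.astype(dtype)


rounded = np.round(FloatBacked([1.234, 5.678]), 1)
print(type(rounded).__name__, rounded)
```

Note that the container type is lost on the way out, which is the casting-back problem discussed in the update.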

Update:

This has become a test case on supporting a wider range of numpy functions for EA.
Here are some of the issues encountered; several are indicative of what EA authors will run into when trying to do the same:

  1. Calling a numpy function may require a roundtrip through object dtype, so results need to be
    cast back to the extension type. Example: np.repeat
  2. Converting an EA to a numpy array, if using NEP-18, means inspecting all the arguments looking for EAs passed in. Once you find them you need to coerce them; they may be an EA of a different class, so you only know about them what's available through the EA protocol. Example: pd/np.concatenate
  3. ExtensionType discovery: discovering whether an EA is essentially numeric, retrieving that representation, and finding the best common dtype for mixed dtypes. Your EA may need to intelligently coerce an EA of another dtype and cast the result back to the best dtype. Example: pd/np.concatenate
  4. Once you use NEP-18, numpy ignores __array__() even if available, so there's no transparent casting to object.
  5. The ExtensionArray ABC doesn't offer a public API for requesting the backing numpy array of an EA, explicitly asking for the numeric version (if applicable) or object dtype. hgrecco/pint-pandas implements a _ndarray_values method for PintArray, because you always need one (just as Series has .values and now .array), but it's not part of the EA base class and, for mixed-dtype cases, you need such a method when you are dealing with a foreign EA.
  6. The Decimal EA example defines a whole slew of synonyms (self._data, data, _values, _items) for "pandas compatibility". These attributes have ill-defined or undefined semantics, leading to the same confusing situation as Series having .values, ._data, .get_values, .data, and now .array. Seems like repeating old mistakes again.
  7. If such aliases are necessary for pandas to recognize, they should be defined and made part of the ABC.
  8. Aliases should not be attributes (self.foo = self._foo = values), because then assigning to one does not update the other, and it's not clear where the "actual" data is. Instead, use properties which look up from the single, true attribute which holds the data. I had a bug related to this because I set _data on a copy instead of calling a constructor. Safety first.
  9. Another case: suppose you wish to convert the dtype of the EA's underlying numeric ndarray from float to int. There is no transform-like method provided for the user to do that without reaching in and touching private attributes of the EA. Example: a PintArray with units, which you want to round() and then convert to an integer internal dtype. The ExtensionType doesn't change, but the underlying numeric subtype does. For example pint[grams] (with float64 magnitudes) -> pint[grams] (with int64 magnitudes).
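
The point about aliases can be illustrated in isolation. This is a sketch with two hypothetical classes (neither is the actual DecimalArray code): one binds the aliases as separate attributes, the other exposes them as properties over a single attribute.

```python
class AttributeAliases:
    """Aliases bound as separate attributes: they can drift apart."""

    def __init__(self, values):
        self._data = self.data = self._items = list(values)


class PropertyAliases:
    """Aliases as read-only properties over the one attribute holding the data."""

    def __init__(self, values):
        self._data = list(values)

    @property
    def data(self):
        return self._data

    @property
    def _items(self):
        return self._data


a = AttributeAliases([1, 2])
a._data = [9, 9]   # rebinding one name leaves the other aliases stale
stale = a.data     # still the old list

p = PropertyAliases([1, 2])
p._data = [9, 9]
fresh = p.data     # follows _data
```

With properties there is exactly one place where the data lives, so setting `_data` on a copy cannot leave the other names pointing at the old values.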

Update:

  1. EA authors are forced to implement 3 (!) separate dispatch functions to handle common functions: __array_ufunc__ for ops like np.floor, __array_function__ for ops like round, and _reduce for ops like sum/min/max, for which pandas doesn't call numpy (instead calling pd.core.nanops) and which has an incompatible signature.
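
To make the three hooks concrete, here is a rough sketch on a hypothetical float-backed `TinyArray`. It is not a pandas ExtensionArray and not the code in this PR; the `_reduce` signature mirrors the one visible in the diff, and everything else is an assumption for illustration.

```python
import numpy as np


class TinyArray:
    """Hypothetical float-backed container showing the three dispatch hooks."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype="float64")

    # Hook 1: ufuncs such as np.floor
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__" or "out" in kwargs:
            return NotImplemented
        raw = [x._data if isinstance(x, TinyArray) else x for x in inputs]
        return TinyArray(ufunc(*raw, **kwargs))

    # Hook 2: non-ufunc numpy functions such as np.round (NEP-18)
    def __array_function__(self, func, types, args, kwargs):
        # depending on the numpy version, np.round may dispatch as np.around
        if func in (np.round, np.around):
            arr, *rest = args
            return TinyArray(np.round(arr._data, *rest, **kwargs))
        return NotImplemented

    # Hook 3: pandas-style reductions, with a skipna keyword numpy lacks
    def _reduce(self, name, skipna=True, **kwargs):
        op = getattr(np, "nan" + name if skipna else name)
        return op(self._data)


t = TinyArray([1.5, float("nan"), 2.5])
floored = np.floor(t)                              # via __array_ufunc__
rounded = np.round(TinyArray([1.26, 2.94]), 1)     # via __array_function__
total = t._reduce("sum")                           # called directly, skips NaN
```

The three entry points take different arguments and are reached by different callers, which is the duplication the comment objects to.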

@ghost ghost changed the title Series round use self array BUG: use self.array in series.count()/round() for EA support Jun 12, 2019
Contributor

@jreback jreback left a comment

always add tests first.

doc/source/whatsnew/v0.25.0.rst Outdated Show resolved Hide resolved
@jreback jreback added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jun 12, 2019
@codecov

codecov bot commented Jun 12, 2019

Codecov Report

Merging #26817 into master will decrease coverage by 50.73%.
The diff coverage is 50%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #26817       +/-   ##
===========================================
- Coverage   91.86%   41.12%   -50.74%     
===========================================
  Files         179      179               
  Lines       50707    50707               
===========================================
- Hits        46583    20855    -25728     
- Misses       4124    29852    +25728
Flag Coverage Δ
#multiple ?
#single 41.12% <50%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/series.py 45.23% <50%> (-48.42%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/plotting/_matplotlib/__init__.py 0% <0%> (-100%) ⬇️
pandas/io/gcs.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/io/s3.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.37%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.16%) ⬇️
... and 133 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 634577e...0946208. Read the comment docs.

@codecov

codecov bot commented Jun 12, 2019

Codecov Report

Merging #26817 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26817      +/-   ##
==========================================
- Coverage   91.87%   91.86%   -0.01%     
==========================================
  Files         180      180              
  Lines       50712    50721       +9     
==========================================
+ Hits        46590    46595       +5     
- Misses       4122     4126       +4
Flag Coverage Δ
#multiple 90.45% <100%> (ø) ⬆️
#single 41.1% <92.85%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/util/_test_decorators.py 93.75% <100%> (+0.52%) ⬆️
pandas/core/series.py 93.64% <100%> (ø) ⬆️
pandas/compat/numpy/__init__.py 93.93% <100%> (+0.83%) ⬆️
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.88% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update baa77c3...ffabac2. Read the comment docs.

@jorisvandenbossche
Member

For tests, I am not fully sure what is the best approach. You could add a __array_function__ to one of the test ExtensionArrays we have (see tests/extension and then we have a decimal, json). Eg for decimal, you should be able to easily implement a np.round, since round(Decimal) (standard library) works.

@ghost ghost changed the title BUG: use self.array in series.count()/round() for EA support BUG: Fix series.round() for EA support Jun 13, 2019
@ghost ghost changed the title BUG: Fix series.round() for EA support BUG: Fix series.round() handling of EA Jun 13, 2019
@ghost
Author

ghost commented Jun 13, 2019

I made some progress with this but, as discussed in #26730, introducing __array_function__ breaks lots of existing tests because _reduce is now ignored.

@jorisvandenbossche
Member

Can you give a bit more details?

Trying to understand: for reductions, we currently define _reduce on the EA, but this will somewhat duplicate the __array_function__, as this also ensures that things like np.sum(..) works (however, _reduce has the skipna keyword, so they are not equivalent).
But, from Series ops, we call this EA._reduce directly, so how does it get ignored?

BTW, feel free to already push some WIP code, or point to your attempts of a pint array with __array_function__ if you are trying that. Speaking about actual code makes it always easier.

@pep8speaks

pep8speaks commented Jun 13, 2019

Hello @pilkibun! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-17 12:53:02 UTC

@ghost
Author

ghost commented Jun 13, 2019

pytest pandas/tests/extension/decimal/test_decimal.py now gives only one failed test, but it's an example of how introducing __array_function__ forces you to deal with stuff you really don't care about.

The failing test now is TestReshaping.test_concat_mixed_dtypes, that eventually calls np.concatenate, which now invokes __array_function__ (I wanted to override round and now concat tests are failing, already a bad indicator)

In general, for each function in numpy, I have to either implement the function myself, or do any casts necessary and delegate to numpy.

np.concatenate takes an iterable of array-likes. So for this to work, I need to convert any EA in the input array to a dtype numpy recognizes. But that means I now have to look at the arguments of every function (for every function!) and figure out which are EAs, and how to intelligently coerce them. I can use the .data attribute on extension arrays (part of the interface, I think), but perhaps it makes sense for some to cast to numeric and others to object. Even without that, this is awful, because some functions take one or n arrays, others take a collection of arrays and, for all I know, some take a nested collection of arrays.

It feels incredibly wrong to have to write the logic to handle this, just to override the implementation of round().

cc @shoyer, is this really what NEP-18 expects devs to do?

@ghost
Author

ghost commented Jun 13, 2019

Here's another quandary: np.repeat. I don't want to reimplement the logic for repeat, but I can't call numpy without casting to object, which means the resulting array loses the extension type, so I have to coerce back. So if I want to implement round for my EA, I also have to add logic to handle np.repeat.

That's crazy.
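
A small sketch of that roundtrip, using a hypothetical `DecimalBox` container as a stand-in for DecimalArray (an assumption for illustration; it does not implement `__array_function__`, so numpy converts it through `__array__`):

```python
import decimal

import numpy as np


class DecimalBox:
    """Hypothetical object-backed container standing in for DecimalArray."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype=object)

    def __array__(self, dtype=None, copy=None):
        return self._data


boxed = DecimalBox([decimal.Decimal("1.1"), decimal.Decimal("2.2")])
raw = np.repeat(boxed, 2)    # numpy does the work, but hands back a plain object ndarray
restored = DecimalBox(raw)   # the explicit cast back that the author has to add
```

The repeat logic itself is free, but the wrapper type only survives if someone re-wraps the result afterwards.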

@jorisvandenbossche
Member

There has recently been a lot of discussion on the numpy mailing list about exactly this (a way to only implement part of the functions, and fall back to the "raw" numpy implementation otherwise). At some point there was a proposal to have a __skip_array_function__ to deal with that, but in the end that was not included. See numpy/numpy#13624

@shoyer your update to the NEP says "We considered three possible ways to resolve this issue, but none were entirely satisfactory". Do you have any recommendation on how to do this nevertheless? (There is a private __wrapped__ that could be used for now at your own risk, at least for testing?)

@jorisvandenbossche
Member

@pilkibun From the mailing list discussion, it seems there is a __wrapped__ attribute on the numpy function (although that is maybe only available in the dev version, didn't check), you could use that for the fallback, at least to test the approach on the short term (it's not really a long term solution though ..)

@ghost
Author

ghost commented Jun 13, 2019

I added an unpleasant traversal of all the numpy function arguments, coercing any EA via .data. It feels
ugly, but it passes test_decimal and no longer requires an explicit implementation of round.

If acceptable, it's generic enough to perhaps consider including as a default implementation in the ExtensionArray base class.
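
The traversal described here might look roughly like the following. This is a sketch on a hypothetical `Boxed` class, not the code from the PR; it only recurses into lists and tuples, and the one-entry whitelist of functions whose result is cast back is an assumption taken from the later comments.

```python
import numpy as np


class Boxed:
    """Hypothetical object-backed container with a catch-all __array_function__."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype=object)

    def __array_function__(self, func, types, args, kwargs):
        def coerce(obj):
            # unwrap our own type; recurse into lists and tuples only
            if isinstance(obj, Boxed):
                return obj._data
            if isinstance(obj, (list, tuple)):
                return type(obj)(coerce(item) for item in obj)
            return obj

        # with every Boxed unwrapped, numpy runs its own implementation
        result = func(*coerce(args), **{k: coerce(v) for k, v in kwargs.items()})
        # hypothetical whitelist of functions whose result is cast back
        if func in (np.repeat,):
            return Boxed(result)
        return result


a, b = Boxed([1, 2]), Boxed([3])
joined = np.concatenate([a, b])   # EAs nested in a list; result stays an ndarray
repeated = np.repeat(a, 2)        # whitelisted, so the result is re-wrapped
```

Only the wrapper's own type is unwrapped here, so a foreign array-like nested in the arguments would pass through untouched.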

@ghost
Author

ghost commented Jun 14, 2019

Other than the checks build, I think the test suite passes now. The test I added is numpy ver-limited to the np_dev build, because only 1.17 has __array_function__ support on by default.

That said, I don't find this approach very felicitous, because:

  1. We're forced to recursively search all the arguments passed to a numpy function for non-numpy dtypes in order to coerce them to object arrays.
  2. It looks recursively inside tuples and lists, but I haven't added support for dict arguments with EAs in the values, for example (I'm not sure numpy has any), because the whole thing smells.
  3. Right now it only coerces EAs of its own type to object, because I haven't found a canned method in the EA machinery for checking if a dtype is numpy-compatible or not (easily fixable, for sure).
    So currently, operations like pd.concat over mixed dtypes of non-numpy-dtype EAs probably fail.
  4. Since any trip to numpy forces a cast to object, I've had to implement a whitelist of functions (just np.repeat for now) whose results need to be cast back. I've seen pandas do this elsewhere, but it's not very nice, and there are almost certainly others (np.tile?) that should be in the list, which means a long time before people file issues to fix this and someone actually does. Bugs galore.
  5. It feels like something with a long tail of unforeseen special cases which will need to be dealt with, because __array_function__ requires that we deal with the entire numpy API. Remember, DecimalArray is about as simple as you can get.
  6. Introducing __array_function__ to the class may break currently working functionality. I just don't know.
  7. The whole point of using NEP-18 was to avoid implementing a bunch of methods on the EA class itself, so we use np.round(array) instead of self.array.round(). But if we can't confidently write a
    generic catch-all implementation of __array_function__ which deals with everything we want, and/or if we need to cram so many special cases into __array_function__ that it entails as much boilerplate as just writing out individual methods, what's the benefit?

So I'm unhappy with this, even though it seemingly passes the tests.
The alternative approach, of forcing EA authors to implement every method they care about manually seems less error-prone, but just as onerous. I'm not sure there's a third way, but in any case whatever the choice it will become a public API and EA authors will base their work on it, so it won't be easy to undo. Not a happy situation.

I don't know how to move forward with this. Thoughts?

@ghost
Author

ghost commented Jun 15, 2019

I updated the first post with some of the issues I encountered writing this. I think they're typical problems likely to be encountered by any EA author, so they may be useful for refining the EA base classes/examples/documentation.

@jreback
Contributor

jreback commented Jun 16, 2019

unlikely to merge this with all of these work-arounds. This needs either to be a somewhat specific solution or a well designed and integrated soln.

Contributor

@TomAugspurger TomAugspurger left a comment

This would be easier to review if you revert the unrelated changes.

@@ -323,7 +323,7 @@ def test_repeat(self, data, repeats, as_series, use_numpy):
         self.assert_equal(result, expected)

     @pytest.mark.parametrize('repeats, kwargs, error, msg', [
-        (2, dict(axis=1), ValueError, "'axis"),
+        (2, dict(axis=1), ValueError, "'?axis"),
Contributor

@TomAugspurger TomAugspurger Jun 16, 2019

Why does this change?

Author

The test seemed to raise with right error message, but without the quote. I'll revert and see if the CI fails again.

self._dtype = DecimalDtype(context)

# aliases for common attribute names, to ensure pandas supports these
Contributor

Is this related to the rounding changes?

Author

This issue helped me write a bug while working on the rounding changes, and I fixed it to protect myself. does that count as related?

@@ -153,6 +169,40 @@ def _reduce(self, name, skipna=True, **kwargs):
                             "the {} operation".format(name))
         return op(axis=0)

+    # numpy experimental NEP-18 (opt-in numpy 1.16, enabled in 1.17)
+    def __array_function__(self, func, types, args, kwargs):
+        def coerce_EA(coll):
Contributor

This doesn't seem like a good idea. For now, just use np.asarray. I think we have a separate issue about converting EAs to preferred ndarray representation, but __array__ is the best we have for now.

Author

I've already said I'm not thrilled with this approach, but I'm not sure what you're suggesting.
How can you support pd.concat([EA,EA]) without resorting to something like this?

__array__ in relation to what? I've already discussed how implementing NEP-18 on a class makes numpy ignore __array__; see point 4 in the first post.

@ghost
Author

ghost commented Jun 16, 2019

@TomAugspurger , @jreback
We decided on NEP-18 in #26730. Opting in to NEP-18 disables __array__, which has repercussions on how other ops are implemented. I'm deliberately surfacing those issues for discussion.

If NEP-18 is used, something like this seems necessary. "Just use np.asarray" ignores the issue.
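
The behaviour being described can be reproduced with a hypothetical `OptedIn` class (an illustration, not pandas code) that defines both `__array__` and an `__array_function__` that declines everything. On numpy 1.17+ the dispatched function is expected to raise instead of quietly falling back to `__array__`:

```python
import numpy as np


class OptedIn:
    """Hypothetical container that opts in to NEP-18 but handles nothing."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype="float64")

    def __array__(self, dtype=None, copy=None):
        return self._data

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


obj = OptedIn([1.0, 2.0])
via_asarray = np.asarray(obj)   # explicit conversion still goes through __array__

try:
    np.concatenate([obj, obj])  # dispatched through __array_function__ instead
    fell_back = True
except TypeError:
    fell_back = False           # numpy refuses rather than using __array__
```

So an explicit `np.asarray` inside the handler still works, but it has to be applied to every argument by hand, which is the traversal the author is objecting to.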

Here are the cases again:

  • Series.sum etc. dispatch to the pandas-specific _reduce dispatch function.
  • np.floor(EA): support requires that __array_ufunc__ be implemented.
  • pd.repeat(EA,n) / Series(EA).repeat(n) dispatches to numpy and calls __array_function__ if available, __array__ if not. Requires coercion of the result back to the type.
  • pd.concat([EA, foo],n) dispatches to np.concatenate and calls __array_function__ if available, or __array__ if not. Requires coercion of the result back to a common type, depending on the arguments.
    If using NEP-18, it requires inspection of the arguments in order to find and coerce EAs, and handling of mixed-dtype cases.
  • Series(EA).round() dispatches to np.round (with this patch) and calls __array_function__ if available, or __array__ if not. Requires coercion before and after.

standard pandas series know how to do all these things, and EA should be able to do so too, if the EA author needs them to.

We could dispense with NEP18, and implement round() on EA directly. But that means all other functions need to be explicitly implemented too. It also means __array__ will be used, so invoking many ops will return an object result.

@jreback
Contributor

jreback commented Jun 17, 2019

@pilkibun in regards to #26817 (comment)

pd.repeat(EA,n)

this is not a valid function

pd.concat([EA, foo],n)

this is not a problem at all as pandas can easily correctly dispatch on this w/o any numpy action at all

Series.sum

this is already handled by _reduce operators

np.floor(EA),

is the only valid case AFAIK to actually use __array_function__

Let's NOT try to solve the world in this PR. Please please focus on a single one of these and don't try to change the others. These are each complicated enough and don't need to be intermixed.

@jorisvandenbossche
Member

@pilkibun workflow related comment: can you just push new commits instead of force pushes? That makes it a lot easier to see what changed.

@jorisvandenbossche
Member

pd.repeat(EA,n)

this is not a valid function

that's a typo, it is about np.round

@jorisvandenbossche
Member

@pilkibun can you try to explain how pd.concat ends up calling np.concatenate on the EA? If we do, that seems like a bug, as we should have a specific code path for EAs that knows how to concat them.

@jorisvandenbossche
Member

@pilkibun could you add back all the changes related to NEP18 you had? Having it removed from the history of this branch makes it very difficult to discuss the comments that @jreback and @TomAugspurger had.

@ghost
Author

ghost commented Jun 17, 2019

Reset as best I could. sorry for the pain. No more.

@ghost
Author

ghost commented Jun 17, 2019

Let's NOT try to solve the world in this PR. Please please focus on a single one of these and
don't try to change the others. These are each complicated enough and don't need to be intermixed.

@jreback, That sounds like you're being really pragmatic and down to earth, while I'm wasting time playing with nonsense. I don't think that's fair. If you actually read #26817 (comment) (please do), you'd see I opened by saying

opting in to NEP-18 disables __array__ which has repercussions on how other ops are implemented.
I'm deliberately surfacing those issues for discussion.

So, suggesting the solution is to "just solve round()" is disingenuous. The real discussion is about how to solve the larger issue, with np.round having been the original use case, and several more having surfaced in the meantime, which I listed for reference.

So Please don't oversimplify complicated things just for a rejoinder which doesn't help the discussion.

@jreback
Contributor

jreback commented Jun 17, 2019

Let's NOT try to solve the world in this PR. Please please focus on a single one of these and
don't try to change the others. These are each complicated enough and don't need to be intermixed.

@jreback, That sounds like you're being really pragmatic and down to earth, while I'm wasting time playing with nonsense. I don't think that's fair. If you actually read #26817 (comment) (please do), you'd see I opened by saying

opting in to NEP-18 disables __array__ which has repercussions on how other ops are implemented.
I'm deliberately surfacing those issues for discussion.

So, suggesting the solution is to "just solve round()" is disingenuous. The real discussion is about how to solve the larger issue, with np.round having been the original use case, and several more having surfaced in the meantime, which I listed for reference.

So Please don't oversimplify complicated things just for a rejoinder which doesn't help the discussion.

@pilkibun I never said any of that, please do NOT put words into my statements. I AM being practical. This will not be merged as is. You need to separate things into well constructed and documented components.

We are certainly willing to take tactical patches while awaiting towards a strategic fix. However, you are mixing up too many different items here. It is best to stick to a single well formed patch.

@jreback
Contributor

jreback commented Jun 17, 2019

So Please don't oversimplify complicated things just for a rejoinder which doesn't help the discussion.

@pilkibun I can't even begin to have a discussion about this code change. You have SO much going on that its not even worth discussing in this state except to say: simplify what you are changing.

Please don't waste my time. I review many many PRs. They need to be focused and well formed. This is fine if it's 'pie-in-the-sky'.

@ghost
Author

ghost commented Jun 17, 2019

Please don't waste my time.

Gladly. Here and in other discussions I've found you to be rude, arrogant, dismissive, condescending,
and impervious to reasonable argument when you make a mistake. To be blunt, I don't enjoy collaborating with you. @jorisvandenbossche and @TomAugspurger otoh are helpful, patient and constructive, even and especially when new contributors stumble around a little. You seem to be out to win any and all arguments, and to always have the last word on everything.

I have had many missteps, but then again it's a large project, the code is new to me, and I'm working on something which hasn't been fully fleshed out. Joris and Tom make me want to do better when I fumble, while you make me want to walk away.

I think the solution is simple. Save yourself precious time, and me the aggravation, and leave us to work in peace. I'm sure the two of them are more than capable of guiding this towards an acceptable result, or rejecting it if it isn't good enough.

@ghost
Author

ghost commented Jun 17, 2019

I'd like to get back to discussing implementing ops for EA, if possible, instead of you dragging me into an argument which will drain everyone of good will.

@ghost
Author

ghost commented Jun 17, 2019

Back to the issue

@jorisvandenbossche, I can't replicate the pd.concat issue. I must have encountered it during some changes. It does seem to work even without NEP18. I guess I had trouble with pd.concat([EA,EA])
which is not really an issue, since we actually work with the series not the underlying array

In [11]: pd.concat([s.array,s.array])
TypeError: cannot concatenate object of type '<class 'array2.DecimalArray'>'; only Series and DataFrame objs are valid

@TomAugspurger
Contributor

Let's put this conversation on hold for a bit, to give everyone some time away from it. I'll revisit it in 24h.

@ghost
Author

ghost commented Jun 17, 2019

I opened #26900 and #26901 a couple of hours ago, to eliminate some of the noise here, but then @jorisvandenbossche , you were unhappy about losing the discussion commits. How do you want me to handle that?

@TomAugspurger
Contributor

Welcome back, I hope everyone had a productive break. I'm going to try to
summarize the meta-issue of where the discussion about the issue went wrong.
Once we all feel OK with that, we can return to the real discussion about the
PR. Sound good?


I think the root cause of the conflict was a difference in expectations between the
contributor and reviewers.

Typically, pandas uses the issue for design, and the PR for implementation. This
PR, which is exploring a new exciting space, was doing both. (breaking my rule
and talking about the issue instead of the meta-issue: this is a fundamentally
hard problem. I think it's worth recognizing that.)

So, let's attempt a summary at what happened:

The precipitating comment seems to be
#26817 (comment) by Jeff.

unlikely to merge this with all of these work-arounds. This needs either to be
a somewhat specific solution or a well designed and integrated soln.

From there, things escalated.

In #26817 (comment), Jeff
requested keeping things focused. Reading the comment again, things seem OK, if
terse. But that terseness can be problematic from the contributor's point of
view. It gives the impression that the maintainer is dismissing the work done
by the contributor without fully understanding the issue.

In #26817 (comment), by
@pilkibun, I think we see the fundamental difference in expectations.

The real discussion is about how to solve the larger issue, with np.round
having been the original use case, and several more having surfaced in the
meantime, which I listed for reference.

There were also a couple slightly incendiary (can something be slightly
incendiary?) phrasings like the "(please do)" aside. Certainly not outrageous,
just not constructive. Speaking personally, those kinds of comments demotivate
me. Over the course of a day reviewing many PRs those kinds of comments add up.
Others may respond differently, but I think we'd all be better off if the
semi-passive-aggressive comments were deleted before hitting the send button.

In #26817 (comment), I
think @jreback made a mistake. "Please don't waste my time. I review many many
PRs." The base request of simplifying the PR had already been made. I don't
think that comments like that help build the kind of community we want for
pandas. If there are ever PRs, contributors, or individuals you don't want to
deal with Jeff, then I think the best thing is to have another maintainer step
in. Possibly making the request through a side-channel.

And things went further downhill from there as patience frayed all around. But I
don't think going through additional comments one by one will be productive.


Summary over; now where do we stand on the meta-issue? A couple of general
observations:

First, online communication is difficult. We have a diverse set of backgrounds
coming together to build pandas, so there will be differences in communication
style. Since we're online it's easy to misinterpret others' thoughts and
intentions, especially when time is short. As a general rule, it's good to
assume good intentions from others. As a more specific rule, it's helpful to
spend a bit of time now clarifying your own understanding, rather than leaving
it up to interpretation.

Second, expectation setting seems crucial. As trivial as it sounds, I think much
of the conflict would have been avoided if the URL was /pandas/issues/X rather
than /pandas/pull/X. Then it's clear that we're in the exploratory / design
phase, rather than the "ready to be merged" phase (though the line on when to
move a discussion from the design issue to PR is very blurry). We should update
https://github.com/pandas-dev/pandas/blob/master/.github/PULL_REQUEST_TEMPLATE.md
to emphasize that the design should be mostly settled in an issue.

Related to both, I think Brett Cannon's thoughts on this are worth reading:
https://snarky.ca/setting-expectations-for-open-source-participation/

And as a concrete example of how things could have gone differently, the comment
from Jeff

This needs either to be a somewhat specific solution or a well designed and
integrated soln.

could be rephrased as

I didn't fully read all previous discussion, but looking at the diff, this
seems very complex to me with a lot of work-arounds, and not mergeable in
the current state. Can you explain why you think those are needed in this
PR? Can we instead make focused, incremental steps to our final goal?

I think that better communicates what I think was the intended message. But the
original comment can be read as dismissive, which I don't think we want in our
community.

@jreback @pilkibun do you have thoughts? Perhaps reactions to the general
observations would be a good place to start?

Reminder: let's have a meeting of minds on the meta-issue before returning to
the actual PR itself.

@ghost ghost closed this Jul 2, 2019
@ghost ghost deleted the series_round_use_self_array branch July 31, 2019 16:19
This pull request was closed.
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Development

Successfully merging this pull request may close these issues.

Series.round behavior with ExtensionArrays
4 participants