ENH: ExtensionArray support for objects with _can_hold_na=False and relational operators #20801

Dr-Irv · 2018-04-23T19:28:08Z

closes #20659
closes #20761

tests added / passed
- New tests/extension/relobject tests cover arrays that cannot hold NaN and that have relational operators defined
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
- Not added as this is part of the ExtensionArray support coming in the next release, which already has a whatsnew entry

NOTE: I expect that the pandas main developers will object to a lot of what is done here! So let's be open on the discussion.

Goals of this pull request:

- If there is an ExtensionArray, allow the relational operators to delegate to the base type.
- Modify ExtensionArray tests to handle the cases of _can_hold_na==False as best as possible.
- An eventual goal is that if I have a RelObjectArray that consists of objects that have operators such as __le__ defined that return objects, and a and b are Series containing that type of array, then a <= b returns a Series of objects rather than booleans.
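To make the last goal concrete, here is a minimal sketch in plain Python, without pandas. `Expr`, `Constraint`, and `elementwise_le` are hypothetical names invented for illustration; they are not part of this PR or of pandas. The sketch only shows what "a <= b returns objects rather than booleans" would mean element by element.

```python
class Constraint:
    """Hypothetical result object produced by comparing two expressions."""

    def __init__(self, lhs, op, rhs):
        self.lhs, self.op, self.rhs = lhs, op, rhs

    def __repr__(self):
        return f"{self.lhs.name} {self.op} {self.rhs.name}"


class Expr:
    """Hypothetical element type whose `<=` builds an object, not a bool."""

    def __init__(self, name):
        self.name = name

    def __le__(self, other):
        return Constraint(self, "<=", other)


def elementwise_le(left, right):
    # Delegate to each element's own __le__ and keep whatever it returns.
    return [l <= r for l, r in zip(left, right)]


a = [Expr("x1"), Expr("x2")]
b = [Expr("y1"), Expr("y2")]
result = elementwise_le(a, b)
print(result)  # [x1 <= y1, x2 <= y2]
```

The point of the sketch is that the result container must be able to hold arbitrary objects, which is why coercing comparison results to a boolean array would lose information for this kind of element type.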

Things I'm unsure of:

- In core/ops.py, I am using inspect to determine whether the user has implemented the relational operators for the subclassed ExtensionArray type as well as for the base type of elements in the array. I'm not sure if this is best practice or not.
- In my example that I'm separately working on, the relational operators (including __eq__) return objects as opposed to booleans, and some (intentionally) throw exceptions. So I tried to catch these various cases. Not having __lt__ defined means that sorting is undefined, and not having __eq__ return a boolean messes up some things related to groupby. For my application, that is OK.
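For readers who want to see what an inspect-based check could look like, here is a rough sketch. It is an assumption about the general approach, not the code in core/ops.py from this PR; `RELATIONAL_OPS`, `overridden_relational_ops`, and `OnlyLE` are hypothetical names. It compares each relational method found on a class against the default inherited from `object`.

```python
import inspect

RELATIONAL_OPS = ("__lt__", "__le__", "__eq__", "__ne__", "__gt__", "__ge__")


def overridden_relational_ops(cls):
    """Return the relational dunder names that `cls` does not take from `object`."""
    found = []
    for name in RELATIONAL_OPS:
        # getattr_static looks the attribute up along the MRO without
        # invoking descriptors, so we see the raw object that was defined.
        attr = inspect.getattr_static(cls, name, None)
        if attr is not None and attr is not inspect.getattr_static(object, name):
            found.append(name)
    return found


class OnlyLE:
    """Hypothetical element type that defines only `<=`."""

    def __le__(self, other):
        return ("le", self, other)


print(overridden_relational_ops(OnlyLE))  # ['__le__']
print(overridden_relational_ops(object))  # []
```

A check like this can tell that a method was overridden, but not whether the override returns booleans, objects, or raises, so the cases described above would still need separate handling.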

TomAugspurger · 2018-04-23T20:53:56Z

> I am using inspect to determine whether the user has implemented the relational operators for the subclassed ExtensionArray type as well as for the base type of elements in the array. I'm not sure if this is best practice or not.

One issue here is for __eq__ and __ne__. All EAs will implement those by virtue of inheriting from object, even if the author doesn't implement them "correctly". We could have EA authors opt into this by defining class attributes such as _equatable, _orderable, etc.
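A small sketch of the opt-in idea, in plain Python. The attribute names `_equatable` and `_orderable` come from the comment above; everything else (`PlainArray`, `OrderedArray`, `supports`) is hypothetical and only illustrates why checking for the presence of `__eq__` is not enough.

```python
class PlainArray:
    """Hypothetical array class that defines no comparison methods at all."""


class OrderedArray:
    """Hypothetical array class that opts in via explicit flags."""

    _equatable = True
    _orderable = True


def supports(array_cls, flag):
    # Treat a missing flag as "not supported" so authors must opt in.
    return getattr(array_cls, flag, False)


# Every class inherits __eq__ and __ne__ from object, so hasattr says nothing.
print(hasattr(PlainArray, "__eq__"), hasattr(PlainArray, "__ne__"))  # True True
print(supports(PlainArray, "_orderable"))    # False
print(supports(OrderedArray, "_orderable"))  # True
```

Defaulting a missing flag to False is one possible design choice; the thread does not settle what the default should be.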

TomAugspurger

The NA stuff looks good. Small question about the test changes.

Taking a look at the ops stuff in a bit.

TomAugspurger · 2018-04-23T20:55:20Z

pandas/core/arrays/base.py

    * _formatting_values
+    * _can_hold_na
+
+    Some methods require casting the ExtensionArray to an ndarray of Python


Merge snafu?

No, I reordered _formatting_values and _can_hold_na, since one is a method and the other is an attribute.

I meant this section specifically. This block is repeated starting on line 57. Or perhaps I'm missing something.

This is really weird. The copy on my machine is clean and fine, but the copy on GitHub has the repeat. Maybe because I resolved conflicts with master using the GitHub interface. UGH.

TomAugspurger · 2018-04-23T20:56:25Z

pandas/core/arrays/base.py

-        # type: () -> bool
-        """Whether your array can hold missing values. True by default.
+    _can_hold_na = True
+    """Whether your array can hold missing values. True by default.


Not sure what our recommended style is here. You can probably just change it to a comment.

TomAugspurger · 2018-04-23T20:59:13Z

pandas/tests/extension/base/getitem.py

@@ -82,8 +82,9 @@ def test_getitem_scalar(self, data):
        assert isinstance(result, data.dtype.type)

    def test_getitem_scalar_na(self, data_missing, na_cmp, na_value):
-        result = data_missing[0]
-        assert na_cmp(result, na_value)
+        if data_missing._can_hold_na:


All these changes are a bit unfortunate...

Could we instead have the data_missing fixture raise pytest.skip when necessary? I suspect this would have to be done by the EA author in their code.

I could make that change to avoid any tests where data_missing is used, and the EA author would have to make that change if they wanted _can_hold_na=False.
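A sketch of what the skip-in-fixture approach might look like on the EA author's side. `NoNAArray` and `make_data_missing` are hypothetical stand-ins, not code from this PR; only `pytest.fixture` and `pytest.skip` are real pytest APIs. The helper is kept separate from the fixture so it can be exercised directly.

```python
import pytest


class NoNAArray:
    """Hypothetical stand-in for an ExtensionArray that cannot hold NA."""

    _can_hold_na = False


def make_data_missing(array_cls):
    # Skip instead of building data the array type cannot represent.
    if not array_cls._can_hold_na:
        pytest.skip("{} cannot hold missing values".format(array_cls.__name__))
    return array_cls()


@pytest.fixture
def data_missing():
    # Any test requesting this fixture is reported as skipped for NoNAArray.
    return make_data_missing(NoNAArray)
```

With this shape, base tests such as test_getitem_scalar_na would not need an `if data_missing._can_hold_na` branch, since they would never run for arrays that cannot hold NA.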

TomAugspurger · 2018-04-23T21:01:50Z

pandas/tests/extension/relobject/test_relobject.py

+
+
+@pytest.fixture
+def data_missing_for_sorting():


Ah, I clearly didn't have arrays that don't support NA in mind when I wrote this :)

Could you add this note to the data_missing_for_sorting docstring (in tests/extension/conftest.py)?

jreback · 2018-04-24T10:20:48Z

this won't be for 0.23, you are basically defining the ops protocol for abstraction, which is needed, but also needs substantial testing in the current EA classes.

TomAugspurger · 2018-04-24T13:06:13Z

> this won't be for 0.23

Yeah, I think I agree given the deadline we set (RC was supposed to be yesterday). I could see this taking a bit of time to get right.

@Dr-Irv could you split the changes for _can_hold_na out to a separate PR? That part seems ready to go.

Dr-Irv · 2018-04-24T15:08:58Z

@TomAugspurger Yes, I can split the changes out for the _can_hold_na part. And I see that the operator stuff I tried to do is part of a bigger discussion. Should I create a new test case for the _can_hold_na part?

TomAugspurger · 2018-04-24T15:15:23Z

Yeah, that'd be nice. It can be a very simple example I think... Maybe make a subclass of `DecimalArray` with `_can_hold_na=False` and see if the tests pass once you apply your changes?

Dr-Irv added 3 commits April 23, 2018 15:10

ENH: ExtensionArray support for objects

42101d8

fix whitespace

f00408e

Merge branch 'master' into issue20659

74df86d

TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Apr 23, 2018

TomAugspurger added this to the 0.23.0 milestone Apr 23, 2018

TomAugspurger reviewed Apr 23, 2018

jreback removed this from the 0.23.0 milestone Apr 24, 2018
