ENH: ExtensionArray support for objects with _can_hold_na=False and relational operators #20801
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,11 +38,21 @@ class ExtensionArray(object): | |
* copy | ||
* _concat_same_type | ||
|
||
Some additional methods are available to satisfy pandas' internal, private | ||
block API: | ||
An additional method and attribute is available to satisfy pandas' | ||
internal, private block API. | ||
|
||
* _can_hold_na | ||
* _formatting_values | ||
* _can_hold_na | ||
|
||
Some methods require casting the ExtensionArray to an ndarray of Python | ||
objects with ``self.astype(object)``, which may be expensive. When | ||
performance is a concern, we highly recommend overriding the following | ||
methods: | ||
|
||
* fillna | ||
* unique | ||
* factorize / _values_for_factorize | ||
* argsort / _values_for_argsort | ||
|
||
Some methods require casting the ExtensionArray to an ndarray of Python | ||
objects with ``self.astype(object)``, which may be expensive. When | ||
|
@@ -393,7 +403,8 @@ def _values_for_factorize(self): | |
Returns | ||
------- | ||
values : ndarray | ||
An array suitable for factoraization. This should maintain order | ||
|
||
An array suitable for factorization. This should maintain order | ||
and be a supported dtype (Float64, Int64, UInt64, String, Object). | ||
By default, the extension array is cast to object dtype. | ||
na_value : object | ||
|
@@ -416,7 +427,7 @@ def factorize(self, na_sentinel=-1): | |
Returns | ||
------- | ||
labels : ndarray | ||
An interger NumPy array that's an indexer into the original | ||
An integer NumPy array that's an indexer into the original | ||
ExtensionArray. | ||
uniques : ExtensionArray | ||
An ExtensionArray containing the unique values of `self`. | ||
|
@@ -560,16 +571,13 @@ def _concat_same_type(cls, to_concat): | |
""" | ||
raise AbstractMethodError(cls) | ||
|
||
@property | ||
def _can_hold_na(self): | ||
# type: () -> bool | ||
"""Whether your array can hold missing values. True by default. | ||
_can_hold_na = True | ||
"""Whether your array can hold missing values. True by default. | ||
Not sure what our recommended style is here. You can probably just change it to a comment. |
||
|
||
Notes | ||
----- | ||
Setting this to false will optimize some operations like fillna. | ||
""" | ||
return True | ||
Notes | ||
----- | ||
Setting this to False will optimize some operations like fillna. | ||
""" | ||
|
||
@property | ||
def _ndarray_values(self): | ||
|
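A minimal sketch of what the attribute-style `_can_hold_na` in the hunk above allows, written against a toy class rather than pandas itself. `ToyArray`, `NoNAArray`, the use of `None` as the missing marker, and the body of `fillna` are all assumptions made for illustration; pandas' real `ExtensionArray.fillna` is more involved. The point is only that a subclass can opt out with a plain assignment, and that an operation such as `fillna` can return early when the flag is False.

```python
# Toy stand-in for illustration only; this is NOT pandas' ExtensionArray.
class ToyArray(object):
    # Class-level flag, mirroring the attribute style used in the diff.
    _can_hold_na = True

    def __init__(self, values):
        self._values = list(values)

    def fillna(self, value):
        # Short-circuit: an array that cannot hold NA has nothing to fill,
        # so return a copy without scanning for missing entries.
        if not self._can_hold_na:
            return type(self)(self._values)
        # Hypothetical convention: None marks a missing value.
        return type(self)([value if v is None else v for v in self._values])


class NoNAArray(ToyArray):
    # A subclass opts out with a plain assignment; no property is needed.
    _can_hold_na = False


filled = ToyArray([1, None, 3]).fillna(0)
untouched = NoNAArray([1, 2, 3]).fillna(0)
print(filled._values)
print(untouched._values)
```

Because the flag lives on the class, it can also be read without constructing an instance (`NoNAArray._can_hold_na`), which a `@property` would not allow.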
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -82,8 +82,9 @@ def test_getitem_scalar(self, data): | |
assert isinstance(result, data.dtype.type) | ||
|
||
def test_getitem_scalar_na(self, data_missing, na_cmp, na_value): | ||
result = data_missing[0] | ||
assert na_cmp(result, na_value) | ||
if data_missing._can_hold_na: | ||
All these changes are a bit unfortunate... Could we instead have the
I could make that change to avoid any tests where |
||
result = data_missing[0] | ||
assert na_cmp(result, na_value) | ||
|
||
def test_getitem_mask(self, data): | ||
# Empty mask, raw array | ||
|
@@ -134,8 +135,9 @@ def test_take(self, data, na_value, na_cmp): | |
|
||
def test_take_empty(self, data, na_value, na_cmp): | ||
empty = data[:0] | ||
result = empty.take([-1]) | ||
na_cmp(result[0], na_value) | ||
if data._can_hold_na: | ||
result = empty.take([-1]) | ||
na_cmp(result[0], na_value) | ||
|
||
with tm.assert_raises_regex(IndexError, "cannot do a non-empty take"): | ||
empty.take([0, 1]) | ||
|
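The hunks above wrap the NA assertions in `if ..._can_hold_na:`, so for an array type that cannot hold NA the test passes without checking anything. One possible alternative, sketched here with the standard-library `unittest` rather than the pytest fixtures the pandas suite uses, is to report such tests as skipped. `ToyNoNAArray`, `TestGetitem`, and `run_toy_tests` are hypothetical names; this is not what the PR or its reviewers settled on.

```python
import unittest


class ToyNoNAArray(object):
    """Hypothetical array type that cannot represent missing values."""
    _can_hold_na = False

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, i):
        return self._values[i]


class TestGetitem(unittest.TestCase):
    def setUp(self):
        # Stand-ins for the data_missing and na_value fixtures.
        self.data_missing = ToyNoNAArray([1, 2])
        self.na_value = None

    def test_getitem_scalar_na(self):
        # Skip explicitly instead of passing silently.
        if not self.data_missing._can_hold_na:
            self.skipTest("array type cannot hold NA")
        self.assertIs(self.data_missing[0], self.na_value)


def run_toy_tests():
    # Run the single test case and return the collected result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGetitem)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_toy_tests()
print(result.testsRun, len(result.skipped))
```

A skipped test shows up in the test report, which makes it easier to see which parts of the base suite an extension type does not exercise.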
Merge snafu?
No, I reordered `_formatting_values` and `_can_hold_na`, since one is a method and the other is an attribute.
I meant this section specifically. This block is repeated starting on line 57. Or perhaps I'm missing something.
This is really weird. The copy on my machine is clean and fine, but the copy on GitHub has the repeat. Maybe because I resolved conflicts with master using the GitHub interface. UGH.