REGR: get_loc for ExtensionEngine not returning bool indexer for na #48411

Merged on Sep 7, 2022 (4 commits).
pandas/_libs/index.pyx: 2 changes (1 addition, 1 deletion)
@@ -1061,7 +1061,7 @@ cdef class ExtensionEngine(SharedEngine):

     cdef ndarray _get_bool_indexer(self, val):
         if checknull(val):
-            return self.values.isna().view("uint8")
+            return self.values.isna()

         try:
             return self.values == val
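Per the PR title, `get_loc` did not return a boolean indexer for NA keys before this change. Why that matters comes down to NumPy indexing semantics, sketched below with plain arrays standing in for the engine's values (this is not pandas' internal code): a boolean mask selects the positions where it is True, while the same bytes viewed as `uint8` are interpreted as integer positions.

```python
import numpy as np

# Stand-in for `values.isna()` on Index([1, 2, NA, NA], dtype="Int64")
mask = np.array([False, False, True, True])
data = np.array([10, 20, 30, 40])

# A boolean array selects the elements where the mask is True.
by_bool = data[mask]

# The same bits viewed as uint8 become the integer positions [0, 0, 1, 1].
by_uint8 = data[mask.view("uint8")]

print(by_bool)   # [30 40]
print(by_uint8)  # [10 10 20 20]
```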
pandas/tests/indexes/test_indexing.py: 15 changes (15 additions, 0 deletions)
@@ -20,6 +20,7 @@
 from pandas.errors import InvalidIndexError

 from pandas import (
+    NA,
     DatetimeIndex,
     Index,
     IntervalIndex,
@@ -221,6 +222,13 @@ def test_get_loc_generator(self, index):
         # MultiIndex specifically checks for generator; others for scalar
         index.get_loc(x for x in range(5))

+    def test_get_loc_masked_duplicated_na(self):
+        # GH#48411
+        idx = Index([1, 2, NA, NA], dtype="Int64")
+        result = idx.get_loc(NA)
+        expected = np.array([False, False, True, True])
Review comment by @jorisvandenbossche (Member), Sep 6, 2022:

This is certainly correct (it gives the same result when used for indexing), but should this actually return a slice? (For non-NA values, that seems to be the case.)

(For a potential follow-up.)

Reply (Member Author):

Oh, correct, but only if no NAs are present at all. I'll have to check in a follow-up what we want to do here.

Reply (Member Author):

Actually, this is correct: we only return slices when the index is monotonic increasing.

idx2 = Index([1, 2, 2, 3, 3, 0], dtype="int64")
idx2.get_loc(2)

This returns a boolean indexer too.

Reply (Member):

Ah, good point: I suppose it is the missing values that make it non-monotonic.
I tested it by replacing the NAs with a number, in which case it was monotonic:

In [8]: idx = Index([1, 2, 3, 3], dtype="Int64")

In [9]: idx.get_loc(3)
Out[9]: slice(2, 4, None)

Reply (Member Author):

Yep, correct. Adding NA makes it non-monotonic in all cases.
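The lookup behaviour discussed in this thread can be modeled in a few lines. This is a hypothetical, simplified sketch, not pandas' engine code: `get_loc_sketch` is an illustrative name, `None` stands in for `pd.NA`, and the rule that any missing value makes the values non-monotonic is taken from the comments above. It returns an int for a single match, a slice for a monotonic-increasing index, and a boolean mask otherwise.

```python
from bisect import bisect_left, bisect_right

import numpy as np


def get_loc_sketch(values, key):
    """Hypothetical model of a label lookup on an index with duplicates.

    `None` stands in for pd.NA.
    """
    values = list(values)
    # Assumption from the review thread: any missing value breaks monotonicity.
    monotonic = all(v is not None for v in values) and all(
        a <= b for a, b in zip(values, values[1:])
    )
    if monotonic and key is not None:
        # Sorted values: all matches are contiguous, so a slice suffices.
        left, right = bisect_left(values, key), bisect_right(values, key)
        if left == right:
            raise KeyError(key)
        return left if right - left == 1 else slice(left, right)

    # Otherwise build a boolean mask; a missing key matches missing values.
    mask = np.array(
        [(v is None) if key is None else (v is not None and v == key) for v in values],
        dtype=bool,
    )
    found = np.flatnonzero(mask)
    if len(found) == 0:
        raise KeyError(key)
    return int(found[0]) if len(found) == 1 else mask


print(get_loc_sketch([1, 2, 3, 3], 3))           # slice(2, 4, None)
print(get_loc_sketch([1, 2, 2, 3, 3, 0], 2))     # [False  True  True False False False]
print(get_loc_sketch([1, 2, None, None], None))  # [False False  True  True]
```

Returning a slice only for sorted values keeps the monotonic case cheap (two binary searches), while the mask path works for any ordering at the cost of a full scan.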

+        tm.assert_numpy_array_equal(result, expected)
+

 class TestGetIndexer:
     def test_get_indexer_base(self, index):
@@ -253,6 +261,13 @@ def test_get_indexer_consistency(self, index):
         assert isinstance(indexer, np.ndarray)
         assert indexer.dtype == np.intp

+    def test_get_indexer_masked_duplicated_na(self):
+        # GH#48411
+        idx = Index([1, 2, NA, NA], dtype="Int64")
+        result = idx.get_indexer_for(Index([1, NA], dtype="Int64"))
+        expected = np.array([0, 2, 3], dtype="int64")
+        tm.assert_numpy_array_equal(result, expected)
+

 class TestConvertSliceIndexer:
     def test_convert_almost_null_slice(self, index):
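The expected result `[0, 2, 3]` in the new `get_indexer_for` test follows from each target label contributing every position at which it occurs in the (non-unique) index. A hypothetical pure-Python model of that rule: `get_indexer_for_sketch` is an illustrative name, `None` stands in for `pd.NA` (with NA matching NA), and emitting `-1` for an absent label is an assumption of this sketch.

```python
def get_indexer_for_sketch(index, target):
    """Hypothetical model of get_indexer_for on an index with duplicate labels."""

    def matches(a, b):
        # Missing values only match other missing values.
        if a is None or b is None:
            return a is None and b is None
        return a == b

    result = []
    for key in target:
        hits = [i for i, value in enumerate(index) if matches(value, key)]
        # Every matching position is emitted; absent labels yield -1.
        result.extend(hits or [-1])
    return result


print(get_indexer_for_sketch([1, 2, None, None], [1, None]))  # [0, 2, 3]
```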