BUG: Fix Series.get() for ExtensionArray and Categorical #20885

Merged
merged 9 commits into from
May 9, 2018
15 changes: 8 additions & 7 deletions pandas/core/indexes/base.py
@@ -3081,13 +3081,14 @@ def get_value(self, series, key):
# if we have something that is Index-like, then
# use this, e.g. DatetimeIndex
s = getattr(series, '_values', None)
if isinstance(s, (ExtensionArray, Index)) and is_scalar(key):
try:
return s[key]
Contributor

If you just use .get_loc(key) on an Index as well, does this work? (Perf should be similar.) That way we can avoid separating this.

except (IndexError, ValueError):

# invalid type as an indexer
pass
if is_scalar(key):
if isinstance(s, (Index, ExtensionArray)):
Contributor

this could be an and here.

Contributor Author

fixed

try:
iloc = self.get_loc(key)
return s[iloc]
except KeyError:
Contributor

What happened to the IndexError case? Is that not possible now?

if isinstance(key, (int, np.integer)):
Contributor

use is_integer

Contributor

why doesn’t the pass work?

Contributor

needs comments

Contributor Author

@jreback I've just pushed a new commit that uses is_integer and adds comments.

pass doesn't work because we are catching a different exception. So the way I unified the code between Index and ExtensionArray was by handling the exception differently in the case that the key was a scalar.
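
The lookup order described here can be sketched outside pandas. This is only a toy model under stated assumptions, not the code in `pandas/core/indexes/base.py`: `get_sketch`, `values`, and `labels` are hypothetical names, plain lists stand in for the array and the index, and `list.index` (which raises `ValueError`) stands in for `get_loc` (which raises `KeyError`).

```python
def get_sketch(values, labels, key, default=None):
    """Toy model of the scalar lookup order discussed above (not pandas code)."""
    try:
        try:
            # Step 1: treat the key as a label and translate it to a position.
            # list.index raises ValueError for a missing label; here that
            # stands in for the KeyError raised by Index.get_loc.
            position = labels.index(key)
        except ValueError:
            # Step 2: a missing label is retried positionally only when the
            # key is an integer (bool is excluded because it subclasses int).
            if isinstance(key, int) and not isinstance(key, bool):
                position = key
            else:
                raise KeyError(key)
        return values[position]
    except (KeyError, IndexError):
        # Assumed .get-style behaviour: a failed lookup yields the default.
        return default


# Integer labels: 4 is found as a label, so position 2 is returned.
print(get_sketch(list("abcdef"), [0, 2, 4, 6, 8, 10], 4))
# String labels: 4 is not a label, so it falls back to position 4.
print(get_sketch([10, 20, 30, 40, 50, 60], list("abcdef"), 4))
```

The point of the sketch is that the fallback is chosen inside the handler for the failed label lookup, which is why a bare `pass` in that handler would not reproduce the old behaviour.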

return s[key]

s = com._values_from_object(series)
k = com._values_from_object(key)
25 changes: 25 additions & 0 deletions pandas/tests/extension/base/getitem.py
@@ -117,6 +117,31 @@ def test_getitem_slice(self, data):
result = data[slice(1)] # scalar
assert isinstance(result, type(data))

def test_get(self, data):
# GH 20882
s = pd.Series(data, index=[2 * i for i in range(len(data))])
assert s.get(4) == s.iloc[2]

result = s.get([4, 6])
expected = s.iloc[[2, 3]]
self.assert_series_equal(result, expected)

result = s.get(slice(2))
expected = s.iloc[[0, 1]]
self.assert_series_equal(result, expected)

s = pd.Series(data[:6], index=list('abcdef'))
assert s.get('c') == s.iloc[2]

Contributor

Could you also add a test for a slice like s.get(slice(2))? That seems to be valid as far as .get is concerned.

Contributor Author

Regarding your suggestion, I don't think you need all of those changes. (I also don't think they will work, because self._engine.get_value() returns an item from the passed ndarray and you are using that to index into the series itself.) Things work fine with my approach when specifying a slice or multiple indices. The issue is when the key is a single value, which is what I took care of with my fix.

I'll push some additional tests.

result = s.get(slice('b', 'd'))
expected = s.iloc[[1, 2, 3]]
self.assert_series_equal(result, expected)

Contributor

can you add some cases with an out-of-range integer

Contributor Author

done

result = s.get('Z')
assert result is None

assert s.get(4) == s.iloc[4]

def test_take_sequence(self, data):
result = pd.Series(data)[[0, 1, 3]]
assert result.iloc[0] == data[0]