API: what should a 2D indexing operation into a 1D Index do? (e.g. idx[:, None]) #27837
Now on to the actual issue: I suppose the current behaviour has historical reasons (certainly from when Series/Index were ndarray subclasses): when doing a 2D operation on a 1D object, we returned a 2D array. This provides some array duck-typing ability (the object behaves as an array-like for code that expects a NumPy-like array, as matplotlib did). For example, Series still does this (for plain numpy types).
The corresponding Index code is in pandas/core/indexes/base.py, lines 4241 to 4242 at commit 640d9e1.
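For reference, this is what the underlying NumPy operation does on a 1D array, i.e. the behaviour Series and Index inherited when they were ndarray subclasses. This is a minimal NumPy-only sketch (no pandas objects involved); looks_2d is a hypothetical helper standing in for the kind of shape check that array-consuming code such as matplotlib might perform.

```python
import numpy as np

# A 1D array, like the data backing a Series or Index
values = np.array([1, 2, 3])

# A "2D" indexing operation on a 1D array: None (np.newaxis) adds a trailing axis
column = values[:, None]
print(column.shape)  # (3, 1)
print(column.ndim)   # 2


def looks_2d(obj):
    """Hypothetical duck-typing check: treat anything with a 2-element shape as 2D."""
    return len(getattr(obj, "shape", ())) == 2


print(looks_2d(column))  # True
print(looks_2d(values))  # False
```

Returning a real 2D array is what keeps such duck-typed code working; wrapping the same 2D data in a 1D container is what produces the invalid objects discussed here.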
For Index, however, that no longer happens (although under the hood we still create a 2D array, we then wrap it again in the 1D Index). Now, what should we do about this in the longer term? I think there are two obvious solutions:
+1 for deprecate and raise.
I think there is special handling for this in DatetimelikeArrayMixin.__getitem__.
Yes, this should definitely raise an error in the long term.
OK, my preference is also to raise in the future, so let's go that way.
+1 to deprecate and raise.
In the meantime, the Index behaviour has been deprecated (#30588). I tagged the issue with 1.1 so that we don't forget to deprecate the same behaviour for Series as well.
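A rough sketch of what a "deprecate, then raise" path can look like, assuming a hypothetical, stripped-down ToyIndex class (this is not the actual pandas implementation, and the warning text is illustrative only). The result is computed on the underlying ndarray; if it has more than one dimension, a FutureWarning is emitted and the plain ndarray is returned instead of being wrapped in a 1D index object.

```python
import warnings

import numpy as np


class ToyIndex:
    """Hypothetical, heavily simplified 1D index (not the pandas implementation)."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, key):
        # Let NumPy do the actual indexing on the underlying 1D array
        result = self._values[key]
        if np.ndim(result) == 0:
            # Scalar access such as idx[0]
            return result
        if result.ndim > 1:
            # Deprecation period: warn, and return the plain ndarray instead of
            # wrapping multi-dimensional data in a (supposedly 1D) index object.
            # Once the deprecation is enforced, this becomes a raise.
            warnings.warn(
                "Multi-dimensional indexing (e.g. idx[:, None]) is deprecated; "
                "convert to a numpy array before indexing instead.",
                FutureWarning,
                stacklevel=2,
            )
            return result
        # Regular 1D result: wrap it in a new index
        return ToyIndex(result)


idx = ToyIndex([10, 20, 30])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = idx[:, None]

print(type(out).__name__, out.shape)  # ndarray (3, 1)
print(caught[0].category.__name__)    # FutureWarning
```

Enforcing the deprecation later only means replacing the warnings.warn call with raise ValueError(...); 1D slicing and scalar access are unaffected either way.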
Follow-up on #27775 and #27818.
Short recap of what those issues were about:
Currently, indexing into an Index with a 2D (or higher-dimensional) indexer results in an "invalid" Index whose underlying ndarray is multi-dimensional:
From the repr it looks like a proper Index, but the underlying values of an Index should always be 1D (and such an invalid Index will lead to errors once you do operations on it).
Before pandas 0.25.0, the shape attribute of the index "correctly" returned the shape of the underlying values, (3, 1), but in 0.25.0 this was changed to (3,) (only checking the length). This caused a regression in matplotlib (#27775), which will be "fixed" in 0.25.1 by again returning the 2D shape of the underlying values (#27818). Of course, this only concerns the shape attribute, while the root cause is the invalid Index itself.

I think it is clear that we should not allow such invalid Index objects to exist.
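To make the shape discrepancy concrete, here is a NumPy-only sketch, assuming a plain 2D array stands in for the invalid Index's underlying values. The two ways of computing shape disagree, so code that inspects shape to decide whether its input is 1D (as some plotting code may) gets opposite answers depending on which one it sees.

```python
import numpy as np

# Underlying values of an "invalid" Index, as produced by idx[:, None]
values = np.array([1, 2, 3])[:, None]

# Shape taken from the underlying values (the pre-0.25.0 and 0.25.1 behaviour)
shape_from_values = values.shape
# Shape derived from the length only (the 0.25.0 behaviour)
shape_from_length = (len(values),)

print(shape_from_values)  # (3, 1)
print(shape_from_length)  # (3,)

# A "is this input 1D?" check based on shape gives opposite answers
print(len(shape_from_values) == 1)  # False
print(len(shape_from_length) == 1)  # True
```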
I currently know of two ways to end up in such a situation:
1. Passing a multi-dimensional array to the constructor, e.g. pd.Index(np.random.randn(5, 5, 5)). I think this is something we can deprecate and later raise on, and there is already an issue for it: BUG: Index constructor should not allow an ndarray with ndim > 2 #27125
2. A 2D indexing operation on a 1D Index, e.g. idx[:, None] -> this issue

So let's use this issue to discuss what to do about this second way: a 2D indexing operation on a 1D object.
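For the first way, a constructor-level check is enough to keep the invariant that Index values are 1D. A minimal sketch with a hypothetical StrictIndex class (the class name and error message are illustrative; this is not how pandas implements it):

```python
import numpy as np


class StrictIndex:
    """Hypothetical 1D index that rejects multi-dimensional data at construction."""

    def __init__(self, data):
        values = np.asarray(data)
        if values.ndim != 1:
            raise ValueError(
                f"Index data must be 1-dimensional, got ndim={values.ndim}"
            )
        self._values = values

    @property
    def shape(self):
        # Safe to take from the values: they are guaranteed to be 1D
        return self._values.shape


ok = StrictIndex(np.arange(5))
print(ok.shape)  # (5,)

# The first way from the list: a 3D array passed to the constructor
try:
    StrictIndex(np.random.randn(5, 5, 5))
    error = None
except ValueError as exc:
    error = str(exc)

print(error)  # Index data must be 1-dimensional, got ndim=3
```

The same ndim check, applied to the result of an indexing operation, would cover the second way as well.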
This is relevant for the Index, but we should probably try to have it consistent with Series as well.