API: what should a 2D indexing operation into a 1D Index do? (eg idx[:, None]) #27837

jorisvandenbossche · 2019-08-09T07:53:50Z

Follow-up on #27775 and #27818.

Short recap of what those issues were about:

Currently, indexing into an Index with a 2D (or higher-dimensional) indexer results in an "invalid" Index with an underlying 2D ndarray:

In [1]: idx = pd.Index([1, 2, 3])  

In [2]: idx2 = idx[:, None] 

In [3]: idx2
Out[3]: Int64Index([1, 2, 3], dtype='int64')

In [4]: idx2.values
Out[4]: 
array([[1],
       [2],
       [3]])

So from the repr it looks like a proper index, but the underlying values of an Index should always be 1D (such an invalid index will also lead to errors once you do operations on them).

Before pandas 0.25.0, the shape attribute of the index "correctly" returned the shape of the underlying values, (3, 1), but in 0.25.0 this was changed to (3,) (only checking the length). This caused a regression in matplotlib (#27775), and will be "fixed" in 0.25.1 by again returning the 2D shape of the underlying values (#27818). Of course, this only concerns the shape attribute, while the root cause is the invalid Index itself.

I think it is clear that we should not allow such invalid Index objects to exist.
I currently know of two ways to end up in such a situation:

- Passing a multidimensional array to the Index constructor (e.g. pd.Index(np.random.randn(5, 5, 5))). I think this is something we can deprecate now and turn into an error later, and there is already an issue for this: BUG: Index constructor should not allow an ndarray with ndim > 2 #27125
- Indexing into an Index (e.g. idx[:, None]) -> this issue
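For the constructor route, the remedy is a validation step that rejects non-1D data up front. A minimal sketch of such a check, using a hypothetical helper `ensure_1d` written with plain NumPy (an illustration of the idea, not the code pandas actually uses):

```python
import numpy as np


def ensure_1d(data):
    """Hypothetical helper: coerce `data` to an ndarray and reject ndim != 1.

    Sketches the kind of check an Index constructor could apply; this is
    not pandas' actual implementation.
    """
    arr = np.asarray(data)
    if arr.ndim != 1:
        raise ValueError(f"Index data must be 1-dimensional, got ndim={arr.ndim}")
    return arr


# A 1D input passes through unchanged.
print(ensure_1d([1, 2, 3]))  # [1 2 3]

# A 3D input (as in pd.Index(np.random.randn(5, 5, 5))) is rejected.
try:
    ensure_1d(np.random.randn(5, 5, 5))
except ValueError as exc:
    print(exc)  # Index data must be 1-dimensional, got ndim=3
```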

So let's use this issue to discuss what to do for this second way: a 2D indexing operation on a 1D object.

This is relevant for the Index, but we should probably try to have it consistent with Series as well.


jorisvandenbossche · 2019-08-09T08:15:57Z

Now on the actual issue: I suppose the current behaviour has historical reasons (certainly from when Series/Index were ndarray subclasses): when doing a 2D operation on a 1D object, we returned a 2D array. This provides some duck-array ability (the object behaves as an array-like for code that expects a numpy-like array, as matplotlib did).

For example, Series still does this (for plain numpy types):

In [16]: pd.Series([1, 2, 3])[:, None] 
Out[16]: 
array([[1],
       [2],
       [3]])

and the source of Index.__getitem__ actually mentions that for such a case, a plain ndarray should be returned:

pandas/pandas/core/indexes/base.py, lines 4241 to 4242 in 640d9e1:

    If resulting ndim != 1, plain ndarray is returned instead of
    corresponding `Index` subclass.

but for Index, that no longer happens (under the hood we still create a 2D array, but then wrap it again in a 1D Index).

Now, what should we do about this in the longer term? I think there are two obvious solutions:

1. Consistently return a 2D numpy array (i.e. let Index follow the behaviour of Series, and its own old behaviour).
   - Advantages:
     - This is the least invasive change, and probably does not break anything in downstream libraries (such as the matplotlib case), or even require them to change.
     - The duck-array ability can be useful for writing code that is agnostic to the array implementation?
   - Disadvantages:
     - This is a very implicit way to convert to a numpy array. It is probably better to do this explicitly.
     - What about non-numpy dtypes? (e.g. this currently fails for Series[category])
2. Raise an error (after a deprecation period):
   - This will eventually break code and require changes in e.g. matplotlib (but we can do this with a proper deprecation period). In return, we don't get an implicit type change when indexing an object (a Series turning into an array because of a certain indexing operation is rather surprising).
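With the second option, code that currently relies on the implicit conversion (like the matplotlib case) would need to convert explicitly before adding an axis. A minimal sketch of that explicit spelling, assuming a pandas version that provides `Index.to_numpy()` (0.24 or later). It never applies multi-dimensional indexing to the pandas object itself, so it should behave the same before and after any deprecation:

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3])

# Convert explicitly to a NumPy array first, then add the new axis.
col = idx.to_numpy()[:, None]
print(type(col).__name__, col.shape)  # ndarray (3, 1)

# np.asarray works the same way for a Series.
s = pd.Series([1, 2, 3])
col_s = np.asarray(s)[:, None]
print(col_s.shape)  # (3, 1)
```

The conversion is visible at the call site, which addresses the "very implicit" disadvantage listed for the first option.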

jorisvandenbossche · 2019-08-09T08:17:50Z

cc @tacaswell @shoyer

TomAugspurger · 2019-08-09T13:38:19Z

+1 for deprecate and raise.

jbrockmendel · 2019-08-09T19:44:12Z

I think there is special handling for this in DatetimelikeArrayMixin getitem.

shoyer · 2019-08-09T20:49:59Z

Yes, this should definitely raise an error in the long term.

jorisvandenbossche · 2019-08-12T12:07:20Z

OK, my preference is also to raise in the future, so let's go that way.

jreback · 2019-08-12T12:15:35Z

+1 to deprecate and raise

jorisvandenbossche · 2020-01-28T15:41:24Z

So in the meantime, the Index behaviour has been deprecated (#30588). I added the 1.1 milestone to this issue so that we don't forget to also deprecate the same behaviour for Series.
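A deprecation step typically keeps the old return value for now while emitting a warning, so that the later switch to an error is announced in advance. A rough sketch of that pattern around a plain NumPy array; `getitem_1d` is a hypothetical name, and the warning class (`FutureWarning`) and message text are assumptions rather than what pandas itself emits:

```python
import warnings

import numpy as np


def getitem_1d(values, key):
    """Hypothetical __getitem__ for a 1D container during a deprecation period.

    Indexes the underlying ndarray. If the result has more than one
    dimension, it warns but still returns the plain ndarray (the old
    behaviour). Once the deprecation period ends, the warning would be
    replaced by an error.
    """
    result = values[key]
    if np.ndim(result) > 1:
        warnings.warn(
            "Multi-dimensional indexing (e.g. obj[:, None]) is deprecated; "
            "convert to a numpy array before indexing instead.",
            FutureWarning,  # assumed warning class for this sketch
            stacklevel=2,
        )
    return result


values = np.array([1, 2, 3])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = getitem_1d(values, (slice(None), None))  # same as values[:, None]
print(out.shape, caught[0].category.__name__)  # (3, 1) FutureWarning
```

Ordinary 1D keys such as `slice(0, 2)` or a scalar position pass through without a warning, so only the ambiguous case is affected.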


jorisvandenbossche added the Indexing, API Design, Compat and Index labels Aug 9, 2019

jorisvandenbossche mentioned this issue Aug 9, 2019

COMPAT: restore shape for 'invalid' Index with nd array #27818

Merged

jbrockmendel mentioned this issue Jan 2, 2020

BUG: validate Index data is 1D + deprecate multi-dim indexing #30588

Merged


TomAugspurger mentioned this issue Jan 28, 2020

Avoid Index DeprecationWarning in Series getitem #31361

Merged

jorisvandenbossche added this to the 1.1 milestone Jan 28, 2020

h-vetinari mentioned this issue Feb 11, 2020

REGR: changed return type for multi-dimensional indexing #31870

Closed

jorisvandenbossche mentioned this issue Mar 14, 2020

PandasArray does not support slicing consistent with other array types #32692

Closed

mroeschke removed the Compat label Apr 10, 2020

mroeschke added the Deprecate label May 5, 2020

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jul 6, 2020

DEPR: Deprecate n-dim indexing for Series

57dfb14

Closes pandas-dev#27837

TomAugspurger mentioned this issue Jul 6, 2020

DEPR: Deprecate n-dim indexing for Series #35141

Merged

jreback closed this as completed in #35141 Jul 6, 2020

cgdeboer mentioned this issue Oct 7, 2020

multi-dimensional indexing is deprecated in pandas. quantopian/empyrical#130

Merged
