Proposal: Deprecating support of incomplete indexing on MultiIndexes #10574

tgarc · 2015-07-15T01:33:44Z

First, here's the example DataFrame

                    0  1  2  3  4  5  6  7
first second third                        
bar   one    three  4  9  7  8  5  0  7  8
             four   6  8  1  5  9  9  1  7
      two    three  7  5  2  6  7  8  5  9
             four   0  8  8  5  3  5  8  3
baz   one    three  6  0  8  0  0  9  8  8
             four   9  3  2  0  2  7  4  9
      two    three  0  3  7  7  4  3  7  0
             four   6  3  2  8  3  9  7  8
foo   one    three  6  7  3  7  3  0  3  6
             four   5  8  0  8  1  5  1  5
      two    three  2  0  8  2  8  1  8  3
             four   9  0  2  7  0  6  8  3
qux   one    three  1  2  5  5  0  7  0  1
             four   6  6  7  0  0  4  5  3
      two    three  1  8  2  8  7  5  7  5
             four   1  1  3  8  8  6  0  3
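
For anyone who wants to follow along, here is a minimal sketch of how a frame with this shape could be built. The seed and the use of `numpy.random.default_rng` are assumptions made for illustration, so the values will not match the table above; only the index layout does.

```python
import numpy as np
import pandas as pd

# Three-level row index with the same labels and level names as the table above.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"], ["three", "four"]],
    names=["first", "second", "third"],
)

# Assumption: random single-digit integers with an arbitrary seed.
# The values are placeholders and will differ from the table above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(16, 8)), index=index)
```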

For DataFrames with MultiIndexed rows, pandas allows this type of indexing

df.loc[('foo','bar'), ('one','two'), ('three','four')]

To be taken to mean

df.loc[(('foo','bar'), ('one','two'), ('three','four')), :]

But this type of indexing is ambiguous when there are exactly two indexing tuples, since

df.loc[('foo','bar'), ('one','two')]

could mean incomplete indexing as in

df.loc[(('foo','bar'), ('one','two')),:]

or row,column indexing as in

df.loc[(('foo','bar'),), (('one','two'),)]
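
The root of the ambiguity is visible without pandas at all: Python hands `__getitem__` the same tuple whether or not the subscript is wrapped in an extra pair of parentheses. The sketch below uses a hypothetical `KeyProbe` class (not part of pandas) that simply returns the key it receives.

```python
class KeyProbe:
    """Hypothetical helper: returns whatever key __getitem__ receives."""

    def __getitem__(self, key):
        return key


probe = KeyProbe()

# Two tuples separated by a comma, and the same two tuples wrapped in
# one outer tuple, arrive as the identical key.
bare = probe[("foo", "bar"), ("one", "two")]
wrapped = probe[(("foo", "bar"), ("one", "two"))]

# Adding an explicit column slice changes the key: the row selection is
# now the first element and slice(None) is the second.
explicit = probe[(("foo", "bar"), ("one", "two")), :]

print(bare == wrapped)
print(explicit[0] == bare, explicit[1] == slice(None))
```

Because the two-tuple key looks the same either way, an indexer can only guess whether it means "two row levels" or "rows, then columns" unless the caller spells out both axes.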

I appreciate that there is already a warning for this in the documentation, but I wonder if the functionality is worth the complications it adds to the code/docs.

Personally, I would suggest offloading the responsibility of complete indexing on a MultiIndex DataFrame to the user (obviously this doesn't apply to Series as they are 1d so to speak). This would take away the minor syntactical convenience of not specifying the column index, but it simplifies the code and gives the user only one way to index on a MultiIndex DataFrame (which makes usage less confusing).

The consequence to the user in the specific case of selecting multiple levels of a row-MultiIndex on a DataFrame is that instead of writing

df.loc['foo','one']

they would have to write

df.loc[('foo','one'), :]

And, in the syntactically worst case, instead of writing

df.loc[('foo','bar'), ('one','two'), ('three','four')]

they would have to write

df.loc[(('foo','bar'), ('one','two'), ('three','four')), :]
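
A minimal sketch of the explicit style, under a few assumptions: the values are placeholders, the index is sorted with `sort_index()` to keep lookups straightforward, and the multi-label selection uses lists inside the row tuple (the pandas documentation describes lists as the way to give several labels for one level) rather than the nested tuples written above.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"], ["three", "four"]],
    names=["first", "second", "third"],
)
# Assumption: placeholder values; sort_index() keeps the index lexsorted.
df = pd.DataFrame(np.arange(16 * 8).reshape(16, 8), index=index).sort_index()

# One label per level for the first two levels, with an explicit ':' for
# the columns. The matched levels are dropped from the result.
sub = df.loc[("foo", "one"), :]

# Several labels per level, written as lists inside a single row tuple,
# again with an explicit ':' for the columns.
multi = df.loc[(["bar", "foo"], ["one", "two"], ["three", "four"]), :]

print(sub.shape)
print(multi.shape)
```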

I'm fairly new to pandas (don't think I started using it until v0.16), so I realize I may be missing the bigger picture. If so, enlighten me!

jreback · 2015-07-15T01:35:51Z

It's not a question of whether we should do this.

It's just ambiguous, and I don't think it's possible in the general case. However, if you would like to try to fix it, be my guest.

tgarc · 2015-07-15T01:46:59Z

@jreback I'm not sure I understand your comment. What I was saying is that I don't think it is possible to do incomplete indexing in the general case in any reasonable way, which is why I was advocating deprecating incomplete indexing for MultiIndexed DataFrames.

jreback · 2015-07-15T13:24:46Z

@tgarc my point is that the following are both completely legitimate, but mean different things. There is no way to disambiguate what is meant except for the user providing context (e.g. both axes). So how do you propose to deprecate this then?

In [35]: df = DataFrame(np.arange(12).reshape(4,3),columns=[0,2,1],index=MultiIndex.from_product([range(2),range(2)],names=['first','second'])).sortlevel()

In [36]: df
Out[36]: 
              0   2   1
first second           
0     0       0   1   2
      1       3   4   5
1     0       6   7   8
      1       9  10  11

In [37]: df.loc[(0,[1]),:]
Out[37]: 
              0  2  1
first second         
0     1       3  4  5

In [38]: df.loc[(0,[1])]
Out[38]: 
        1
second   
0       2
1       5
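
For readers on a recent pandas, here is a hedged sketch of the same frame with each reading spelled out so that neither depends on how a bare two-element key is interpreted. As far as I know `sortlevel` has since been removed, so `sort_index` is used instead; using `xs` for the row/column reading is my own choice for illustration, not something proposed in this thread.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [range(2), range(2)], names=["first", "second"]
)
df = pd.DataFrame(
    np.arange(12).reshape(4, 3), columns=[0, 2, 1], index=index
).sort_index()

# Row reading: first == 0 and second in [1], all columns kept.
rows = df.loc[(0, [1]), :]

# Row/column reading: take the rows where first == 0, then the column
# labelled 1 (the third column by position, since columns are [0, 2, 1]).
cols = df.xs(0, level="first")[[1]]

print(rows.values.tolist())
print(cols[1].tolist())
```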

toobaz · 2018-05-18T07:49:34Z

Closing since there is no obvious recommendation, other issues (#19110 for instance) face the same problem, and there doesn't seem to be interest in just disabling incomplete indexing. @tgarc feel free to argue your case if you still think this is valid.

jreback added API Design MultiIndex labels Jul 15, 2015

jreback added this to the Someday milestone Jul 15, 2015

tgarc mentioned this issue Jul 15, 2015

Towards "pandas 1.0" #10000

Closed

toobaz closed this as completed May 18, 2018
