Proposal: Deprecating support of incomplete indexing on MultiIndexes #10574
Comments
It's not a question of whether we should do this; it's just ambiguous, and I don't think it's possible.
@jreback I'm not sure I understand your comment. What I was saying is that I don't think it is possible to do incomplete indexing in the general case in any reasonable way, which is why I was advocating deprecating incomplete indexing for MultiIndexed DataFrames.
@tgarc my point is that the following are both completely legitimate, but mean different things. There is no way to disambiguate what is meant except for the user providing context (e.g. both axes). So how do you propose to deprecate this then?
First, here's the example DataFrame
For DataFrames with MultiIndexed rows, pandas allows this type of indexing
df.loc[('foo','bar'), ('one','two'), ('three','four')]
To be taken to mean
df.loc[(('foo','bar'), ('one','two'), ('three','four')), :]
But this type of indexing is ambiguous in the case when the number of indexing tuples is 2 since
df.loc[('foo','bar'), ('one','two')]
could mean incomplete indexing as in
df.loc[(('foo','bar'), ('one','two')),:]
or row,column indexing as in
df.loc[(('foo','bar'),), (('one','two'),)]
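To make the two readings concrete, here is a minimal sketch on a hypothetical frame. The labels, level names, and values are assumptions chosen for illustration, not the example frame from this issue. Each reading is written out explicitly with per-level lists and slice(None), rather than bare tuples, so that pandas has nothing to guess:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a MultiIndex on both axes (labels are illustrative).
# Both indexes are built in sorted order so that slicing is well defined.
rows = pd.MultiIndex.from_product(
    [["bar", "baz", "foo"], ["one", "three", "two"]], names=["r0", "r1"]
)
cols = pd.MultiIndex.from_product(
    [["one", "three", "two"], ["x", "y"]], names=["c0", "c1"]
)
df = pd.DataFrame(np.arange(54).reshape(9, 6), index=rows, columns=cols)

# Reading 1: both groups of labels address row levels; all columns are kept.
rows_only = df.loc[(["bar", "foo"], ["one", "two"]), :]

# Reading 2: the first group addresses rows, the second addresses columns.
rows_and_cols = df.loc[
    (["bar", "foo"], slice(None)), (["one", "two"], slice(None))
]

print(rows_only.shape)      # (4, 6)
print(rows_and_cols.shape)  # (6, 4)
```

The same two groups of labels select a 4x6 block under one reading and a 6x4 block under the other, which is why the bare two-tuple form cannot be resolved without more context.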
I appreciate that there is already a warning for this in the documentation, but I wonder if the functionality is worth the complications it adds to the code/docs.
Personally, I would suggest offloading the responsibility of complete indexing on a MultiIndex DataFrame to the user (obviously this doesn't apply to Series, as they are 1-D, so to speak). This would take away the minor syntactical convenience of not specifying the column index, but it simplifies the code and gives the user only one way to index on a MultiIndex DataFrame (which makes usage less confusing).
The consequence to the user in the specific case of selecting multiple levels of a row-MultiIndex on a DataFrame is that instead of writing
df.loc['foo','one']
they would have to write
df.loc[('foo','one'), :]
And, in the syntactically worst case, instead of writing
df.loc[('foo','bar'), ('one','two'), ('three','four')]
they would have to write
df.loc[(('foo','bar'), ('one','two'), ('three','four')), :]
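As a rough sketch of what the explicit style looks like in practice, the snippet below uses a hypothetical three-level frame (labels, level names, and values are assumptions for illustration). The multi-label case is written with per-level lists inside the row tuple, which is a choice made for this sketch rather than the exact nested-tuple spelling above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: three row levels, flat columns (labels are illustrative).
rows = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two"], ["four", "three"]],
    names=["l0", "l1", "l2"],
)
df = pd.DataFrame(np.arange(16).reshape(8, 2), index=rows, columns=["a", "b"])

# Explicit partial key: the row tuple comes first, then ":" for all columns.
sub = df.loc[("foo", "one"), :]

# Explicit multi-label selection: one list of labels per row level.
multi = df.loc[(["bar", "foo"], ["one"], ["three"]), :]

print(sub)
print(multi)
```

Because the column indexer is always present, the first element can only ever be a row key, so the question of which axis a tuple belongs to never arises.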
I'm fairly new to pandas (don't think I started using it until v0.16), so I realize I may be missing the bigger picture. If so, enlighten me!