Partial Selection on MultiIndex: The need for empty slice support & dict indexing #4036
Comments
This is somewhat related to Issue #3057. Being able to select on arbitrary combinations of index levels and return arbitrary index levels. |
I did look through the new code and the docs, so apologies if I missed this, but it would still be nice to be able to select by index names, as in @dragoljub's suggestion. Obviously this is a minor convenience compared to what you just merged, so thank you for that. But maybe this could be left open? Or should a separate issue be opened? |
@hsharrison ahh... didn't even see that, so you basically want to pass a dict (rather than a nested tuple), with keys of the level numbers or names? Hmm... I don't think that would be that difficult. Can you post an example of using the partial slicing syntax (what I just merged) and your proposed |
And reference by name using a dict:
Another possibility is to use keyword arguments in a function instead of a dict in a getitem. Not sure if (well, maybe not. The conciseness is nice but it can't refer to non-named levels) |
+1 on allowing empty slicing. |
Hey, cool to see this being revived! 👍 I also like how the curly brace dict notation looks. 😄 Although the dict(A=1, C=2) seems more natural. It's interesting how much parsing the dict constructor does to infer the string column name. However you will not be able to specify the index level with
In [15]: df.loc[{'A':1, 'C':2}, :]
Out[15]:
n_foos n_bars
A B C
1 0 2 14 21
1 2 63 2
2 2 99 22
[3 rows x 2 columns] |
In xray (pandas for N-dim data) we have somewhat similar support for doing indexing with named axes (rather than levels) via the keyword arguments in a function pattern, e.g., |
So that we can revisit this, does someone want to post a short (but complete) use case / example and proposed syntax (in a new issue)? |
👍 on this feature, would find it very useful |
So is it somehow possible to slice a Series on part of a MultiIndex without losing levels that I need for further computation? I have tried slice(None) and IndexSlice... The only thing that really helped was to box the Series into a DataFrame; somehow the same slicing in a DataFrame doesn't drop levels... Why do we have such inconsistency between Series and DataFrames? |
@aurelije can you provide a simple reproducible example of such inconsistency? Are you sure you did pass indexers for both axes when indexing the DataFrame? |
@toobaz famous Operation Research example:
Now dropping one of the levels:
The same thing in a DataFrame doesn't drop the level:
|
I think this is related to #12827 (comment) more than to this issue. Notice that the behavior you are looking for (not dropping the level) is what we (pandas) would like to get rid of, but the same result can be easily obtained by wrapping the label in a list:
In [13]: unit_cost_from_plant_to_market.loc[['seattle'], slice(None)]
Out[13]:
plant market
seattle new-york 2.5
chicago 1.7
topeka 1.8
Name: unit_cost, dtype: float64 |
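The difference between a scalar label and a list-wrapped label can be sketched as follows. This is only an illustration: the series name `unit_cost` and all non-seattle values are made up here, and the index is sorted first to stay on the safe side with MultiIndex slicing.

```python
import pandas as pd

# Hypothetical data; only the seattle rows echo the output shown above.
index = pd.MultiIndex.from_tuples(
    [
        ("seattle", "new-york"), ("seattle", "chicago"), ("seattle", "topeka"),
        ("san-diego", "new-york"), ("san-diego", "chicago"), ("san-diego", "topeka"),
    ],
    names=["plant", "market"],
)
unit_cost = pd.Series(
    [2.5, 1.7, 1.8, 2.5, 1.8, 1.4], index=index, name="unit_cost"
).sort_index()

# Scalar label: the "plant" level is dropped from the result.
dropped = unit_cost.loc["seattle"]

# Label wrapped in a list: both levels are kept.
kept = unit_cost.loc[["seattle"], slice(None)]
```

Here `dropped` is indexed by `market` only, while `kept` still carries `plant` and `market`, so it can feed a later groupby on either level.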
Thanks a lot @toobaz, I was not aware of that functionality |
Looks like this feature request hasn't had much engagement over the years, so closing. If there's renewed interest it would be best to have it in a new issue |
related #4036, #4116
from SO with 0.14.0 multi-index slicers: http://stackoverflow.com/questions/24126542/pandas-multi-index-slices-for-level-names/24126676#24126676
Here's the example from there:
Here's a complete example
This is your dict of level names -> slices
This creates an indexer that is empty (selects everything)
Add in your slicers
And select (this has to be a tuple, but was a list to start as we had to modify it)
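The four steps above could look roughly like this. The frame, the level names `A`, `B`, `C`, and the `selectors` dict are assumptions made for illustration, not the data from the Stack Overflow answer.

```python
import pandas as pd

# Hypothetical frame with three named index levels (sorted by construction).
index = pd.MultiIndex.from_product(
    [[0, 1], [0, 1, 2], [1, 2]], names=["A", "B", "C"]
)
df = pd.DataFrame({"value": range(len(index))}, index=index)

# Dict of level names -> labels or slices.
selectors = {"A": 1, "C": 2}

# An indexer that is "empty", i.e. selects everything on every level.
indexer = [slice(None)] * df.index.nlevels

# Add in the slicers at the position of each named level.
names = list(df.index.names)
for name, selector in selectors.items():
    indexer[names.index(name)] = selector

# .loc needs a tuple for the row indexer; the list was only there so it could be modified.
result = df.loc[tuple(indexer), :]
```

Levels that are not mentioned in `selectors` keep their `slice(None)` placeholder, so they are not filtered.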
I use hierarchical indices regularly with pandas DataFrames and Series objects. It is invaluable to be able to partially select subsets of rows based on a set of arbitrary index values, and retain the index information for subsequent groupby operations etc.
I am looking for an elegant way to pass an ordered tuple (with possibly empty slices) or an arbitrary dict of {index_level_name:value,...} pairs to select rows matching the passed index:value pairs. Note: I am aware that with Boolean indexing on data columns and nested np.logical_and() statements you can construct such a Boolean select index. I'm looking for an elegant solution using indexes & levels to avoid repeatedly using df.reset_index and building Boolean arrays. Also, df.xs() does not work in every situation (see below) and does not exist for Series with MultiIndex.
To explain this, let's create a DataFrame with 5 index levels:
Now index on every level and we get back the rows we want :) I love that I get back the complete index too because it may be useful later.
Now if we index on the first 4 levels we get back something different, a data frame with the first 4 index levels dropped. It would be nice to have the option to keep all index levels even though they are repetitive (like above).
Now comes the tricky part. What if I only want to index on the first and last 2 index levels, and want everything from the 3rd level? Empty slicing is not supported.
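For comparison, with the multi-index slicers mentioned elsewhere in this thread (pandas 0.14.0 and later), an "empty" slice for one level can be written with `slice(None)` or `pd.IndexSlice`. A minimal sketch, assuming a made-up 5-level frame with level names `A` to `E`:

```python
import pandas as pd

# Hypothetical 5-level frame; names and values are assumptions for illustration.
index = pd.MultiIndex.from_product([[0, 1]] * 5, names=list("ABCDE"))
df = pd.DataFrame({"value": range(len(index))}, index=index)

idx = pd.IndexSlice

# Fix levels A, B, D and E; take everything from the 3rd level (C).
subset = df.loc[idx[0, 1, :, 1, 0], :]

# The same selection spelled with an explicit slice(None).
same = df.loc[(0, 1, slice(None), 1, 0), :]
```

Both spellings select the same rows, and the result keeps all five index levels because the row indexer contains a slice.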
df.xs can somewhat help here, but it's useless for a MultiIndex on a Series. And it drops the indexed levels, leaving you unsure which fixed index levels you have drilled down to. :(
Interestingly, df.xs() is not consistent, because you cannot explicitly index on every level by giving it the list of all level names:
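In pandas versions where `xs` accepts a `drop_level` argument, the dropped levels can be kept. A small sketch, again on a hypothetical 5-level frame with assumed level names `A` to `E`:

```python
import pandas as pd

# Hypothetical 5-level frame; names and values are assumptions for illustration.
index = pd.MultiIndex.from_product([[0, 1]] * 5, names=list("ABCDE"))
df = pd.DataFrame({"value": range(len(index))}, index=index)

# Default behavior: the levels used for selection (A and E) are dropped.
dropped = df.xs((0, 1), level=["A", "E"])

# drop_level=False keeps every index level in the result.
kept = df.xs((0, 1), level=["A", "E"], drop_level=False)
```

`kept` has the same rows as `dropped` but still shows which values of `A` and `E` were fixed.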
However df.xs without the level attribute on all index levels works as expected...
Thoughts:
One (somewhat limiting) solution could be allowing df.ix[(0,1,3,:,4)] to take an empty slice for an index level and return the data frame indexed on only the passed index levels that are known. Today this capability does not exist, although an ordered partial list of index levels works.
The next and more general approach could be to pass a dict of df.ix[{index_level:value}] pairs and return the rows where the specified index levels equal the passed values. Unspecified levels are not filtered down and we have the option to return all index levels.