Request for some kind of named arguments loc #11373

Closed
szaiser opened this issue Oct 19, 2015 · 13 comments
Labels
Enhancement; Indexing (Related to indexing on series/frames, not to indexes themselves); MultiIndex; Needs Discussion (Requires discussion from core team before further action)

Comments

@szaiser

szaiser commented Oct 19, 2015

Please quickly recall how awesome named arguments for Python are and then read through the following idea:

When the index and the columns hold identical or almost identical values, it can be annoying to remember which of the two data axes was assigned to the index and which to the columns. This could be solved by assigning names to index and columns, e.g. index_name=y and columns_name=x, and then retrieving values via pd.DataFrame.name_loc(x=10, y=20) (syntax not final).

I have no idea whether such a feature can be implemented with reasonable effort. I do, however, believe that it would greatly increase the usability of DataFrames for certain use cases.

@jreback
Contributor

jreback commented Oct 19, 2015

We've had related discussions in #4036 and #11267.

Here is an example which is pretty straightforward to do:

mi = pd.MultiIndex.from_product([list('ab'), list('AB')], names=['foo', 'bar'])

In [5]: s = pd.Series(range(4), index=mi)

In [6]: s
Out[6]: 
foo  bar
a    A      0
     B      1
b    A      2
     B      3
dtype: int64

xray calls this .sel; see its documentation.

Here's a quick and dirty implementation:

In [8]: pd.MultiIndex.__call__ = lambda self, **kwargs: tuple(kwargs.get(n, slice(None)) for n in self.names)

In [9]: s.loc[s.index(foo='b',bar='B')]
Out[9]: 3

In [10]: s.loc[s.index(foo='b')]
Out[10]: 
foo  bar
b    A      2
     B      3
dtype: int64

In [11]: s.loc[s.index(bar='B')]
Out[11]: 
foo
a    1
b    3
dtype: int64

In [12]: s.loc[s.index(bar=['B'])]
Out[12]: 
foo  bar
a    B      1
b    B      3
dtype: int64
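The same idea can be sketched without monkeypatching MultiIndex. This is only an illustration: level_key is a hypothetical helper name, not part of pandas, and it assumes every keyword is the name of an index level.

```python
import pandas as pd


def level_key(index, **labels):
    """Build a .loc key for a MultiIndex from level-name keywords.

    Hypothetical helper, not a pandas function. Levels that are not
    mentioned are selected in full via slice(None).
    """
    unknown = set(labels) - set(index.names)
    if unknown:
        raise KeyError(f"unknown level name(s): {sorted(unknown)}")
    return tuple(labels.get(name, slice(None)) for name in index.names)


mi = pd.MultiIndex.from_product([list("ab"), list("AB")], names=["foo", "bar"])
s = pd.Series(range(4), index=mi)

key_full = level_key(s.index, foo="b", bar="B")  # every level given
key_foo = level_key(s.index, foo="b")            # 'bar' left unrestricted

scalar = s.loc[key_full]
subset = s.loc[key_foo]
```

Rejecting unknown names is a design choice of this sketch; the one-line lambda above silently ignores them.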

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves API Design MultiIndex Needs Discussion Requires discussion from core team before further action labels Oct 19, 2015
@szaiser
Author

szaiser commented Oct 19, 2015

Thank you for the quick reply. The MultiIndex solution you present is a nice and quick workaround. However, it has its flaws, which is why I want to thank you even more for pointing me towards xray. I might use that, but .sel-like functionality would be really great in Pandas, which is much more mature than xray.

@jreback
Contributor

jreback commented Oct 20, 2015

@BigSkylie the issue with actually using a function, and the reason I am not a big fan of .sel (though I am of @shoyer!), is that this is a function call and therefore cannot be used as an lvalue. So the second we add:

value = df.sel(....)

people will want

df.sel() = value

which is not allowed in Python.
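A small sketch of the difference, assuming nothing beyond plain Python and an existing .loc: a subscription is a valid assignment target, while a call is rejected at compile time. The df.sel name below is hypothetical and is never actually called.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])

# A subscription is a valid assignment target: Python turns this
# statement into df.loc.__setitem__(("a", "x"), 10).
df.loc["a", "x"] = 10

# A call expression is not a valid target, so the statement is rejected
# when it is compiled, before any pandas code runs.
try:
    compile("df.sel() = 5", "<example>", "exec")
    call_is_assignable = True
except SyntaxError:
    call_is_assignable = False
```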

The alternatives are to allow dictionaries in df.loc

e.g.

df.loc[{'x': 10, 'y': 5}]

and/or the syntax I described above

df.loc[df.index(x=....), df.columns(y=...)]

Having pandas 'figure out' which axis you mean via the index name is laudable and would be solved by dict access.

@szaiser
Author

szaiser commented Oct 20, 2015

@jreback Only a few minutes after my previous post, I joined the group of disappointed people who want

df.sel() = value

I now must completely agree with you: .sel() is not the holy grail I first thought it to be.

The syntax df.loc[dict(x = 10, y = 5)] (I assume that's what you meant) is as close to my request as possible. At least in Python. Is this planned for the future?

@jreback
Contributor

jreback commented Oct 20, 2015

@BigSkylie yes, I think accepting a dict is on the roadmap. Interested in implementing it? It's not too hard actually, as it is just a pre-processing transformation step.

@shoyer
Member

shoyer commented Oct 20, 2015

Well, one thing we'll need to figure out here is whether dictionary indexing refers to levels of the multi-index or column/index names. Does df.loc[dict(x=10, y=5)] refer to levels x and y of the multi-index along the rows, or column x and row y? Handling both cases could get messy.
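One possible way to frame the ambiguity, purely as a sketch: look each dict key up among the level names of both axes and refuse to guess when a name occurs on both. split_by_axis is a hypothetical helper and this resolution rule is an assumption, not something pandas implements.

```python
import pandas as pd


def split_by_axis(df, selection):
    """Split {name: label} into (row_key, column_key) for df.loc.

    Hypothetical helper: a name must match a level name on exactly one axis.
    """
    row_names = list(df.index.names)
    col_names = list(df.columns.names)
    rows, cols = {}, {}
    for name, label in selection.items():
        in_rows, in_cols = name in row_names, name in col_names
        if in_rows and in_cols:
            raise ValueError(f"{name!r} names a level on both axes")
        if not (in_rows or in_cols):
            raise KeyError(name)
        (rows if in_rows else cols)[name] = label

    def build(names, chosen):
        key = tuple(chosen.get(n, slice(None)) for n in names)
        # A flat (single-level) axis takes a bare label, not a 1-tuple.
        return key[0] if len(key) == 1 else key

    return build(row_names, rows), build(col_names, cols)


# Identical labels on both axes, distinguished only by the axis names.
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.Index([10, 20], name="y"),
    columns=pd.Index([10, 20], name="x"),
)
row_key, col_key = split_by_axis(df, {"x": 10, "y": 20})
value = df.loc[row_key, col_key]
```

Under this rule, df.loc[dict(x=10, y=5)] would be resolved by where the names x and y live, and a frame that uses the same name on both axes would raise instead of silently picking one.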

@szaiser
Author

szaiser commented Oct 20, 2015

@jreback As soon as I see a Pandas conform solution, I will at least give it a try. Is making changes to _LocationIndexer and _LocIndexer in pandas.core.indexing the way to go?
@shoyer The issue you mention seems very serious to me. The only solution I see right now is handing over a 2-tuple containing dictionaries in case of a DataFrame with a MultiIndex. Not a very clean solution, one might say.

@jreback
Contributor

jreback commented Oct 20, 2015

what is a 'Pandas conform solution'?

Follow how .loc works and you should be able to intercept it near the top level (yes, in _LocIndexer or its superclass).

Part of the challenge is ironing out the API to make it semantically clean, so we need test cases covering as much as possible.
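As a rough illustration of such a pre-processing step, here is a wrapper that translates a dict into a tuple key and then delegates to the existing .loc for both getting and setting. NamedLoc is a hypothetical class, it does not touch _LocIndexer, and it assumes the dict only names row levels of a DataFrame.

```python
import pandas as pd


class NamedLoc:
    """Hypothetical wrapper: translate dict keys, then delegate to .loc."""

    def __init__(self, frame):
        self._frame = frame

    def _translate(self, key):
        # Non-dict keys pass through untouched, so plain .loc usage still works.
        if not isinstance(key, dict):
            return key
        names = self._frame.index.names
        row_key = tuple(key.get(n, slice(None)) for n in names)
        return row_key, slice(None)  # all columns

    def __getitem__(self, key):
        return self._frame.loc[self._translate(key)]

    def __setitem__(self, key, value):
        self._frame.loc[self._translate(key)] = value


mi = pd.MultiIndex.from_product([list("ab"), list("AB")], names=["foo", "bar"])
df = pd.DataFrame({"v": range(4)}, index=mi)
named = NamedLoc(df)

picked = named[{"bar": "B"}]  # rows where level 'bar' is 'B'
named[{"foo": "a"}] = 0       # assignment works because [] is an lvalue
```

Because the translation happens before delegation, getting and setting share one code path, which is the property a function such as .sel cannot offer.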

@szaiser
Author

szaiser commented Oct 21, 2015

@jreback By 'Pandas conform' I meant something that relates to Pandas the way 'Pythonic' relates to Python: a clean solution that keeps the user experience coherent across Pandas.

@max-sixty
Contributor

+1, FWIW

@LindyBalboa

LindyBalboa commented Aug 12, 2017

+1 on interest for such a feature
I believe issue #4036 is also related to this.

The index/column question came up. Why not just handle it in the standard .loc way?

df.loc[{index1:'A', index2: slice(None), index3: [1,3,5], index4: range(1,10,2)}, :]

I tried taking a look at the code to see how .loc[] actually works, but it is above my head. If anyone would care to give me a roadmap about the inner workings, I would be more than willing to take another look.


Here is my current solution, by way of chance. It is a bit hacky, but it works perfectly for my needs. My experimental data has files with titles of the form "Material_seriesX_measurementY_parmZ". I import a single column from each file. Basically all of the index information is stored in the one column name. I then use an expression like

df.filter(regex="(Matx|Maty)_series{2,5}_measurement.*_ParmZ", axis=1)
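A runnable sketch of this approach, with made-up column names. It assumes the {2,5} in the pattern was meant to pick series 2 and series 5; in regular-expression syntax, s{2,5} instead means two to five repetitions of the letter s, so the sketch uses a character class.

```python
import pandas as pd

# Hypothetical column names following the "Material_seriesX_measurementY_ParmZ" scheme.
cols = [
    "Matx_series2_measurement1_ParmZ",
    "Matx_series7_measurement1_ParmZ",
    "Maty_series5_measurement2_ParmZ",
    "Matq_series2_measurement1_ParmZ",
]
df = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

# [25] matches the single digit 2 or 5; [2-5] would match the range 2 to 5.
pattern = r"(Matx|Maty)_series[25]_measurement.*_ParmZ"
picked = df.filter(regex=pattern, axis=1)
```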

@MPvHarmelen

MPvHarmelen commented Aug 20, 2020

I wrote a wrapper around loc that implements this and exports it as loc_by_level_name, but I guess it would be best to just add it to the functionality of loc.

https://gist.github.com/MPvHarmelen/3a3db1b83c0ac82eb41eacbe0cee2d0c

As of now it does getting and setting on any dimension, but not deleting (as it didn't seem like loc supported complex selection for deletion either):

>>> df = pd.DataFrame(
...     [(x, -x) for x in range(1, 5)],
...     index=pd.MultiIndex.from_product(
...            [['a', 'b'], [1, 2]],
...            names=['letter', 'number'],
...     )
... )
>>> df
               0  1
letter number
a      1       1 -1
       2       2 -2
b      1       3 -3
       2       4 -4
>>> df.loc_by_level_name[{'number': 2}]
               0  1
letter number
a      2       2 -2
b      2       4 -4
>>> df.loc_by_level_name(axis=0)[{'number': 1}]
               0  1
letter number      
a      1       1 -1
b      1       3 -3
>>> df.loc_by_level_name[{'number': 1}, 1] = 12
>>> df
               0   1
letter number       
a      1       1  12
       2       2  -2
b      1       3  12
       2       4  -4
>>> fd = df.transpose()
>>> fd
letter   a      b   
number   1  2   1  2
0        1  2   3  4
1       12 -2  12 -4
>>> fd.loc_by_level_name[:, {'letter': 'a'}]
letter   a   
number   1  2
0        1  2
1       12 -2
>>> fd.loc_by_level_name(axis=1)[{'letter': 'b'}]
letter   b   
number   1  2
0        3  4
1       12 -4

@mroeschke
Member

Thanks for the request, but it appears this feature request hasn't been picked up by the community or core team in years, so closing for now.
