Request for some kind of named arguments loc #11373

szaiser · 2015-10-19T17:59:46Z

Please quickly recall how awesome named arguments for Python are and then read through the following idea:

When the index and the columns hold identical or almost identical values, it can be annoying to remember which of the two data axes is the index and which is the columns. This could be solved by assigning names to index and columns, e.g. index_name=y and columns_name=x, and then retrieving values via pd.DataFrame.name_loc(x=10, y=20) (syntax not final).

I have no idea whether such a feature can be implemented with reasonable effort. I do believe, however, that it would greatly improve the usability of DataFrames for certain use cases.
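
For the simple case of one named index and one named columns axis, the idea could be sketched roughly as below. This is only an illustration: name_loc is a hypothetical free function, not an existing pandas method, and it assumes that both axes carry distinct names.

```python
import pandas as pd


def name_loc(df, **labels):
    """Hypothetical helper: select by axis name instead of axis position."""
    known = {df.index.name, df.columns.name}
    unknown = set(labels) - known
    if unknown:
        raise KeyError(f"no axis named {sorted(unknown)}")
    # An axis that is not mentioned is selected in full.
    rows = labels.get(df.index.name, slice(None))
    cols = labels.get(df.columns.name, slice(None))
    return df.loc[rows, cols]


df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.Index([10, 20], name="y"),
    columns=pd.Index([10, 20], name="x"),
)

value = name_loc(df, x=10, y=20)  # row y=20, column x=10
row = name_loc(df, y=10)          # whole row y=10
```

The keyword order does not matter here, which is the point of the request: the caller never has to remember which axis comes first.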


jreback · 2015-10-19T18:06:33Z

We've had discussions about this in #4036 and #11267.

Here is an example which is pretty straightforward to do:

mi = pd.MultiIndex.from_product([list('ab'),list('AB')],names=['foo','bar'])

In [5]: s = Series(range(4),index=mi)

In [6]: s
Out[6]: 
foo  bar
a    A      0
     B      1
b    A      2
     B      3
dtype: int64

xray calls this .sel, see here

Here's a quick and dirty impl

In [8]: MultiIndex.__call__ = lambda self, **kwargs: tuple([ (kwargs.get(n,slice(None))) for n in self.names ])

In [9]: s.loc[s.index(foo='b',bar='B')]
Out[9]: 3

In [10]: s.loc[s.index(foo='b')]
Out[10]: 
foo  bar
b    A      2
     B      3
dtype: int64

In [11]: s.loc[s.index(bar='B')]
Out[11]: 
foo
a    1
b    3
dtype: int64

In [12]: s.loc[s.index(bar=['B'])]
Out[12]: 
foo  bar
a    B      1
b    B      3
dtype: int64
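
The same idea can be written without patching MultiIndex. The sketch below is a variation on the snippet above, not part of pandas: level_key is a hypothetical helper name, and the check for unknown level names is an addition.

```python
import pandas as pd


def level_key(index, **labels):
    """Build a .loc key for a MultiIndex from level-name keyword arguments."""
    unknown = set(labels) - set(index.names)
    if unknown:
        raise KeyError(f"unknown level name(s): {sorted(unknown)}")
    # Levels that are not mentioned are selected in full via slice(None).
    return tuple(labels.get(name, slice(None)) for name in index.names)


mi = pd.MultiIndex.from_product([list("ab"), list("AB")], names=["foo", "bar"])
s = pd.Series(range(4), index=mi)

scalar = s.loc[level_key(s.index, foo="b", bar="B")]
only_b = s.loc[level_key(s.index, foo="b")]
only_B = s.loc[level_key(s.index, bar="B")]
```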

szaiser · 2015-10-19T18:32:40Z

Thank you for the quick reply. The MultiIndex solution you present is a nice and quick workaround. It has its flaws, however, which is why I want to thank you even more for pointing me towards xray. I might use that, but .sel-like functionality would be really great in Pandas, which is much more mature than xray.

jreback · 2015-10-20T17:46:30Z

@BigSkylie the issue with actually using a function, and the reason I am not a big fan of .sel (though I am of @shoyer!),

is that this is a function call, and therefore cannot be used as an lvalue. So the second we add:

value = df.sel(....)

people will want

df.sel() = value

which is not allowed in Python.

The alternatives are to allow dictionaries in df.loc

e.g.

df.loc[{'x': 10, 'y': 5}]

and/or the syntax I described above

df.loc[df.index(x=....), df.columns(y=...)]

Having pandas 'figure out' which axis you mean via the index name is laudable and would be solved by dict access.
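
The lvalue point can be checked directly. The sketch below assumes nothing about how a .sel method would be implemented; the df.sel call only appears inside a string that is compiled, never executed.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]}, index=["r0", "r1"])

# Assigning to a call is rejected when the source is compiled,
# before any pandas code would run.
try:
    compile("df.sel(x=1) = 5", "<example>", "exec")
    call_is_assignable = True
except SyntaxError:
    call_is_assignable = False

# Subscript assignment is valid syntax: Python turns it into
# df.loc.__setitem__(("r0", "a"), 5).
df.loc["r0", "a"] = 5
```

This is why any named-selection syntax that should also support assignment has to go through square brackets.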

szaiser · 2015-10-20T18:21:41Z

@jreback Only some minutes after my previous post, I joined the group of disappointed people who want

df.sel() = value

I now must completely agree with you, .sel() is not the holy grail I first thought it to be.

The syntax df.loc[dict(x = 10, y = 5)] (I assume that's what you meant) is as close to my request as possible. At least in Python. Is this planned for the future?

jreback · 2015-10-20T18:31:39Z

@BigSkylie yes, I think accepting a dict is on the roadmap. Interested in implementing it? It is not too hard actually, as it is just a pre-processing transformation step.

shoyer · 2015-10-20T18:37:33Z

Well, one thing we'll need to figure out here is whether dictionary indexing refers to levels of the multi-index or column/index names. Does df.loc[dict(x=10, y=5)] refer to levels x and y of the multi-index along the rows, or column x and row y? Handling both cases could get messy.
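
To make the two readings concrete, the sketch below builds one frame for each and reports where a given name lives. find_name is a hypothetical helper written for this illustration; it is not a pandas function.

```python
import pandas as pd


def find_name(df, name):
    """Return every (axis, level) position at which `name` occurs."""
    hits = []
    for axis, index in enumerate((df.index, df.columns)):
        for level, level_name in enumerate(index.names):
            if level_name == name:
                hits.append((axis, level))
    return hits


# Reading 1: x and y are both levels of the row MultiIndex.
rows = pd.MultiIndex.from_product([[10, 20], [5, 6]], names=["x", "y"])
df_levels = pd.DataFrame({"v": range(4)}, index=rows)

# Reading 2: y names the rows and x names the columns.
df_axes = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.Index([5, 6], name="y"),
    columns=pd.Index([10, 20], name="x"),
)

x_in_levels = find_name(df_levels, "x")  # a row level
x_in_axes = find_name(df_axes, "x")      # the columns axis
```

A dict-based indexer would need a lookup of this kind, plus a rule for what to do when the same name is found on both axes.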

szaiser · 2015-10-20T22:32:15Z

@jreback As soon as I see a Pandas conform solution, I at least will give it a try. Is making changes to _LocationIndexer and _LocIndexer in pandas.core.indexing the way to go?
@shoyer The issue you mention seems very serious to me. The only solution I see right now is handing over a 2-tuple containing dictionaries in case of a DataFrame with a MultiIndex. Not a very clean solution, one might say.

jreback · 2015-10-20T22:58:10Z

what is a 'Pandas conform solution'?

Follow how .loc works and you should be able to intercept it near the top level (yes, in _LocIndexer or its superclass).

Part of the challenge is ironing out the API to make it semantically clean, so we need test cases for as much as possible.
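
The pre-processing step can be prototyped outside pandas before touching _LocIndexer. The sketch below is only that: DictLoc is a hypothetical wrapper class, it assumes the rows carry a MultiIndex with named levels, and it only handles dicts that refer to row levels.

```python
import pandas as pd


class DictLoc:
    """Hypothetical wrapper: turn a dict of row-level names into a .loc key."""

    def __init__(self, df):
        self._df = df

    def _translate(self, key):
        if not isinstance(key, dict):
            return key  # anything else is passed through to .loc untouched
        names = self._df.index.names
        unknown = set(key) - set(names)
        if unknown:
            raise KeyError(f"unknown index level(s): {sorted(unknown)}")
        # Levels that are not mentioned are selected in full.
        return tuple(key.get(name, slice(None)) for name in names)

    def __getitem__(self, key):
        return self._df.loc[self._translate(key), :]

    def __setitem__(self, key, value):
        self._df.loc[self._translate(key), :] = value


index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2]], names=["letter", "number"]
)
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=index)

picked_values = DictLoc(df)[{"number": 2}]["v"].tolist()
DictLoc(df)[{"letter": "a"}] = 0  # assignment goes through __setitem__
```

Because the wrapper only rewrites the key and then delegates to .loc, getting and setting share the same translation, which is the property the test cases would need to pin down.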

szaiser · 2015-10-21T00:42:56Z

@jreback 'Pandas conform' was meant to be related to Pandas similarly to how 'Pythonic' is related to Python. A clean solution, which keeps the user experience coherent all over Pandas.

max-sixty · 2015-10-30T01:37:44Z

+1, FWIW

LindyBalboa · 2017-08-12T10:22:09Z

+1 on interest for such a feature
I believe issue #4036 is also related to this.

The index/column question came up. Why not just handle it in the standard .loc way?

df.loc[{index1:'A', index2: slice(None), index3: [1,3,5], index4: range(1,10,2)}, :]

I tried taking a look at the code to see how .loc[] actually works, but it is above my head. If anyone would care to give me a roadmap about the inner workings, I would be more than willing to take another look.

Here is my current solution, by way of chance. It is a bit hacky, but it works perfectly for my needs. My experimental data has files with titles of the form "Material_seriesX_measurementY_parmZ". I import a single column from each file. Basically all of the index information is stored in the one column name. I then use an expression like

df.filter(regex="(Matx|Maty)_series{2,5}_measurement.*_ParmZ", axis=1)
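
A small runnable version of this approach might look as follows. The column names are made up for the illustration, and the character class [2-5] is an assumption about the intended range of series numbers.

```python
import pandas as pd

columns = [
    "MatX_series2_measurement1_ParmZ",
    "MatX_series7_measurement1_ParmZ",  # series number outside 2-5
    "MatY_series3_measurement2_ParmZ",
    "MatY_series3_measurement2_ParmQ",  # different parameter
]
df = pd.DataFrame([[1, 2, 3, 4]], columns=columns)

# DataFrame.filter keeps the columns whose name matches the regex.
pattern = r"(MatX|MatY)_series[2-5]_measurement.*_ParmZ"
selected = df.filter(regex=pattern, axis=1)
```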

MPvHarmelen · 2020-08-20T13:24:53Z

I wrote a wrapper around loc that implements this and exports it as loc_by_level_name, but I guess it would be best to just add it to the functionality of loc.

https://gist.github.com/MPvHarmelen/3a3db1b83c0ac82eb41eacbe0cee2d0c

As of now it does getting and setting on any dimension, but not deleting (as it didn't seem like loc supported complex selection for deletion either):

>>> df = pd.DataFrame(
...     [(x, -x) for x in range(1, 5)],
...     index=pd.MultiIndex.from_product(
...            [['a', 'b'], [1, 2]],
...            names=['letter', 'number'],
...     )
... )
>>> df
               0  1
letter number
a      1       1 -1
       2       2 -2
b      1       3 -3
       2       4 -4
>>> df.loc_by_level_name[{'number': 2}]
               0  1
letter number
a      2       2 -2
b      2       4 -4
>>> df.loc_by_level_name(axis=0)[{'number': 1}]
               0  1
letter number      
a      1       1 -1
b      1       3 -3
>>> df.loc_by_level_name[{'number': 1}, 1] = 12
>>> df
               0   1
letter number       
a      1       1  12
       2       2  -2
b      1       3  12
       2       4  -4
>>> fd = df.transpose()
>>> fd
letter   a      b   
number   1  2   1  2
0        1  2   3  4
1       12 -2  12 -4
>>> fd.loc_by_level_name[:, {'letter': 'a'}]
letter   a   
number   1  2
0        1  2
1       12 -2
>>> fd.loc_by_level_name(axis=1)[{'letter': 'b'}]
letter   b   
number   1  2
0        3  4
1       12 -4

mroeschke · 2024-09-12T21:36:58Z

Thanks for the request, but it appears this feature request hasn't been picked up by the community or core team in years, so closing for now.

jreback added the Indexing (related to indexing on series/frames, not to indexes themselves), API Design, MultiIndex, and Needs Discussion (requires discussion from core team before further action) labels Oct 19, 2015

max-sixty mentioned this issue May 20, 2016: Idea: use df.index/df.columns names to automatically choose axis along which to broadcast #13243 (closed)

makmanalp mentioned this issue May 24, 2016: Allowing the index to be referenced by name, like a column #8162 (closed)

mroeschke added Enhancement and removed API Design labels Apr 20, 2021

mroeschke closed this as completed Sep 12, 2024
