API: consider undeprecating Series.item() ? #29250

Closed
jorisvandenbossche opened this issue Oct 28, 2019 · 9 comments · Fixed by #30175
Labels
API Design Deprecate Functionality to remove in pandas
Milestone

Comments

@jorisvandenbossche
Member

I recently ran into this as well (but forgot to open an issue), and it has now been raised by @alexiswl in #18262 (comment).

Series.item() was a consequence of (historically) inheriting from np.ndarray, and was deprecated (like a set of other ndarray-inherited methods/attributes) a while ago.

While .item() could also be used to select the "i-th" element (.item(i)), and this use case is certainly redundant (not arguing here to get that aspect back), there is one use case where item() can actually be useful: if you do not pass i, the method returns the element of the Series only if it has one element, otherwise it errors.

Such a situation can typically occur if you use boolean indexing (or query) to select a single element, e.g. in cases like s[s == 'val'] or df.loc[df['col1'] == 'val', 'col2'] where you know the condition should yield a single element.
You then typically want the scalar element as result, but those two code snippets give you a Series of one element. In those cases, you could use item() to retrieve it: s[s == 'val'].item().

I saw some people using .item() exactly for this use case, so I am wondering whether it is worth keeping it for this (only the version without passing i).

The logical alternative is doing a .iloc[0], but .item() has the advantage of guaranteeing there was actually only one result item.
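A minimal sketch of the difference described above, assuming pandas 1.0 or later (where Series.item() is available without a deprecation warning). The Series contents and variable names are made up for illustration:

```python
import pandas as pd

s = pd.Series(["a", "val", "b"])

# Exactly one match: .item() returns the scalar itself.
single = s[s == "val"].item()

# Two matches: .iloc[0] silently returns the first one,
# while .item() raises because the selection is not length 1.
dup = pd.Series(["val", "val", "b"])
first = dup[dup == "val"].iloc[0]
try:
    dup[dup == "val"].item()
    raised = False
except ValueError:
    raised = True
```

Here .iloc[0] hides the fact that the condition matched more than once, whereas .item() turns that into an error.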

@jorisvandenbossche jorisvandenbossche added API Design Deprecate Functionality to remove in pandas labels Oct 28, 2019
@jorisvandenbossche jorisvandenbossche added this to the 1.0 milestone Oct 28, 2019
@simonjayhawkins
Member

Could an errors kwarg to Series.squeeze achieve this?

@jorisvandenbossche
Member Author

Yeah, I actually used .squeeze() myself for this use case before (e.g. in some of my tutorials), but after seeing somebody use .item() for this, I found that more elegant.
Actually the docstring of squeeze gives an example for this usecase: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.squeeze.html

For the actual suggestion: this might get complicated, as in squeeze you can have multiple dimensions. So when would an error be raised? If none of the dimensions equals 1 (and so no dimension can be squeezed), or if not all dimensions are 1 (and the result is not a scalar)? Given that there are possibly multiple ways to interpret it, it might not be the best keyword.
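To illustrate why a single errors keyword could be ambiguous, here is a small sketch of what squeeze currently returns for a few shapes (the frames are invented for the example). It only shows existing behaviour and does not implement any proposed keyword:

```python
import pandas as pd

one_by_one = pd.DataFrame({"a": [1]})
one_by_two = pd.DataFrame({"a": [1], "b": [2]})
two_by_two = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

r1 = one_by_one.squeeze()         # all dimensions are 1: a scalar
r2 = one_by_two.squeeze()         # one dimension is 1: a Series of length 2
r3 = two_by_two.squeeze()         # no dimension is 1: the DataFrame is returned as-is
r4 = pd.Series([1, 2]).squeeze()  # length-2 Series: returned as-is, no error
```

An errors option would have to pick whether the r2 case, the r3 case, or both count as failures.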

@simonjayhawkins
Member

simonjayhawkins commented Oct 28, 2019

Agreed that an additional argument would need to be consistent with DataFrame.squeeze and yet not too complicated.

The rationale for this would be to avoid two ways of doing the same operation, with the only difference being that one raises.

@SaturnFromTitan
Contributor

Intuitively I thought .at could do what you search for. This would also yield a simpler API than chaining .loc and .item/.squeeze.

It seems like .at can't deal with a boolean series index though:

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))
>>> df.at[df["a"] == 1, "a"]

yields
ValueError: At based indexing on an integer index can only have integer indexers

I feel like this was possible before, so it was probably deprecated for a good reason that I'm not aware of. But if we're talking about un-deprecation, I think this should be considered as well.

Sorry if I open a closed discussion with that 😄

@TomAugspurger
Contributor

I'm fine with un-deprecating this. Is it a blocker for 1.0?

Would we add it to DataFrame, with a similar behavior to a 2D ndarray?

@jorisvandenbossche
Member Author

It's certainly not a blocker, but since it's deprecated, and if we want to do this, better sooner than later (and it should be a quick PR).

> Would we add it to DataFrame, with a similar behavior to a 2D ndarray?

That could be done yes, although I personally find that less needed (boolean indexing on the columns is much less common I think)

> Intuitively I thought .at could do what you search for. This would also yield a simpler API than chaining .loc and .item/.squeeze.

I don't think .at could ever do this in the past. It requires a single label.
But thinking about it, that might actually be a nice alternative. It's indeed shorter, and the end result (a single value out of the dataframe) is still the same as its current purpose.
It does complicate the API of .at though. Currently it is very simple: a single label for each axis. That would be expanded with: a boolean mask with a single True value.
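As a rough sketch of the two access patterns being compared (the frame is illustrative, and the boolean-mask form uses .loc plus .item() rather than any extension of .at):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))

# .at as it works today: one row label and one column label.
by_label = df.at[0, "b"]

# Boolean mask expected to match exactly one row: select with .loc,
# then unwrap the length-1 Series with .item().
mask = df["a"] == 3
by_mask = df.loc[mask, "b"].item()
```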

@shoyer
Member

shoyer commented Nov 14, 2019

The other use case for NumPy's .item() is to pull out a built-in Python scalar (e.g., float), rather than a NumPy scalar (e.g., float64). I think this could still make sense for pandas.
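A short sketch of that second use case, assuming pandas 1.0 or later and a plain NumPy-backed integer Series. It compares the type returned by .item() with the type returned by .iloc[0]:

```python
import numpy as np
import pandas as pd

s = pd.Series([7])

via_iloc = s.iloc[0]   # a NumPy scalar (np.int64 on most platforms)
via_item = s.item()    # a built-in Python int

iloc_is_numpy = isinstance(via_iloc, np.integer)
item_is_builtin = type(via_item) is int
```

The built-in type can matter when the value is passed to code that does not accept NumPy scalars, such as some serializers.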

@jorisvandenbossche
Member Author

For that use case, I suppose you would want to keep the "full" behaviour of numpy (so also s.item(i))? For the use case discussed above (getting rid of the Series container for a len-1 Series), s.item() without an argument is enough.

@shoyer
Member

shoyer commented Nov 15, 2019 via email

HyukjinKwon pushed a commit to databricks/koalas that referenced this issue Sep 9, 2020
Since `Index.item()` is un-deprecated from pandas 1.0.0 (pandas-dev/pandas#29250), this PR proposes `Index.item()` and `MultiIndex.item()`.

```python
>>> kidx = ks.Index([10])
>>> kidx.item()
10

>>> kmidx = ks.MultiIndex.from_tuples([('a', 'x')])
>>> kmidx.item()
('a', 'x')
```
rising-star92 added a commit to rising-star92/databricks-koalas that referenced this issue Jan 27, 2023