API: consider undeprecating Series.item() ? #29250

jorisvandenbossche · 2019-10-28T07:49:59Z

I recently ran into this as well (but forgot to open an issue), and it has now been raised by @alexiswl in #18262 (comment).

Series.item() was a consequence of (historically) inheriting from np.ndarray, and was deprecated (like a set of other ndarray-inherited methods/attributes) a while ago.

While .item() could also be used to select the "i-th" element (.item(i)), and this use case is certainly redundant (not arguing here to get that aspect back), there is one use case where item() can actually be useful: if you do not pass i, the method returns the element of the Series only if it has one element, otherwise it errors.

Such a situation can typically occur if you use boolean indexing (or query) to select a single element. Eg in cases like s[s == 'val'] or df.loc[df['col1'] == 'val', 'col2'] where you know the condition should yield a single element.
You then typically want the scalar element as result, but those two code snippets give you a Series of one element. In those cases, you could use item() to retrieve it: s[s == 'val'].item().

I saw some people using .item() exactly for this use case, so I am wondering whether it is worth keeping it for this (only the version without passing i).

The logical alternative is doing a .iloc[0], but .item() has the advantage of guaranteeing there was actually only one result item.
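To make the difference concrete, here is a minimal sketch. The Series contents and variable names are made up for illustration, and it assumes a pandas version in which Series.item() is available (1.0 or later, per the outcome of this issue):

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
s = pd.Series(["a", "val", "b", "c"])

# Boolean indexing returns a Series, even when exactly one row matches.
single = s[s == "val"]

# .item() unwraps the one-element Series into a scalar.
scalar = single.item()

# With more than one match, .iloc[0] silently returns the first hit,
# while .item() refuses to guess and raises ValueError.
several = s[s != "val"]
first_hit = several.iloc[0]
try:
    several.item()
    item_raised = False
except ValueError:
    item_raised = True
```

So .iloc[0] is fine when the condition is known to be unique, while .item() doubles as a check that it really was.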


simonjayhawkins · 2019-10-28T08:04:51Z

Could an errors kwarg to Series.squeeze achieve this?

jorisvandenbossche · 2019-10-28T08:21:22Z

Yeah, I actually used .squeeze() myself for this use case before (eg in some of my tutorials), but after seeing somebody use .item() for this, I found that more elegant.
Actually the docstring of squeeze gives an example for this use case: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.squeeze.html

For the actual suggestion: this might get complicated, as in squeeze you can have multiple dimensions. So when would an error be raised? If none of the dimensions equals 1 (and so no dimension can be squeezed), or if not all dimensions are 1 (and the result is not a scalar)? Given that there are possibly multiple ways to interpret it, it might not be the best keyword.
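The ambiguity can be seen in how squeeze() already behaves for different shapes. The following is only an illustrative sketch with a made-up 2x2 DataFrame. It shows current squeeze() behaviour, not a proposal for the errors keyword:

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))

# Both dimensions have length 1: squeeze() returns a scalar.
one_by_one = df.loc[df["a"] == 1, ["b"]].squeeze()

# Only the row dimension has length 1: squeeze() returns a Series.
one_row = df.loc[df["a"] == 1, :].squeeze()

# No dimension has length 1: squeeze() returns a DataFrame of the same shape.
nothing_squeezed = df.squeeze()

# A Series with two elements also comes back as a Series, without an error.
still_series = df["a"].squeeze()
```

An errors keyword would have to decide which of the last three cases count as failures.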

simonjayhawkins · 2019-10-28T08:29:08Z

agreed that an additional argument would need to be consistent with DataFrame.squeeze and yet not too complicated.

The rationale for this would be to avoid two ways of doing the same operation, with the difference being that one raises.

SaturnFromTitan · 2019-11-01T14:04:48Z

Intuitively I thought .at could do what you search for. This would also yield a simpler API than chaining .loc and .item/.squeeze.

It seems like .at can't deal with a boolean series index though:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))
>>> df.at[df["a"] == 1, "a"]

yields
ValueError: At based indexing on an integer index can only have integer indexers

I feel like this was possible before, so it was probably deprecated for a good reason that I'm not aware of. But if we're talking about un-deprecation, I think this should be considered as well.

Sorry if I open a closed discussion with that 😄

TomAugspurger · 2019-11-12T14:40:04Z

I'm fine with un-deprecating this. Is it a blocker for 1.0?

Would we add it to DataFrame, with a similar behavior to a 2D ndarray?

jorisvandenbossche · 2019-11-14T21:33:18Z

It's certainly not a blocker, but since it's deprecated, and if we want to do this, better sooner than later (and it should be a quick PR).

> Would we add it to DataFrame, with a similar behavior to a 2D ndarray?

That could be done yes, although I personally find that less needed (boolean indexing on the columns is much less common I think)

> Intuitively I thought .at could do what you search for. This would also yield a simpler API than chaining .loc and .item/.squeeze.

I don't think .at could ever do this in the past. It requires a single label.
But thinking about it, that might actually be a nice alternative. It's indeed shorter, and the end result (a single value out of the dataframe) is still the same as its current purpose.
It does complicate the API of .at though. Right now it is very simple: a single label for each axis. That would be expanded with: a boolean mask with a single True value.
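As a rough sketch of the semantics being discussed, the "boolean mask with a single True value" rule could be written as a standalone helper. The function value_where is hypothetical and not part of pandas. It also assumes unique column labels:

```python
import pandas as pd


def value_where(df, mask, column):
    """Hypothetical helper, not a pandas API.

    Return the single value in `column` selected by a boolean `mask`
    that contains exactly one True. Assumes unique column labels.
    """
    matches = df.loc[mask, column]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one True in mask, got {len(matches)}"
        )
    return matches.iloc[0]


# Hypothetical data, chosen only for illustration.
df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))
value = value_where(df, df["a"] == 1, "b")
```

This is the same check that .loc followed by .item() performs, which is why the two spellings are being weighed against each other here.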

shoyer · 2019-11-14T23:33:32Z

The other use case for NumPy's .item() is to pull out a built-in Python scalar (e.g., float), rather than a NumPy scalar (e.g., float64). I think this could still make sense for pandas.
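A small sketch of that distinction on the NumPy side, with a made-up one-element array. The Series line at the end assumes a pandas version where Series.item() is available. It is shown only for comparison, and this thread does not settle which scalar type it should return:

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen only for illustration.
arr = np.array([1.5])

# Indexing gives a NumPy scalar (numpy.float64).
np_scalar = arr[0]

# .item() gives a built-in Python float.
py_scalar = arr.item()

# Assumption: Series.item() exists in the installed pandas version.
from_series = pd.Series([1.5]).item()
```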

jorisvandenbossche · 2019-11-15T09:11:24Z

For that use case, I suppose you would want to keep the "full" behaviour of numpy? So also s.item(i), while for the use case discussed above (getting rid of the Series container for a len-1 Series), s.item() without an argument is enough.

shoyer · 2019-11-15T14:19:16Z

I don't think it's really needed to support the full form of .item(i) with an argument. That version is definitely redundant with indexing.


Since `Index.item()` is un-deprecated from pandas 1.0.0 (pandas-dev/pandas#29250), this PR proposes `Index.item()` and `MultiIndex.item()`.

```python
>>> kidx = ks.Index([10])
>>> kidx.item()
10
>>> kmidx = ks.MultiIndex.from_tuples([('a', 'x')])
>>> kmidx.item()
('a', 'x')
```

jorisvandenbossche added the API Design and Deprecate (Functionality to remove in pandas) labels Oct 28, 2019

jorisvandenbossche added this to the 1.0 milestone Oct 28, 2019

jorisvandenbossche mentioned this issue Nov 28, 2019

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

jorisvandenbossche mentioned this issue Dec 10, 2019

BUG+DEPR: undeprecate item, fix dt64/td64 output type #30175

Merged

5 tasks

jorisvandenbossche closed this as completed in #30175 Dec 18, 2019

itholic mentioned this issue Sep 2, 2020

Implemented item for Index & MultiIndex databricks/koalas#1744

Merged
