BUG+DEPR: undeprecate item, fix dt64/td64 output type #30175

Merged: 10 commits, merged on Dec 18, 2019
doc/source/whatsnew/v1.0.0.rst (3 additions, 0 deletions)
@@ -486,6 +486,7 @@ Documentation Improvements
Deprecations
~~~~~~~~~~~~

- :meth:`Series.item` and :meth:`Index.item` have been *undeprecated* (:issue:`29250`)
- ``Index.set_value`` has been deprecated. For an index ``idx``, an array ``arr``,
  a label ``idx_val`` in ``idx``, and a new value ``val``, ``idx.set_value(arr, idx_val, val)``
  is equivalent to ``arr[idx.get_loc(idx_val)] = val``, which should be used instead (:issue:`28621`).
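A minimal sketch of the replacement idiom named in the deprecation entry, assuming pandas and NumPy are installed; the labels and values are purely illustrative. `Index.get_loc` turns a unique label into an integer position, and plain NumPy assignment does the rest. The deprecated `Index.set_value` is deliberately not called.

```python
import numpy as np
import pandas as pd

idx = pd.Index(["a", "b", "c"])  # unique labels, so get_loc returns an int
arr = np.array([10, 20, 30])

# Deprecated spelling (not executed here): idx.set_value(arr, "b", 99)
# Replacement: translate the label to a position, then assign positionally.
arr[idx.get_loc("b")] = 99
print(arr)  # [10 99 30]
```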
@@ -702,6 +703,8 @@ Datetimelike
- Bug in :attr:`Timestamp.resolution` being a property instead of a class attribute (:issue:`29910`)
- Bug in :func:`pandas.to_datetime` when called with ``None`` raising ``TypeError`` instead of returning ``NaT`` (:issue:`30011`)
- Bug in :func:`pandas.to_datetime` failing for ``deque`` objects when using ``cache=True`` (the default) (:issue:`29403`)
- Bug in :meth:`Series.item` with ``datetime64`` or ``timedelta64`` dtype, :meth:`DatetimeIndex.item`, and :meth:`TimedeltaIndex.item` returning an integer instead of a :class:`Timestamp` or :class:`Timedelta` (:issue:`30175`)
-
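The integer output in the ``Series.item`` bug entry comes from NumPy itself: Python's `datetime`/`timedelta` only resolve microseconds, so `ndarray.item()` on nanosecond-resolution `datetime64`/`timedelta64` data falls back to a plain integer count of nanoseconds. A NumPy-only illustration (values chosen for the example):

```python
import datetime

import numpy as np

dt_ns = np.array(["2016-01-01"], dtype="datetime64[ns]")
td_ns = np.array([1], dtype="timedelta64[ns]")

# Nanosecond resolution has no Python equivalent, so .item() returns an int.
raw_dt = dt_ns.item()  # nanoseconds since the Unix epoch
raw_td = td_ns.item()  # nanoseconds
print(type(raw_dt), raw_dt)  # <class 'int'> 1451606400000000000
print(type(raw_td), raw_td)  # <class 'int'> 1

# At microsecond resolution NumPy can hand back a datetime.datetime instead.
as_us = dt_ns.astype("datetime64[us]").item()
print(isinstance(as_us, datetime.datetime))  # True
```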

Timedelta
^^^^^^^^^

pandas/core/base.py (17 additions, 9 deletions)
@@ -5,7 +5,6 @@
from collections import OrderedDict
import textwrap
from typing import Dict, FrozenSet, List, Optional
import warnings

import numpy as np

@@ -26,6 +25,7 @@
is_object_dtype,
is_scalar,
is_timedelta64_ns_dtype,
needs_i8_conversion,
)
from pandas.core.dtypes.generic import ABCDataFrame, ABCIndexClass, ABCSeries
from pandas.core.dtypes.missing import isna
@@ -659,19 +659,27 @@ def item(self):
"""
Return the first element of the underlying data as a python scalar.

.. deprecated:: 0.25.0

Returns
-------
scalar
The first element of %(klass)s.

Raises
------
ValueError
If the data is not length-1.
"""
warnings.warn(
"`item` has been deprecated and will be removed in a future version",
FutureWarning,
stacklevel=2,
)
return self.values.item()
if not (
is_extension_array_dtype(self.dtype) or needs_i8_conversion(self.dtype)
Contributor: Isn't this redundant, since all needs_i8_conversion dtypes are already EAs?

Member Author: No, dt64 and td64 need i8 conversion but are not EAs.

):
# numpy returns ints instead of datetime64/timedelta64 objects,
# which we need to wrap in Timestamp/Timedelta/Period regardless.
return self.values.item()
Member: This won't work for ExtensionArrays. We can discuss adding item to the interface, but I would rather (or at least for now) let ExtensionArrays take the path you have below that uses iteration (which should already handle the conversion to a Python scalar).

Contributor: I would be +1 on adding it to EAs; why have inconsistent code paths?

Member Author:

> This won't work for ExtensionArrays

This uses .values, so it will convert to an ndarray and then call item. So it shouldn't be any more broken than what we have now.

Member Author: I have no objection to adding item to EAs separately.

Member:

> So it shouldn't be any more broken than what we have now.

And to fix that, the only thing needed is adding `and not is_extension_array_dtype(self.dtype)` to the above if check.

Member Author: I'm happy to do that here; it will need tests in a follow-up.
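Under the merged condition, `not (is_extension_array_dtype(...) or needs_i8_conversion(...))`, which is the reviewer's `and not is_extension_array_dtype(...)` suggestion folded into one check, extension arrays skip `.values.item()` and fall through to the iteration branch. A small check with two extension dtypes, assuming pandas 1.0 or later:

```python
import pandas as pd

cat = pd.Series(pd.Categorical(["a"]))
nullable = pd.Series([1], dtype="Int64")

# Both use extension dtypes, so item() returns the first element obtained
# by iterating over the object rather than via a NumPy conversion.
print(cat.item())
print(nullable.item())
print(next(iter(cat)) == cat.item())
```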


if len(self) == 1:
Member: Was adding this condition discussed somewhere? I would have thought we'd just keep the existing behaviour.

Member Author: This is the bugfix part: for dt64, dt64tz, and td64 we're currently incorrectly returning int.

Member: Hmm, doesn't this break non-DTA though?

>>> type(pd.Series(range(1)).item())
<class 'int'>
>>> type(pd.Series(range(1))[0])
<class 'numpy.int64'>

I thought one of the points of item was to return a Python object (at least in the NumPy world).

Member: ^ current behavior

Member: Yes, I think we should keep the behaviour of item returning a Python scalar (where possible, of course; for datetime/timedelta I think it is fine to return a pandas Timestamp/Timedelta).

Member Author: OK, will update.

Contributor: Has this been resolved?

Member Author: I think the concerns raised by @WillAyd and @jorisvandenbossche have been addressed.
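The point raised in this thread is the difference between `item()`, which should return a Python scalar, and indexing, which returns a NumPy scalar. A quick check, assuming pandas 1.0 or later; the exact NumPy integer type varies by platform, so the sketch only checks that it is not a Python `int`:

```python
import pandas as pd

ser = pd.Series(range(1))

via_item = ser.item()  # Python scalar
via_getitem = ser[0]   # NumPy scalar, e.g. numpy.int64

print(type(via_item).__name__)         # int
print(isinstance(via_getitem, int))    # False: NumPy ints do not subclass int
print(type(next(iter(ser))).__name__)  # int: iteration also yields Python scalars
```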

return next(iter(self))
else:
raise ValueError("can only convert an array of size 1 to a Python scalar")
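For readers following the interleaved hunk, here is a standalone restatement of the `item()` control flow shown above. It is a sketch, not pandas' code: `item_sketch` is a hypothetical name, and the `dtype.kind` test is an assumed stand-in for the internal `needs_i8_conversion` helper (period dtypes are still caught by the extension-dtype check).

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype


def item_sketch(obj):
    """Hypothetical restatement of the item() logic in this diff."""
    dtype = obj.dtype
    # Assumed stand-in for needs_i8_conversion: NumPy datetime64 ("M")
    # and timedelta64 ("m") kinds.
    needs_i8 = getattr(dtype, "kind", None) in ("M", "m")
    if not (is_extension_array_dtype(dtype) or needs_i8):
        # Plain NumPy dtypes: NumPy's item() returns a Python scalar and
        # raises ValueError itself when the size is not 1.
        return obj.to_numpy().item()
    if len(obj) == 1:
        # Iteration boxes values as Timestamp/Timedelta/Period, etc.
        return next(iter(obj))
    raise ValueError("can only convert an array of size 1 to a Python scalar")
```

The split keeps the cheap NumPy path for ordinary dtypes and reserves iteration for dtypes whose NumPy `item()` would lose the pandas scalar type.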

@property
def nbytes(self):

pandas/core/indexes/period.py (0 additions, 22 deletions)
@@ -1,5 +1,4 @@
from datetime import datetime, timedelta
import warnings
import weakref

import numpy as np
@@ -862,27 +861,6 @@ def __setstate__(self, state):

_unpickle_compat = __setstate__

def item(self):
"""
Return the first element of the underlying data as a python
scalar

.. deprecated:: 0.25.0

"""
warnings.warn(
"`item` has been deprecated and will be removed in a future version",
FutureWarning,
stacklevel=2,
)
# TODO(DatetimeArray): remove
if len(self) == 1:
return self[0]
else:
# TODO: is this still necessary?
# copy numpy's message here because Py26 raises an IndexError
raise ValueError("can only convert an array of size 1 to a Python scalar")
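With the `PeriodIndex.item` override deleted, the method resolves to the shared implementation in pandas/core/base.py. Because the period dtype is an extension dtype, that shared code takes the iteration path and returns a `Period`. A quick check, assuming pandas 1.0 or later:

```python
import pandas as pd

pi = pd.period_range("2019-01", periods=1, freq="M")
val = pi.item()
print(type(val).__name__, val)  # Period 2019-01

try:
    pd.period_range("2019-01", periods=2, freq="M").item()
except ValueError as err:
    print(err)  # can only convert an array of size 1 to a Python scalar
```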

def memory_usage(self, deep=False):
result = super().memory_usage(deep=deep)
if hasattr(self, "_cache") and "_int64index" in self._cache:

pandas/tests/base/test_ops.py (3 additions, 5 deletions)
@@ -236,15 +236,13 @@ def test_ndarray_compat_properties(self):
assert not hasattr(o, p)

with pytest.raises(ValueError):
with tm.assert_produces_warning(FutureWarning):
o.item() # len > 1
o.item() # len > 1

assert o.ndim == 1
assert o.size == len(o)

with tm.assert_produces_warning(FutureWarning):
assert Index([1]).item() == 1
assert Series([1]).item() == 1
assert Index([1]).item() == 1
assert Series([1]).item() == 1
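Because the deprecation is reverted, the tests drop their `tm.assert_produces_warning(FutureWarning)` wrappers. The same property can be spot-checked outside the test suite by promoting warnings to errors with the standard-library `warnings` module (assumes pandas 1.0 or later):

```python
import warnings

import pandas as pd

with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning, FutureWarning included, now raises
    idx_val = pd.Index([1]).item()
    ser_val = pd.Series([1]).item()

print(idx_val, ser_val)  # 1 1
```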

def test_value_counts_unique_nunique(self):
for orig in self.objs:

pandas/tests/series/test_api.py (46 additions, 8 deletions)
@@ -12,13 +12,14 @@
DatetimeIndex,
Index,
Series,
Timedelta,
TimedeltaIndex,
Timestamp,
date_range,
period_range,
timedelta_range,
)
from pandas.core.arrays import PeriodArray
from pandas.core.indexes.datetimes import Timestamp
import pandas.util.testing as tm

import pandas.io.formats.printing as printing
@@ -398,6 +399,50 @@ def test_numpy_unique(self, datetime_series):
# it works!
np.unique(datetime_series)

def test_item(self):
s = Series([1])
result = s.item()
assert result == 1
assert result == s.iloc[0]
assert isinstance(result, int) # i.e. not np.int64

ser = Series([0.5], index=[3])
result = ser.item()
assert isinstance(result, float)
assert result == 0.5

ser = Series([1, 2])
msg = "can only convert an array of size 1"
with pytest.raises(ValueError, match=msg):
ser.item()

dti = pd.date_range("2016-01-01", periods=2)
with pytest.raises(ValueError, match=msg):
dti.item()
with pytest.raises(ValueError, match=msg):
Series(dti).item()

val = dti[:1].item()
assert isinstance(val, Timestamp)
val = Series(dti)[:1].item()
assert isinstance(val, Timestamp)

tdi = dti - dti
with pytest.raises(ValueError, match=msg):
tdi.item()
with pytest.raises(ValueError, match=msg):
Series(tdi).item()

val = tdi[:1].item()
assert isinstance(val, Timedelta)
val = Series(tdi)[:1].item()
assert isinstance(val, Timedelta)

# Case where ser[0] would not work
ser = Series(dti, index=[5, 6])
val = ser[:1].item()
assert val == dti[0]
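The final case in `test_item` uses a non-default integer index, where `ser[0]` is a label lookup and fails because there is no label 0, while taking the first row by position still works. The sketch below uses `.iloc` for the positional slice to make the intent explicit (assumes pandas 1.0 or later):

```python
import pandas as pd

dti = pd.date_range("2016-01-01", periods=2)
ser = pd.Series(dti, index=[5, 6])

try:
    ser[0]  # integer index: 0 is treated as a label, and there is no label 0
except KeyError:
    print("ser[0] raised KeyError")

val = ser.iloc[:1].item()  # first row by position, boxed as a Timestamp
print(val == dti[0], type(val).__name__)  # True Timestamp
```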

def test_ndarray_compat(self):

# test numpy compat with Series as sub-class of NDFrame
@@ -414,13 +459,6 @@ def f(x):
expected = tsdf.max()
tm.assert_series_equal(result, expected)

# .item()
with tm.assert_produces_warning(FutureWarning):
s = Series([1])
result = s.item()
assert result == 1
assert s.item() == s.iloc[0]

# using an ndarray like function
s = Series(np.random.randn(10))
result = Series(np.ones_like(s))