DEPR: Deprecate tshift and integrate it to shift #34545

Merged
merged 8 commits into from
Jun 15, 2020
20 changes: 8 additions & 12 deletions doc/source/user_guide/timeseries.rst
@@ -516,7 +516,7 @@ The ``DatetimeIndex`` class contains many time series related optimizations:
* A large range of dates for various offsets are pre-computed and cached
under the hood in order to make generating subsequent date ranges very fast
(just have to grab a slice).
* Fast shifting using the ``shift`` and ``tshift`` method on pandas objects.
* Fast shifting using the ``shift`` method on pandas objects.
* Unioning of overlapping ``DatetimeIndex`` objects with the same frequency is
very fast (important for fast data alignment).
* Quick access to date fields via properties such as ``year``, ``month``, etc.
@@ -1462,23 +1462,19 @@ the pandas objects.

The ``shift`` method accepts a ``freq`` argument, which can be a
``DateOffset`` class, another ``timedelta``-like object, or an
:ref:`offset alias <timeseries.offset_aliases>`:
:ref:`offset alias <timeseries.offset_aliases>`.

When ``freq`` is specified, the ``shift`` method changes all the dates in the
index rather than changing the alignment of the data and the index:

.. ipython:: python

ts.shift(5, freq='D')
ts.shift(5, freq=pd.offsets.BDay())
ts.shift(5, freq='BM')

Rather than changing the alignment of the data and the index, ``DataFrame`` and
``Series`` objects also have a :meth:`~Series.tshift` convenience method that
changes all the dates in the index by a specified number of offsets:

.. ipython:: python

ts.tshift(5, freq='D')

Note that with ``tshift``, the leading entry is no longer NaN because the data
is not being realigned.
Note that when ``freq`` is specified, the leading entry is no longer NaN
because the data is not being realigned.
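
To make the difference concrete, here is a minimal sketch contrasting the two modes of ``shift``. The series ``ts`` and its values are made up for illustration and are not the ``ts`` object used elsewhere in the user guide.

```python
import pandas as pd

# Illustrative data only: three daily observations.
ts = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# Without freq: the data moves against a fixed index, so a NaN appears.
realigned = ts.shift(1)

# With freq: the index labels move and the data stays attached to them.
reindexed = ts.shift(1, freq="D")

print(realigned)
print(reindexed)
```

In the first result the index is unchanged and the leading value is missing; in the second every value is kept and each date is one day later.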

Frequency conversion
~~~~~~~~~~~~~~~~~~~~
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -737,6 +737,7 @@ Deprecations
- :meth:`DatetimeIndex.week` and `DatetimeIndex.weekofyear` are deprecated and will be removed in a future version, use :meth:`DatetimeIndex.isocalendar().week` instead (:issue:`33595`)
- :meth:`DatetimeArray.week` and `DatetimeArray.weekofyear` are deprecated and will be removed in a future version, use :meth:`DatetimeArray.isocalendar().week` instead (:issue:`33595`)
- :meth:`DateOffset.__call__` is deprecated and will be removed in a future version, use ``offset + other`` instead (:issue:`34171`)
- :meth:`DataFrame.tshift` and :meth:`Series.tshift` are deprecated and will be removed in a future version, use :meth:`DataFrame.shift` and :meth:`Series.shift` instead (:issue:`11631`)
- Indexing an :class:`Index` object with a float key is deprecated, and will
raise an ``IndexError`` in the future. You can manually convert to an integer key
instead (:issue:`34191`).
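
For the ``tshift`` entry in the list above, migration might look like the following sketch. It assumes the old call relied on the default ``freq=None``, which this PR maps to ``freq="infer"``; the series is illustrative.

```python
import pandas as pd

ts = pd.Series(
    range(4),
    index=pd.date_range("2000-01-01", periods=4, freq="D"),
)

# Before (deprecated by this PR):
#     shifted = ts.tshift(1)
# After: ask shift to take the frequency from the index itself.
shifted = ts.shift(1, freq="infer")

# Spelling the frequency out gives the same index here.
explicit = ts.shift(1, freq="D")
```

Calls that already passed an explicit ``freq`` to ``tshift`` can pass the same value to ``shift`` unchanged.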
159 changes: 100 additions & 59 deletions pandas/core/generic.py
@@ -188,7 +188,7 @@ class NDFrame(PandasObject, SelectionMixin, indexing.IndexingMixin):
]
_internal_names_set: Set[str] = set(_internal_names)
_accessors: Set[str] = set()
_deprecations: FrozenSet[str] = frozenset(["get_values"])
_deprecations: FrozenSet[str] = frozenset(["get_values", "tshift"])
_metadata: List[str] = []
_is_copy = None
_mgr: BlockManager
@@ -9130,7 +9130,9 @@ def shift(
When `freq` is not passed, shift the index without realigning the data.
If `freq` is passed (in this case, the index must be date or datetime,
or it will raise a `NotImplementedError`), the index will be
increased using the periods and the `freq`.
increased using the periods and the `freq`. `freq` can be inferred
when specified as "infer" as long as either freq or inferred_freq
attribute is set in the index.

Parameters
----------
@@ -9141,6 +9143,9 @@ def shift(
If `freq` is specified then the index values are shifted but the
data is not realigned. That is, use `freq` if you would like to
extend the index when shifting and preserve the original data.
If `freq` is specified as "infer" then it will be inferred from
the freq or inferred_freq attributes of the index. If neither of
those attributes exists, a ValueError is raised.
axis : {{0 or 'index', 1 or 'columns', None}}, default None
Shift direction.
fill_value : object, optional
@@ -9150,7 +9155,7 @@ def shift(
For datetime, timedelta, or period data, etc. :attr:`NaT` is used.
For extension dtypes, ``self.dtype.na_value`` is used.

.. versionchanged:: 0.24.0
.. versionchanged:: 1.1.0

Returns
-------
@@ -9167,46 +9172,99 @@ def shift(

Examples
--------
>>> df = pd.DataFrame({{'Col1': [10, 20, 15, 30, 45],
... 'Col2': [13, 23, 18, 33, 48],
... 'Col3': [17, 27, 22, 37, 52]}})
>>> df = pd.DataFrame({{"Col1": [10, 20, 15, 30, 45],
... "Col2": [13, 23, 18, 33, 48],
... "Col3": [17, 27, 22, 37, 52]}},
... index=pd.date_range("2020-01-01", "2020-01-05"))
>>> df
Col1 Col2 Col3
2020-01-01 10 13 17
2020-01-02 20 23 27
2020-01-03 15 18 22
2020-01-04 30 33 37
2020-01-05 45 48 52

>>> df.shift(periods=3)
Col1 Col2 Col3
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 10.0 13.0 17.0
4 20.0 23.0 27.0

>>> df.shift(periods=1, axis='columns')
Col1 Col2 Col3
0 NaN 10.0 13.0
1 NaN 20.0 23.0
2 NaN 15.0 18.0
3 NaN 30.0 33.0
4 NaN 45.0 48.0
Col1 Col2 Col3
2020-01-01 NaN NaN NaN
2020-01-02 NaN NaN NaN
2020-01-03 NaN NaN NaN
2020-01-04 10.0 13.0 17.0
2020-01-05 20.0 23.0 27.0

>>> df.shift(periods=1, axis="columns")
Col1 Col2 Col3
2020-01-01 NaN 10.0 13.0
2020-01-02 NaN 20.0 23.0
2020-01-03 NaN 15.0 18.0
2020-01-04 NaN 30.0 33.0
2020-01-05 NaN 45.0 48.0

>>> df.shift(periods=3, fill_value=0)
Col1 Col2 Col3
0 0 0 0
1 0 0 0
2 0 0 0
3 10 13 17
4 20 23 27
Col1 Col2 Col3
2020-01-01 0 0 0
2020-01-02 0 0 0
2020-01-03 0 0 0
2020-01-04 10 13 17
2020-01-05 20 23 27

>>> df.shift(periods=3, freq="D")
Col1 Col2 Col3
2020-01-04 10 13 17
2020-01-05 20 23 27
2020-01-06 15 18 22
2020-01-07 30 33 37
2020-01-08 45 48 52

>>> df.shift(periods=3, freq="infer")
Col1 Col2 Col3
2020-01-04 10 13 17
2020-01-05 20 23 27
2020-01-06 15 18 22
2020-01-07 30 33 37
2020-01-08 45 48 52
"""
if periods == 0:
return self.copy()

block_axis = self._get_block_manager_axis(axis)
if freq is None:
# when freq is None, data is shifted, index is not
block_axis = self._get_block_manager_axis(axis)
new_data = self._mgr.shift(
periods=periods, axis=block_axis, fill_value=fill_value
)
return self._constructor(new_data).__finalize__(self, method="shift")

# when freq is given, index is shifted, data is not
index = self._get_axis(axis)

if freq == "infer":
freq = getattr(index, "freq", None)

if freq is None:
freq = getattr(index, "inferred_freq", None)

if freq is None:
msg = "Freq was not set in the index hence cannot be inferred"
raise ValueError(msg)

elif isinstance(freq, str):
freq = to_offset(freq)

if isinstance(index, PeriodIndex):
orig_freq = to_offset(index.freq)
if freq != orig_freq:
assert orig_freq is not None # for mypy
raise ValueError(
f"Given freq {freq.rule_code} does not match "
f"PeriodIndex freq {orig_freq.rule_code}"
)
new_ax = index.shift(periods)
else:
return self.tshift(periods, freq)
new_ax = index.shift(periods, freq)

return self._constructor(new_data).__finalize__(self, method="shift")
result = self.set_axis(new_ax, axis)
return result.__finalize__(self, method="shift")
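
The frequency-resolution branch above can be read in isolation as the sketch below. ``resolve_freq`` is a hypothetical helper written for illustration and is not part of pandas. Unlike the diff, it also converts an inferred string frequency to an offset, which the ``elif`` in the real code leaves to ``index.shift``.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def resolve_freq(index, freq):
    """Hypothetical helper mirroring how shift resolves freq="infer"."""
    if isinstance(freq, str) and freq == "infer":
        # Prefer a frequency stored on the index ...
        freq = getattr(index, "freq", None)
        if freq is None:
            # ... then fall back to one inferred from the values.
            freq = getattr(index, "inferred_freq", None)
        if freq is None:
            raise ValueError(
                "Freq was not set in the index hence cannot be inferred"
            )
    if isinstance(freq, str):
        # Normalize string aliases such as "D" to offset objects.
        freq = to_offset(freq)
    return freq


with_freq = pd.date_range("2020-01-01", periods=3, freq="D")
without_freq = pd.DatetimeIndex(["2020-01-01", "2020-01-02", "2020-01-03"])

print(resolve_freq(with_freq, "infer"))
print(resolve_freq(without_freq, "infer"))
```

Both calls resolve to a daily offset: the first from the stored ``freq`` attribute, the second from ``inferred_freq`` because the index was built from plain strings.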

def slice_shift(self: FrameOrSeries, periods: int = 1, axis=0) -> FrameOrSeries:
"""
@@ -9251,6 +9309,9 @@ def tshift(
"""
Shift the time index, using the index's frequency if available.

.. deprecated:: 1.1.0
Use `shift` instead.

Parameters
----------
periods : int
@@ -9271,39 +9332,19 @@ def tshift(
attributes of the index. If neither of those attributes exist, a
ValueError is thrown
"""
index = self._get_axis(axis)
if freq is None:
freq = getattr(index, "freq", None)

if freq is None:
freq = getattr(index, "inferred_freq", None)
warnings.warn(
(
"tshift is deprecated and will be removed in a future version. "
"Please use shift instead."
),
FutureWarning,
stacklevel=2,
)

if freq is None:
msg = "Freq was not given and was not set in the index"
raise ValueError(msg)

if periods == 0:
return self

if isinstance(freq, str):
freq = to_offset(freq)

axis = self._get_axis_number(axis)
if isinstance(index, PeriodIndex):
orig_freq = to_offset(index.freq)
if freq != orig_freq:
assert orig_freq is not None # for mypy
raise ValueError(
f"Given freq {freq.rule_code} does not match "
f"PeriodIndex freq {orig_freq.rule_code}"
)
new_ax = index.shift(periods)
else:
new_ax = index.shift(periods, freq)
freq = "infer"

result = self.copy()
result.set_axis(new_ax, axis, inplace=True)
return result.__finalize__(self, method="tshift")
return self.shift(periods, freq, axis)
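
The shape of this change (warn, translate the old default, forward to the new method) is a common deprecation pattern. The sketch below reproduces it with the standard library only; ``Shifter`` is a hypothetical stand-in class, not pandas code.

```python
import warnings


class Shifter:
    """Hypothetical class illustrating the forwarding pattern."""

    def shift(self, periods=1, freq=None):
        # Stand-in for the real implementation: just echo the arguments.
        return ("shift", periods, freq)

    def tshift(self, periods=1, freq=None):
        warnings.warn(
            "tshift is deprecated and will be removed in a future version. "
            "Please use shift instead.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        if freq is None:
            # Preserve the old default: take the frequency from the index.
            freq = "infer"
        return self.shift(periods, freq)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Shifter().tshift(2)

print(result, [w.category.__name__ for w in caught])
```

Keeping the old method as a thin wrapper means there is a single implementation to maintain while existing callers continue to work until the deprecation is enforced.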

def truncate(
self: FrameOrSeries, before=None, after=None, axis=None, copy: bool_t = True
55 changes: 54 additions & 1 deletion pandas/tests/frame/methods/test_shift.py
@@ -145,7 +145,10 @@ def test_shift_duplicate_columns(self):
tm.assert_frame_equal(shifted[0], shifted[1])
tm.assert_frame_equal(shifted[0], shifted[2])

@pytest.mark.filterwarnings("ignore:tshift is deprecated:FutureWarning")
def test_tshift(self, datetime_frame):
# TODO: remove this test when tshift deprecation is enforced

# PeriodIndex
ps = tm.makePeriodFrame()
shifted = ps.tshift(1)
@@ -186,10 +189,60 @@ def test_tshift(self, datetime_frame):
tm.assert_frame_equal(unshifted, inferred_ts)

no_freq = datetime_frame.iloc[[0, 5, 7], :]
msg = "Freq was not given and was not set in the index"
msg = "Freq was not set in the index hence cannot be inferred"
with pytest.raises(ValueError, match=msg):
no_freq.tshift()

def test_tshift_deprecated(self, datetime_frame):
# GH#11631
with tm.assert_produces_warning(FutureWarning):
datetime_frame.tshift()

def test_shift_with_freq(self, datetime_frame):
# PeriodIndex
ps = tm.makePeriodFrame()
shifted = ps.shift(1, freq="infer")
unshifted = shifted.shift(-1, freq="infer")

tm.assert_frame_equal(unshifted, ps)

shifted2 = ps.shift(freq="B")
tm.assert_frame_equal(shifted, shifted2)

shifted3 = ps.shift(freq=offsets.BDay())
tm.assert_frame_equal(shifted, shifted3)

with pytest.raises(ValueError, match="does not match"):
ps.shift(freq="M")

# DatetimeIndex
shifted = datetime_frame.shift(1, freq="infer")
unshifted = shifted.shift(-1, freq="infer")

tm.assert_frame_equal(datetime_frame, unshifted)

shifted2 = datetime_frame.shift(freq=datetime_frame.index.freq)
tm.assert_frame_equal(shifted, shifted2)

inferred_ts = DataFrame(
datetime_frame.values,
Index(np.asarray(datetime_frame.index)),
columns=datetime_frame.columns,
)
shifted = inferred_ts.shift(1, freq="infer")

expected = datetime_frame.shift(1, freq="infer")
expected.index = expected.index._with_freq(None)
tm.assert_frame_equal(shifted, expected)

unshifted = shifted.shift(-1, freq="infer")
tm.assert_frame_equal(unshifted, inferred_ts)

no_freq = datetime_frame.iloc[[0, 5, 7], :]
msg = "Freq was not set in the index hence cannot be inferred"
with pytest.raises(ValueError, match=msg):
no_freq.shift(freq="infer")
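
The failure case exercised at the end of this test can be reproduced outside the test suite with a small irregular index. This is a sketch with made-up dates; it assumes, as in the test, that no frequency can be inferred from unevenly spaced timestamps.

```python
import pandas as pd

# Unevenly spaced dates: no freq attribute and nothing to infer.
irregular = pd.Series(
    [1, 2, 3],
    index=pd.DatetimeIndex(["2020-01-03", "2020-01-08", "2020-01-10"]),
)

try:
    irregular.shift(1, freq="infer")
    raised = False
except ValueError as err:
    raised = True
    print(err)

# An explicit frequency still works on the same data.
explicit = irregular.shift(1, freq="D")
print(explicit)
```

Passing the frequency explicitly sidesteps inference entirely, which is the natural fallback when an index has gaps.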

def test_shift_dt64values_int_fill_deprecated(self):
# GH#31971
ser = pd.Series([pd.Timestamp("2020-01-01"), pd.Timestamp("2020-01-02")])
20 changes: 15 additions & 5 deletions pandas/tests/generic/test_finalize.py
@@ -438,11 +438,21 @@
(pd.DataFrame, frame_data, operator.methodcaller("mask", np.array([[True]]))),
(pd.Series, ([1, 2],), operator.methodcaller("slice_shift")),
(pd.DataFrame, frame_data, operator.methodcaller("slice_shift")),
(pd.Series, (1, pd.date_range("2000", periods=4)), operator.methodcaller("tshift")),
(
pd.DataFrame,
({"A": [1, 1, 1, 1]}, pd.date_range("2000", periods=4)),
operator.methodcaller("tshift"),
pytest.param(
(
pd.Series,
(1, pd.date_range("2000", periods=4)),
operator.methodcaller("tshift"),
),
marks=pytest.mark.filterwarnings("ignore::FutureWarning"),
),
pytest.param(
(
pd.DataFrame,
({"A": [1, 1, 1, 1]}, pd.date_range("2000", periods=4)),
operator.methodcaller("tshift"),
),
marks=pytest.mark.filterwarnings("ignore::FutureWarning"),
),
(pd.Series, ([1, 2],), operator.methodcaller("truncate", before=0)),
(pd.DataFrame, frame_data, operator.methodcaller("truncate", before=0)),
1 change: 1 addition & 0 deletions pandas/tests/groupby/test_groupby.py
@@ -1979,6 +1979,7 @@ def test_bool_aggs_dup_column_labels(bool_agg_func):
@pytest.mark.parametrize(
"idx", [pd.Index(["a", "a"]), pd.MultiIndex.from_tuples((("a", "a"), ("a", "a")))]
)
@pytest.mark.filterwarnings("ignore:tshift is deprecated:FutureWarning")
def test_dup_labels_output_shape(groupby_func, idx):
if groupby_func in {"size", "ngroup", "cumcount"}:
pytest.skip("Not applicable")
1 change: 1 addition & 0 deletions pandas/tests/groupby/test_groupby_subclass.py
@@ -14,6 +14,7 @@
tm.SubclassedSeries(np.arange(0, 10), name="A"),
],
)
@pytest.mark.filterwarnings("ignore:tshift is deprecated:FutureWarning")
def test_groupby_preserves_subclass(obj, groupby_func):
# GH28330 -- preserve subclass through groupby operations

1 change: 1 addition & 0 deletions pandas/tests/groupby/test_whitelist.py
@@ -339,6 +339,7 @@ def test_groupby_function_rename(mframe):
assert f.__name__ == name


@pytest.mark.filterwarnings("ignore:tshift is deprecated:FutureWarning")
def test_groupby_selection_with_methods(df):
# some methods which require DatetimeIndex
rng = date_range("2014", periods=len(df))