Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for BUG: grouping with tz-aware: Values falls after last bin #24973

Merged
merged 16 commits into from
Jan 29, 2019
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ Bug Fixes
**Reshaping**

- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)

Bug in ``DataFrame.groupby(pd.Grouper(freq='1d'))`` when there is a time change (DST) and grouping frequecy is 1d (:issue:`24972`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use :meth:`DataFrame.groupby with a separate :class:Grouper and freq='1D'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

**Other**

-
Expand Down
31 changes: 15 additions & 16 deletions pandas/core/resample.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,7 @@
from pandas.core.indexes.timedeltas import TimedeltaIndex, timedelta_range

from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import (
DateOffset, Day, Nano, Tick, delta_to_nanoseconds)
from pandas.tseries.offsets import DateOffset, Day, Nano, Tick

_shared_docs_kwargs = dict()

Expand Down Expand Up @@ -1613,20 +1612,20 @@ def _get_timestamp_range_edges(first, last, offset, closed='left', base=0):
A tuple of length 2, containing the adjusted pd.Timestamp objects.
"""
if isinstance(offset, Tick):
is_day = isinstance(offset, Day)
day_nanos = delta_to_nanoseconds(timedelta(1))

# #1165 and #24127
if (is_day and not offset.nanos % day_nanos) or not is_day:
first, last = _adjust_dates_anchored(first, last, offset,
closed=closed, base=base)
if is_day and first.tz is not None:
# _adjust_dates_anchored assumes 'D' means 24H, but first/last
# might contain a DST transition (23H, 24H, or 25H).
# Ensure first/last snap to midnight.
first = first.normalize()
last = last.normalize()
return first, last
if isinstance(offset, Day):
# _adjust_dates_anchored assumes 'D' means 24H, but first/last
# might contain a DST transition (23H, 24H, or 25H).
# So "pretend" the dates are naive when adjusting the endpoints
tz = first.tz
first = first.tz_localize(None)
last = last.tz_localize(None)

first, last = _adjust_dates_anchored(first, last, offset,
closed=closed, base=base)
if isinstance(offset, Day):
first = first.tz_localize(tz)
last = last.tz_localize(tz)
return first, last

else:
first = first.normalize()
Expand Down
15 changes: 15 additions & 0 deletions pandas/tests/groupby/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -1744,3 +1744,18 @@ def test_groupby_agg_ohlc_non_first():
result = df.groupby(pd.Grouper(freq='D')).agg(['sum', 'ohlc'])

tm.assert_frame_equal(result, expected)


def test_groupby_with_dst_time_change():
# GH 24972
index = pd.DatetimeIndex([1478064900001000000, 1480037118776792000],
tz='UTC').tz_convert('America/Chicago')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no import needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, sorry


df = pd.DataFrame([1, 2], index=index)
result = df.groupby(pd.Grouper(freq='1d')).last()
expected_index_values = pd.date_range('2016-11-02', '2016-11-24',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to go with other dst grouping tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, didn't find them right away, can you point me to few?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably in pandas/tests/groupby/test_timegrouper.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

freq='d', tz='America/Chicago')

index = pd.DatetimeIndex(expected_index_values)
expected = pd.DataFrame([1.0] + ([np.nan] * 21) + [2.0], index=index)
assert_frame_equal(result, expected)