Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for BUG: grouping with tz-aware: Values falls after last bin #24973

Merged
merged 16 commits into from
Jan 29, 2019
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ Bug Fixes
**Reshaping**

- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)

Bug in ``DataFrame.groupby(pd.Grouper(freq='1d'))`` when there is a time change (DST) and grouping frequecy is 1d (:issue:`24972`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use :meth:`DataFrame.groupby with a separate :class:Grouper and freq='1D'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

**Other**

-
Expand Down
10 changes: 10 additions & 0 deletions pandas/core/resample.py
Original file line number Diff line number Diff line change
Expand Up @@ -1413,6 +1413,16 @@ def _get_time_bins(self, ax):
ambiguous='infer',
nonexistent='shift_forward')

# GH #24972
# In edge case of tz-aware grouping binner last index can be
# less than the ax.max() variable in data object, this happens
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should go in _adjust_bin_edges

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are a lot of issues with generating the remaining bins when I try to move it there. if this way of generating the bins is the correct one then I believe that it is better to leave it here. in _adjust_bin_edges we lose information about ax tz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or this should possibly be fixed in _get_timestamp_range_edges. This patch here seems like a bandaid.

Copy link
Contributor Author

@ahcub ahcub Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let me apply the patch you suggested in the issue description, it does look better for me.

# because of normalization and DST time change
if len(binner) > 1 and binner[-1] < ax.max():
extra_date_range = pd.date_range(binner[-1], ax.max() + self.freq,
freq=self.freq, tz=binner[-1].tz,
name=ax.name)
binner = labels = binner.append(extra_date_range[1:])

ax_values = ax.asi8
binner, bin_edges = self._adjust_bin_edges(binner, ax_values)

Expand Down
16 changes: 16 additions & 0 deletions pandas/tests/groupby/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -1744,3 +1744,19 @@ def test_groupby_agg_ohlc_non_first():
result = df.groupby(pd.Grouper(freq='D')).agg(['sum', 'ohlc'])

tm.assert_frame_equal(result, expected)


def test_groupby_with_dst_time_change():
# GH 24972
import pandas as pd
index = pd.DatetimeIndex([1478064900001000000, 1480037118776792000],
tz='UTC').tz_convert('America/Chicago')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no import needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, sorry


df = pd.DataFrame([1, 2], index=index)
result = df.groupby(pd.Grouper(freq='1d')).last()
expected_index_values = pd.date_range('2016-11-02', '2016-11-24',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to go with other dst grouping tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, didn't find them right away, can you point me to few?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably in pandas/tests/groupby/test_timegrouper.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

freq='d', tz='America/Chicago')

index = pd.DatetimeIndex(expected_index_values)
expected = pd.DataFrame([1.0] + ([np.nan] * 21) + [2.0], index=index)
assert_frame_equal(result, expected)