Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: groupby.transform retains timezone information #25264

Merged
merged 12 commits into from
Feb 16, 2019
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -180,6 +180,7 @@ Reshaping

- Bug in :func:`pandas.merge` adds a string of ``None`` if ``None`` is assigned in suffixes instead of remain the column name as-is (:issue:`24782`).
- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)
- Bug in :meth:`pandas.core.groupby.GroupBy.transform` where applying a function to a timezone aware column would return a timezone naive result (:issue:`24198`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for 0.24.2

- :func:`to_records` now accepts dtypes to its `column_dtypes` parameter (:issue:`24895`)
-

Expand Down
2 changes: 1 addition & 1 deletion pandas/core/groupby/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -964,7 +964,7 @@ def _transform_fast(self, func, func_nm):

ids, _, ngroup = self.grouper.group_info
cast = self._transform_should_cast(func_nm)
out = algorithms.take_1d(func().values, ids)
out = algorithms.take_1d(func()._values, ids)
if cast:
out = self._try_cast(out, self.obj)
return Series(out, index=self.obj.index, name=self.obj.name)
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/groupby/test_transform.py
Original file line number Diff line number Diff line change
Expand Up @@ -834,3 +834,13 @@ def demean_rename(x):
tm.assert_frame_equal(result, expected)
result_single = df.groupby('group').value.transform(demean_rename)
tm.assert_series_equal(result_single, expected['value'])


def test_groupby_transform_timezone_column():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's too much here but might be worth creating a fixture for transformation functions and parametrizing based off of that here. That way we could ensure that the tz info doesn't get lost across all instead of just max.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. For this test though, we need reducing functions that are applicable for timestamps though. All that make sense to me is min and max unless I am forgetting any other applicable functions.

If the list is short, may be worth just parameterizing over directly instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea agreed with your approach. I think it's worth while to start fixturizing these tests to ensure full coverage on particular subsets of algorithms, just don't know what those subsets should all be just yet. I think the way you've done it makes that refactor more apparent in the future if we ever get there

# GH 24198
ts = pd.to_datetime('now', utc=True).tz_convert('Asia/Singapore')
result = pd.DataFrame({'end_time': [ts], 'id': [1]})
result['max_end_time'] = result.groupby('id').end_time.transform(max)
expected = pd.DataFrame([[ts, 1, ts]], columns=['end_time', 'id',
'max_end_time'])
tm.assert_frame_equal(result, expected)