REGR: DatetimeTZDtype __from_arrow__ interprets UTC values as wall time #56922

Merged: 4 commits into pandas-dev:main, Jan 19, 2024

lithomas1
Member

@lithomas1 added the Regression (functionality that used to work in a prior pandas version) and Arrow (pyarrow functionality) labels Jan 17, 2024
@lithomas1 added this to the 2.2 milestone Jan 17, 2024
@@ -232,7 +233,7 @@ def test_from_arrow_with_different_units_and_timezones_with_(
     dtype = DatetimeTZDtype(unit=pd_unit, tz=pd_tz)

     result = dtype.__from_arrow__(arr)
-    expected = DatetimeArray._from_sequence(
+    expected = DatetimeArray._simple_new(
@lithomas1 (Member Author), Jan 17, 2024

I think this was changed from the regular DatetimeArray constructor to DatetimeArray._from_sequence when the deprecation was done.

If this patch is correct, then that would've been wrong, and _simple_new should also be the correct replacement here.
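
For readers following the thread, the distinction in the PR title can be sketched with public pandas API only. This is an illustration under assumptions, not the code from the patch: the sample value and the timezone are made up, and `tz_localize`/`tz_convert` stand in for the two interpretations (values read as UTC instants versus values read as local wall time).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one instant stored as a naive datetime64 value.
# Arrow timestamp arrays that carry a timezone store their values in UTC.
stored = np.array(["2024-01-17T12:00:00"], dtype="datetime64[ns]")
tz = "America/New_York"

# Intended reading: the stored value is a UTC instant, displayed in the target zone.
as_utc = pd.DatetimeIndex(stored).tz_localize("UTC").tz_convert(tz)

# Regression-style reading: the stored value is taken as local wall time in tz.
as_wall = pd.DatetimeIndex(stored).tz_localize(tz)

print(as_utc[0])   # 2024-01-17 07:00:00-05:00
print(as_wall[0])  # 2024-01-17 12:00:00-05:00
```

The two results differ by the UTC offset of the target zone (five hours here), which is the kind of shift the regression label refers to.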

Member

Ideally we would use a different constructor, I think (maybe just pd.array(..., dtype=dtype)?), because otherwise we are testing that __from_arrow__, which uses _simple_new, gives the same result as _simple_new ... (which is what was happening until now with _from_sequence)

Member

Good point. Could do something like DatetimeArray._from_sequence(int64_data, dtype=f"M8[{unit}]").tz_localize("UTC").astype(self, copy=False)

Once pyarrow is required we can also make DTA._from_sequence Just Work with the pyarrow object as input
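
A rough public-API analogue of the construction suggested above, for illustration only. The helper name `utc_epochs_to_tz`, the sample integers, and the timezone are assumptions, and `pd.DatetimeIndex` is used in place of the private `DatetimeArray._from_sequence`. The second expression goes through `pd.to_datetime(..., utc=True)` so the expected value is built by a different path than the one under test.

```python
import numpy as np
import pandas as pd


def utc_epochs_to_tz(int64_data, unit, tz):
    """Hypothetical helper: read int64 epoch values as UTC, then display them in tz."""
    # Reinterpret the integers as naive datetime64 values of the given unit.
    naive = pd.DatetimeIndex(int64_data.view(f"M8[{unit}]"))
    # Declare them to be UTC instants, then convert to the target zone.
    return naive.tz_localize("UTC").tz_convert(tz)


int64_data = np.array([0, 3600], dtype="int64")  # seconds since the epoch
expected = utc_epochs_to_tz(int64_data, unit="s", tz="Europe/Brussels")

# Independent cross-check through a different public entry point.
check = pd.to_datetime(int64_data, unit="s", utc=True).tz_convert("Europe/Brussels")
```

Both paths treat the integers as UTC instants, so the elements of `expected` and `check` should compare equal.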

dtype=dtype,
expected = (
DatetimeArray._from_sequence(data, dtype=f"datetime64[{pa_unit}]")
.tz_localize("UTC")
Member

  1. we can actually do better than my previous suggestion: _from_sequence(int64_data, dtype="M8[unit, UTC]").astype(...) avoids a copy in tz_localize.
  2. better to do this construction in the non-test place and use simple_new in the test.
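
Point 1 can be illustrated with public API, as a sketch under assumptions: `pd.DatetimeIndex` with a `datetime64[ns, UTC]` dtype stands in for the private `_from_sequence` call, `tz_convert` stands in for the `astype(...)` step, and the sample integers and target zone are made up. Integers passed together with a tz-aware dtype are read as UTC epoch values, so no separate localization pass over the data is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: nanoseconds since the epoch.
int64_data = np.array([0, 3_600_000_000_000], dtype="int64")

# Integers combined with a tz-aware dtype are read as UTC epoch values.
utc_index = pd.DatetimeIndex(int64_data, dtype="datetime64[ns, UTC]")

# Changing the zone afterwards leaves the stored integers untouched.
tokyo_index = utc_index.tz_convert("Asia/Tokyo")

print(np.array_equal(tokyo_index.asi8, int64_data))  # True
print(tokyo_index[0])  # 1970-01-01 09:00:00+09:00
```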

Member

By "in the non-test place", do you mean in __from_arrow__? But isn't tz_localize something we want to avoid there?

Member Author

> 1. we can actually do better than my previous suggestion: _from_sequence(int64_data, dtype="M8[unit, UTC]").astype(...) avoids a copy in tz_localize.

Done

> 2. better to do this construction in the non-test place and use simple_new in the test.

Not sure what you mean here either. Is it OK for me to do this in a followup?

(I'd really like to release tomorrow.)

@jorisvandenbossche
Member

Agreed that the comment can be further tackled in a follow-up, so let's get this in.

@jorisvandenbossche jorisvandenbossche merged commit ee32f76 into pandas-dev:main Jan 19, 2024
49 of 50 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 19, 2024
@jorisvandenbossche
Member

Thanks @lithomas1!

@lithomas1 lithomas1 deleted the regr-tz-arrow branch January 19, 2024 17:13
lithomas1 added a commit that referenced this pull request Jan 19, 2024
…w__ interprets UTC values as wall time) (#56962)

Backport PR #56922: REGR: DatetimeTZDtype __from_arrow__ interprets UTC values as wall time

Co-authored-by: Thomas Li <47963215+lithomas1@users.noreply.github.com>
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
Successfully merging this pull request may close these issues.

REGR: DatetimeTZDtype.__from_arrow__ interprets UTC values as local wall time
3 participants