-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: DatetimeTZDtype __from_arrow__ interprets UTC values as wall time #56922
Conversation
@@ -232,7 +233,7 @@ def test_from_arrowtest_from_arrow_with_different_units_and_timezones_with_( | |||
dtype = DatetimeTZDtype(unit=pd_unit, tz=pd_tz) | |||
|
|||
result = dtype.__from_arrow__(arr) | |||
expected = DatetimeArray._from_sequence( | |||
expected = DatetimeArray._simple_new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was changed from the regular DatetimeArray
constructor to DatetimeArray._from_sequence
when the deprecation was done.
If this patch is correct, than that would've been wrong and _simple_new
should also be the correct replacement here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would use a different constructor, I think (maybe just pd.array(..., dtype=dtype)
?), because otherwise we are testing that __from_arrow__
which uses _simple_new
gives the same result as _simple_new
... (which was now happening with _from_sequence
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Could do something like DatetimeArray._from_sequence(int64_data, dtype=f"M8[{unit}]").tz_localize("UTC").astype(self, copy=False)
Once pyarrow is required we can also make DTA._from_sequence Just Work with the pyarrow object as input
dtype=dtype, | ||
expected = ( | ||
DatetimeArray._from_sequence(data, dtype=f"datetime64[{pa_unit}]") | ||
.tz_localize("UTC") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- we can actually do better than my previous suggestion:
_from_sequence(int64_data, dtype="M8[unit, UTC]").astype(...
) avoids a copy in tz_localize. - better to do this construction in the non-test place and use simple_new in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With "in the non-test place", you mean in __from_arrow__
? But there is tz_localize in there to avoid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- we can actually do better than my previous suggestion:
_from_sequence(int64_data, dtype="M8[unit, UTC]").astype(...
) avoids a copy in tz_localize.
Done
- better to do this construction in the non-test place and use simple_new in the test.
Not sure what you mean here either. Is it OK for me to do this in a followup?
(I'd really like to release tomorrow.)
Agreed that the comment can be further tackled in a follow-up, so let's get this in. |
…terprets UTC values as wall time
Thanks @lithomas1! |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.