Skip to content

Commit

Permalink
Backport PR #53382 on branch 2.0.x (BUG: convert_dtypes(dtype_backend…
Browse files Browse the repository at this point in the history
…="pyarrow") losing tz for tz-aware dtypes) (#53388)

Backport PR #53382: BUG: convert_dtypes(dtype_backend="pyarrow") losing tz for tz-aware dtypes

Co-authored-by: Luke Manley <lukemanley@gmail.com>
  • Loading branch information
meeseeksmachine and lukemanley authored May 25, 2023
1 parent 258e55e commit 8fc7924
Show file tree
Hide file tree
Showing 4 changed files with 19 additions and 2 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.2.rst
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ Bug fixes
- Bug in :func:`read_csv` raising ``OverflowError`` for ``engine="pyarrow"`` and ``parse_dates`` set (:issue:`53295`)
- Bug in :func:`to_datetime` was inferring format to contain ``"%H"`` instead of ``"%I"`` if date contained "AM" / "PM" tokens (:issue:`53147`)
- Bug in :meth:`DataFrame.convert_dtypes` ignores ``convert_*`` keywords when set to False ``dtype_backend="pyarrow"`` (:issue:`52872`)
- Bug in :meth:`DataFrame.convert_dtypes` losing timezone for tz-aware dtypes and ``dtype_backend="pyarrow"`` (:issue:`53382`)
- Bug in :meth:`DataFrame.sort_values` raising for PyArrow ``dictionary`` dtype (:issue:`53232`)
- Bug in :meth:`Series.describe` treating pyarrow-backed timestamps and timedeltas as categorical data (:issue:`53001`)
- Bug in :meth:`Series.rename` not making a lazy copy when Copy-on-Write is enabled when a scalar is passed to it (:issue:`52450`)
Expand Down
3 changes: 3 additions & 0 deletions pandas/core/arrays/arrow/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@
is_object_dtype,
is_scalar,
)
from pandas.core.dtypes.dtypes import DatetimeTZDtype
from pandas.core.dtypes.missing import isna

from pandas.core import roperator
Expand Down Expand Up @@ -168,6 +169,8 @@ def to_pyarrow_type(
return dtype.pyarrow_dtype
elif isinstance(dtype, pa.DataType):
return dtype
elif isinstance(dtype, DatetimeTZDtype):
return pa.timestamp(dtype.unit, dtype.tz)
elif dtype:
try:
# Accepts python types too
Expand Down
4 changes: 3 additions & 1 deletion pandas/core/dtypes/cast.py
Original file line number Diff line number Diff line change
Expand Up @@ -1134,7 +1134,9 @@ def convert_dtypes(
and not isinstance(inferred_dtype, StringDtype)
)
):
if isinstance(inferred_dtype, PandasExtensionDtype):
if isinstance(inferred_dtype, PandasExtensionDtype) and not isinstance(
inferred_dtype, DatetimeTZDtype
):
base_dtype = inferred_dtype.base
elif isinstance(inferred_dtype, (BaseMaskedDtype, ArrowDtype)):
base_dtype = inferred_dtype.numpy_dtype
Expand Down
13 changes: 12 additions & 1 deletion pandas/tests/frame/methods/test_convert_dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,8 @@ def test_pyarrow_dtype_backend(self):
"c": pd.Series([True, False, None], dtype=np.dtype("O")),
"d": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
"e": pd.Series(pd.date_range("2022", periods=3)),
"f": pd.Series(pd.timedelta_range("1D", periods=3)),
"f": pd.Series(pd.date_range("2022", periods=3, tz="UTC").as_unit("s")),
"g": pd.Series(pd.timedelta_range("1D", periods=3)),
}
)
result = df.convert_dtypes(dtype_backend="pyarrow")
Expand All @@ -76,6 +77,16 @@ def test_pyarrow_dtype_backend(self):
)
),
"f": pd.arrays.ArrowExtensionArray(
pa.array(
[
datetime.datetime(2022, 1, 1),
datetime.datetime(2022, 1, 2),
datetime.datetime(2022, 1, 3),
],
type=pa.timestamp(unit="s", tz="UTC"),
)
),
"g": pd.arrays.ArrowExtensionArray(
pa.array(
[
datetime.timedelta(1),
Expand Down

0 comments on commit 8fc7924

Please sign in to comment.