PERF: DataFrame.transpose with dt64tz #40149

Merged
merged 166 commits into from
May 17, 2021
Changes from 52 commits
166 commits
13bd448
REF/POC: back DTBlock/TDBlock directly by DTA/TDA
jbrockmendel Feb 17, 2021
be797e9
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 17, 2021
35cdd16
REF: _values_compat
jbrockmendel Feb 17, 2021
55c2568
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 17, 2021
0cc7e5b
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 17, 2021
213afb3
test, asv
jbrockmendel Feb 18, 2021
52fa07a
TST: port Dim2CompatTests
jbrockmendel Feb 18, 2021
50ba370
Merge branch 'master' into tst-2d
jbrockmendel Feb 18, 2021
94f9027
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 18, 2021
fda6048
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 19, 2021
44e371f
REF: consolidate paths for astype
jbrockmendel Feb 19, 2021
45360e0
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 19, 2021
46229ac
Merge branch 'prelim-hybrid' into ref-hybrid-3
jbrockmendel Feb 19, 2021
2745b37
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 19, 2021
29a1909
dont hardcode dt64tz
jbrockmendel Feb 19, 2021
1be48ee
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 19, 2021
c0aa860
just one contains_datetime check
jbrockmendel Feb 20, 2021
3fc07a6
Simplify _maybe_coerce_values
jbrockmendel Feb 20, 2021
093cf8b
remove _dtype
jbrockmendel Feb 20, 2021
1460ff3
remove _holder, fill_value
jbrockmendel Feb 20, 2021
33e5d86
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 21, 2021
4cd0548
Merge branch 'master' into tst-2d
jbrockmendel Feb 21, 2021
a826b4a
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 21, 2021
2d6fffc
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 21, 2021
00185cc
Merge branch 'tst-2d' into enh-fillna-2d
jbrockmendel Feb 21, 2021
b273600
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 22, 2021
378f168
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 22, 2021
1938bf6
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 22, 2021
f92f81f
PERF: .array->._values
jbrockmendel Feb 22, 2021
a36b36b
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 22, 2021
c0208bc
PERF: get_values
jbrockmendel Feb 23, 2021
846e1dc
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 23, 2021
5081d27
perf
jbrockmendel Feb 23, 2021
5fc7efa
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 23, 2021
44e7343
Merge branch 'master' into enh-fillna-2d
jbrockmendel Feb 23, 2021
069a08a
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 23, 2021
7b5a67a
CLN: comments
jbrockmendel Feb 23, 2021
be4a243
remove unnecessary extract_array
jbrockmendel Feb 23, 2021
882373b
dont override EA base class
jbrockmendel Feb 24, 2021
130ff23
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 24, 2021
5cca7fb
revert perf workarounds
jbrockmendel Feb 24, 2021
23411a5
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 24, 2021
1efc59b
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 24, 2021
3215b35
Merge branch 'master' into enh-fillna-2d
jbrockmendel Feb 25, 2021
819e8e9
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 25, 2021
3a6e463
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 26, 2021
d3a39bc
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 26, 2021
b7abcd4
NIE instead of assert false
jbrockmendel Feb 26, 2021
9c339fd
Merge branch 'master' into enh-fillna-2d
jbrockmendel Feb 26, 2021
f909e48
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 26, 2021
8863c87
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 27, 2021
6579283
Merge branch 'master' into ref-hybrid-3
jbrockmendel Feb 28, 2021
dabb8e2
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 1, 2021
10a78b9
fix simple_new usage
jbrockmendel Mar 1, 2021
5f7d6db
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 1, 2021
5fac38e
mypy fixup
jbrockmendel Mar 1, 2021
7bb85b6
Fix test on older numpys
jbrockmendel Mar 2, 2021
399d722
fastparquet compat
jbrockmendel Mar 2, 2021
dff0fec
troubleshoot
jbrockmendel Mar 2, 2021
b9d8231
array manager tests
jbrockmendel Mar 2, 2021
3469bb5
array manager tests
jbrockmendel Mar 2, 2021
b25ccd6
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 2, 2021
ad966d0
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 2, 2021
26ad122
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 2, 2021
9602a74
array-manager tests
jbrockmendel Mar 2, 2021
33d8a24
array-manager test
jbrockmendel Mar 2, 2021
38143fc
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 3, 2021
2a0f7de
troubleshoot docbuild
jbrockmendel Mar 3, 2021
f7a03f9
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 4, 2021
ad917de
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 4, 2021
6c0674f
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 4, 2021
063df5d
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 5, 2021
1019aad
cleanups
jbrockmendel Mar 5, 2021
70d6afe
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 5, 2021
9faf9ab
update for quantile
jbrockmendel Mar 5, 2021
c20f138
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 5, 2021
2c864f0
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 5, 2021
3e5ca3b
fix array-manager quantile tests
jbrockmendel Mar 5, 2021
7916fff
better names for dtype checks, un-skip array-manager tests
jbrockmendel Mar 5, 2021
d8b49cd
CLN: use ensure_block_shape
jbrockmendel Mar 6, 2021
9eaf602
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 6, 2021
5f3382c
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 7, 2021
5f6099b
Merge branch 'master' into enh-fillna-2d
jbrockmendel Mar 8, 2021
84613c7
ENH: NDArrayBackedExtensionArray.fillna(method) with 2d
jbrockmendel Mar 8, 2021
78ca05c
Merge branch 'enh-fillna-2d' into ref-hybrid-3
jbrockmendel Mar 8, 2021
e8a54af
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 8, 2021
68fa34a
dont consolidate DTZ blocks
jbrockmendel Mar 8, 2021
3c824dc
revert pytables edits not needed without consolidation
jbrockmendel Mar 8, 2021
58a3f2c
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 9, 2021
128dd3a
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 9, 2021
d3ae448
mypy fixup
jbrockmendel Mar 10, 2021
665017f
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 10, 2021
11e6182
Fix json kludge
jbrockmendel Mar 10, 2021
7ace6ec
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 11, 2021
d89fd76
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 12, 2021
e2c014c
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 12, 2021
b22da32
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 12, 2021
1216f50
troubleshoot array-manager test
jbrockmendel Mar 12, 2021
5a02f3e
troubleshoot mypy
jbrockmendel Mar 13, 2021
f8f962f
missing import
jbrockmendel Mar 13, 2021
60a2def
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Mar 13, 2021
4b694e5
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 14, 2021
dab6fd2
fastparquet compat
jbrockmendel Mar 14, 2021
fbac441
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 14, 2021
77202c4
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 15, 2021
42ca357
maybe_coerce_values where appropriate
jbrockmendel Mar 15, 2021
2ab83ae
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 15, 2021
85b8b3f
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 16, 2021
6ac1fea
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 18, 2021
9e25bd5
update exception message
jbrockmendel Mar 18, 2021
3680633
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 19, 2021
e89b0f5
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 20, 2021
b775a29
comment
jbrockmendel Mar 22, 2021
a771fef
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 22, 2021
c7cb37b
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 24, 2021
fec04d0
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 26, 2021
ff835af
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 26, 2021
09f012a
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Mar 27, 2021
cd029f5
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 30, 2021
cbf0370
remove commented-out
jbrockmendel Mar 30, 2021
319c22d
mypy fixup
jbrockmendel Mar 30, 2021
b45da5e
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 31, 2021
b4aa88c
Merge branch 'master' into ref-hybrid-3
jbrockmendel Mar 31, 2021
1c49317
de-privatize
jbrockmendel Mar 31, 2021
b18d39b
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 2, 2021
aa30359
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 2, 2021
e808bf5
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 2, 2021
db75d59
simplify fastparquet shim
jbrockmendel Apr 2, 2021
780fa1c
one more kludge revert
jbrockmendel Apr 2, 2021
da9f50c
one more kludge revert
jbrockmendel Apr 2, 2021
ab1c336
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 2, 2021
e6e2e7b
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 2, 2021
acc707d
trim diff
jbrockmendel Apr 2, 2021
975acf3
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 3, 2021
a0c4d0a
trim diff
jbrockmendel Apr 3, 2021
c22f4c2
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 3, 2021
11a0591
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 4, 2021
2f6236b
trim diff
jbrockmendel Apr 4, 2021
94720e0
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 5, 2021
28f241b
REF: implement EABackedBlock
jbrockmendel Apr 5, 2021
8457818
TST: test_delitem_series
jbrockmendel Apr 5, 2021
a406c11
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 5, 2021
45fe309
dont subclass ExtensionBlock
jbrockmendel Apr 5, 2021
db102ce
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 7, 2021
207f41a
pre-commit fixup
jbrockmendel Apr 7, 2021
410a1fe
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 7, 2021
70df636
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 7, 2021
2637cf4
revert no-longer-needed
jbrockmendel Apr 7, 2021
a494b9d
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 8, 2021
c22ddcb
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 9, 2021
e1f32a0
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 13, 2021
37b87a1
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 13, 2021
4d8bf6a
restore import
jbrockmendel Apr 13, 2021
4ef498e
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 14, 2021
e8d5ebd
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 16, 2021
874023e
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 20, 2021
40ca9d1
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 21, 2021
5334cda
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 22, 2021
2562b8f
revert removal of TDA check
jbrockmendel Apr 22, 2021
5e8d7f5
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 23, 2021
de912e3
Merge branch 'master' into ref-hybrid-3
jbrockmendel Apr 23, 2021
96f0323
remove extra get_values_for_json
jbrockmendel Apr 23, 2021
63af403
Merge branch 'master' into ref-hybrid-3
jbrockmendel May 4, 2021
8e54ed3
Merge branch 'master' into ref-hybrid-3
jbrockmendel May 9, 2021
eaf533a
Merge branch 'master' into ref-hybrid-3
jbrockmendel May 17, 2021
6fded81
whatsnew perf note
jbrockmendel May 17, 2021
36 changes: 36 additions & 0 deletions asv_bench/benchmarks/reshape.py
@@ -53,6 +53,42 @@ def time_unstack(self):
         self.df.unstack(1)


+class ReshapeExtensionDtype:
+
+    params = ["datetime64[ns, US/Pacific]", "Period[s]"]
+    param_names = ["dtype"]
+
+    def setup(self, dtype):
+        lev = pd.Index(list("ABCDEFGHIJ"))
+        ri = pd.Index(range(1000))
+        mi = MultiIndex.from_product([lev, ri], names=["foo", "bar"])
+
+        index = date_range("2016-01-01", periods=10000, freq="s", tz="US/Pacific")
+        if dtype == "Period[s]":
+            index = index.tz_localize(None).to_period("s")
+
+        ser = pd.Series(index, index=mi)
+        df = ser.unstack("bar")
+        # roundtrips -> df.stack().equals(ser)
+
+        self.ser = ser
+        self.df = df
+
+    def time_stack(self, dtype):
+        self.df.stack()
+
+    def time_unstack_fast(self, dtype):
+        # last level -> doesnt have to make copies
+        self.ser.unstack("bar")
+
+    def time_unstack_slow(self, dtype):
+        # first level -> must make copies
+        self.ser.unstack("foo")
+
+    def time_transpose(self, dtype):
+        self.df.T
+
+
 class Unstack:

     params = ["int", "category"]
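The setup() comment notes that the unstacked frame roundtrips back to the Series via stack(). A minimal pure-Python sketch of that shape bookkeeping, assuming a dict keyed by (foo, bar) tuples as a stand-in for the MultiIndexed Series; make_series, unstack_last, stack_frame, and transpose_frame are hypothetical helpers, not pandas API:

```python
from itertools import product
from typing import Dict, Hashable, Tuple

SeriesLike = Dict[Tuple[str, int], int]            # (foo, bar) -> value
FrameLike = Dict[Hashable, Dict[Hashable, int]]    # row label -> {column label -> value}


def make_series(n_inner: int = 1000) -> SeriesLike:
    # 10 outer labels x n_inner inner labels, like the benchmark's MultiIndex
    keys = product("ABCDEFGHIJ", range(n_inner))
    return {key: i for i, key in enumerate(keys)}


def unstack_last(ser: SeriesLike) -> FrameLike:
    # Move the last level ("bar") into the columns: one row per "foo"
    frame: FrameLike = {}
    for (foo, bar), value in ser.items():
        frame.setdefault(foo, {})[bar] = value
    return frame


def stack_frame(frame: FrameLike) -> SeriesLike:
    # Inverse of unstack_last: flatten back to {(foo, bar): value}
    return {(foo, bar): v for foo, row in frame.items() for bar, v in row.items()}


def transpose_frame(frame: FrameLike) -> FrameLike:
    # Swap rows and columns: {bar: {foo: value}}
    out: FrameLike = {}
    for foo, row in frame.items():
        for bar, v in row.items():
            out.setdefault(bar, {})[foo] = v
    return out


# Usage mirroring the benchmark's shapes
ser = make_series()
df = unstack_last(ser)          # 10 rows x 1000 columns
assert stack_frame(df) == ser   # the roundtrip noted in setup()
df_t = transpose_frame(df)      # 1000 rows x 10 columns
```

The sketch only models labels and shapes; it says nothing about dtypes or copying, which are what the real benchmark measures.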
17 changes: 16 additions & 1 deletion pandas/core/array_algos/take.py
@@ -1,6 +1,10 @@
 from __future__ import annotations

-from typing import Optional
+from typing import (
+    TYPE_CHECKING,
+    Optional,
+    cast,
+)

 import numpy as np

@@ -14,11 +18,15 @@
 from pandas.core.dtypes.common import (
     ensure_int64,
     ensure_platform_int,
+    is_strict_ea,
 )
 from pandas.core.dtypes.missing import na_value_for_dtype

 from pandas.core.construction import ensure_wrapped_if_datetimelike

+if TYPE_CHECKING:
+    from pandas.core.arrays._mixins import NDArrayBackedExtensionArray
+

 def take_nd(
     arr: ArrayLike,
@@ -66,6 +74,13 @@ def take_nd(
     if not isinstance(arr, np.ndarray):
         # i.e. ExtensionArray,
         # includes for EA to catch DatetimeArray, TimedeltaArray
+        if not is_strict_ea(arr):
+            # i.e. DatetimeArray, TimedeltaArray
+            arr = cast("NDArrayBackedExtensionArray", arr)
+            return arr.take(
+                indexer, axis=axis, fill_value=fill_value, allow_fill=allow_fill
+            )
+
         return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)

     arr = np.asarray(arr)
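The new branch in take_nd forwards axis only to arrays that can be 2D. A toy sketch of that dispatch, assuming two hypothetical classes (ToyExtensionArray, a 1D-only array whose take() has no axis keyword, and Toy2DArray, a stand-in for DatetimeArray/TimedeltaArray) and nested lists instead of ndarrays; as in pandas' take, -1 marks a position to fill when allow_fill is set:

```python
from typing import Any, List, Sequence


class ToyExtensionArray:
    """Hypothetical 1D-only array: take() accepts no axis keyword."""

    def __init__(self, values: List[Any]):
        self.values = list(values)

    def take(self, indexer: Sequence[int], fill_value=None, allow_fill=False):
        out = [fill_value if allow_fill and i == -1 else self.values[i] for i in indexer]
        return ToyExtensionArray(out)


class Toy2DArray(ToyExtensionArray):
    """Hypothetical 2D-capable array (a list of rows), standing in for DatetimeArray."""

    def take(self, indexer, axis=0, fill_value=None, allow_fill=False):
        if axis == 0:
            width = len(self.values[0]) if self.values else 0
            out = [
                [fill_value] * width if allow_fill and i == -1 else list(self.values[i])
                for i in indexer
            ]
        else:
            out = [
                [fill_value if allow_fill and i == -1 else row[i] for i in indexer]
                for row in self.values
            ]
        return Toy2DArray(out)


def is_strict_ea(arr) -> bool:
    # Same shape as the PR's predicate: an extension array that is not 2D-capable
    return isinstance(arr, ToyExtensionArray) and not isinstance(arr, Toy2DArray)


def take_nd(arr, indexer, axis=0, fill_value=None, allow_fill=True):
    # Only the extension-array branch is modelled; ndarray inputs are out of scope
    if not is_strict_ea(arr):
        # 2D-capable array: pass the axis through
        return arr.take(indexer, axis=axis, fill_value=fill_value, allow_fill=allow_fill)
    # 1D-only array: axis can only be 0, so it is not passed
    return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)
```

For example, take_nd(Toy2DArray([[1, 2, 3], [4, 5, 6]]), [2, -1], axis=1) selects and fills columns within each row, which a 1D-only take() has no way to express.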
24 changes: 22 additions & 2 deletions pandas/core/arrays/_mixins.py
@@ -20,7 +20,10 @@
     cache_readonly,
     doc,
 )
-from pandas.util._validators import validate_fillna_kwargs
+from pandas.util._validators import (
+    validate_bool_kwarg,
+    validate_fillna_kwargs,
+)

 from pandas.core.dtypes.common import is_dtype_equal
 from pandas.core.dtypes.missing import array_equivalent
@@ -35,6 +38,7 @@
 from pandas.core.arrays.base import ExtensionArray
 from pandas.core.construction import extract_array
 from pandas.core.indexers import check_array_indexer
+from pandas.core.sorting import nargminmax

 NDArrayBackedExtensionArrayT = TypeVar(
     "NDArrayBackedExtensionArrayT", bound="NDArrayBackedExtensionArray"
@@ -182,6 +186,22 @@ def equals(self, other) -> bool:
     def _values_for_argsort(self):
         return self._ndarray

+    # Signature of "argmin" incompatible with supertype "ExtensionArray"
+    def argmin(self, axis: int = 0, skipna: bool = True):  # type:ignore[override]
+        # override base class by adding axis keyword
+        validate_bool_kwarg(skipna, "skipna")
+        if not skipna and self.isna().any():
+            raise NotImplementedError
+        return nargminmax(self, "argmin", axis=axis)

Contributor:
-> nanargmin (is this not exercised in tests?)

Member Author:
This is exercised in tests.

The only difference between this and EA.argmin is that this passes axis.

+
+    # Signature of "argmax" incompatible with supertype "ExtensionArray"
+    def argmax(self, axis: int = 0, skipna: bool = True):  # type:ignore[override]
+        # override base class by adding axis keyword
+        validate_bool_kwarg(skipna, "skipna")
+        if not skipna and self.isna().any():
+            raise NotImplementedError
+        return nargminmax(self, "argmax", axis=axis)
+
     def copy(self: NDArrayBackedExtensionArrayT) -> NDArrayBackedExtensionArrayT:
         new_data = self._ndarray.copy()
         return self._from_backing_data(new_data)
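The new argmin/argmax overrides accept axis and defer to nargminmax, which, as we read it, skips NA positions within each slice; with skipna=False and any NA present they raise NotImplementedError. A pure-Python sketch of that behaviour on a nested list, assuming None marks NA; _slices, nargminmax_2d, and argmin_2d are hypothetical names, and raising ValueError on an all-NA slice is an assumption of this sketch:

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]  # None marks a missing value


def _slices(grid: Grid, axis: int) -> List[List[Optional[float]]]:
    # axis=0 reduces down each column, axis=1 along each row (numpy convention)
    if axis == 0:
        return [list(col) for col in zip(*grid)]
    return [list(row) for row in grid]


def nargminmax_2d(grid: Grid, method: str, axis: int = 0) -> List[int]:
    # min()/max() with key= keep the first position among ties, like numpy
    pick = min if method == "argmin" else max
    result = []
    for values in _slices(grid, axis):
        present = [(i, v) for i, v in enumerate(values) if v is not None]
        if not present:
            raise ValueError("all-NA slice")  # assumption for this sketch
        result.append(pick(present, key=lambda pair: pair[1])[0])
    return result


def argmin_2d(grid: Grid, axis: int = 0, skipna: bool = True) -> List[int]:
    if not isinstance(skipna, bool):
        raise ValueError("skipna must be a bool")  # cf. validate_bool_kwarg
    if not skipna and any(v is None for row in grid for v in row):
        raise NotImplementedError
    return nargminmax_2d(grid, "argmin", axis=axis)
```

For [[3.0, None, 5.0], [1.0, 4.0, None]], argmin_2d with axis=0 returns one position per column ([1, 1, 0]) and with axis=1 one position per row ([0, 0]).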
@@ -278,7 +298,7 @@ def fillna(

         if mask.any():
             if method is not None:
-                func = missing.get_fill_func(method)
+                func = missing.get_fill_func(method, ndim=self.ndim)
                 new_values = func(self._ndarray.copy(), limit=limit, mask=mask)
                 # TODO: PandasArray didn't used to copy, need tests for this
                 new_values = self._from_backing_data(new_values)
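The fillna hunk now passes ndim to missing.get_fill_func so that a 2D backing array gets a 2D fill function. A pure-Python sketch of that dispatch, assuming nested lists, a boolean mask of the same shape, and that each row of the 2D array is filled independently; pad_1d, backfill_1d, _row_wise, and choose_fill_func are hypothetical names, not the pandas internals:

```python
from typing import Any, Callable, List, Optional

FillFunc = Callable[..., Any]


def pad_1d(values: List[Any], mask: List[bool], limit: Optional[int] = None) -> List[Any]:
    # Forward-fill masked positions in place, at most `limit` consecutive ones
    last, have_last, run = None, False, 0
    for j, missing in enumerate(mask):
        if not missing:
            last, have_last, run = values[j], True, 0
        elif have_last and (limit is None or run < limit):
            values[j] = last
            run += 1
    return values


def backfill_1d(values: List[Any], mask: List[bool], limit: Optional[int] = None) -> List[Any]:
    # Backward fill is a forward fill over the reversed data
    values[:] = pad_1d(values[::-1], mask[::-1], limit)[::-1]
    return values


def _row_wise(func_1d: FillFunc) -> FillFunc:
    # Lift a 1D fill to 2D by filling each row independently
    def func_2d(values, mask, limit=None):
        for row, row_mask in zip(values, mask):
            func_1d(row, row_mask, limit)
        return values

    return func_2d


def choose_fill_func(method: str, ndim: int = 1) -> FillFunc:
    func = {"pad": pad_1d, "backfill": backfill_1d}[method]
    return func if ndim == 1 else _row_wise(func)
```

Like the hunk, which fills self._ndarray.copy(), callers should copy first: the sketch's fill functions mutate their input.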
25 changes: 25 additions & 0 deletions pandas/core/dtypes/common.py
@@ -1495,6 +1495,31 @@ def is_extension_type(arr) -> bool:
     return False


+def is_strict_ea(obj):
+    """
+    ExtensionArray that does not support 2D, or more specifically that does
+    not use HybridBlock.
+    """
+    from pandas.core.arrays import (
+        DatetimeArray,
+        ExtensionArray,
+        TimedeltaArray,
+    )
+
+    return isinstance(obj, ExtensionArray) and not isinstance(
+        obj, (DatetimeArray, TimedeltaArray)
+    )
+
+
+def is_ea_dtype(dtype) -> bool:

Member:
Can you give this a more descriptive name?

+    """
+    Analogue to is_extension_array_dtype but excluding DatetimeTZDtype.
+    """
+    # Note: if other EA dtypes are ever held in HybridBlock, exclude those
+    # here too.
+    return is_extension_array_dtype(dtype) and not is_datetime64tz_dtype(dtype)
+
+
 def is_extension_array_dtype(arr_or_dtype) -> bool:
     """
     Check if an object is a pandas extension array type.
18 changes: 5 additions & 13 deletions pandas/core/dtypes/concat.py
@@ -108,11 +108,15 @@ def is_nonempty(x) -> bool:
         to_concat = non_empties

     kinds = {obj.dtype.kind for obj in to_concat}
+    _contains_datetime = any(kind in ["m", "M"] for kind in kinds)

Contributor:
Not sure there is much point in making this private.

Member Author:
Sure.


     all_empty = not len(non_empties)
     single_dtype = len({x.dtype for x in to_concat}) == 1
     any_ea = any(is_extension_array_dtype(x.dtype) for x in to_concat)

+    if _contains_datetime:
+        return _concat_datetime(to_concat, axis=axis)
+
     if any_ea:
         # we ignore axis here, as internally concatting with EAs is always
         # for axis=0
@@ -124,10 +128,7 @@ def is_nonempty(x) -> bool:
             cls = type(to_concat[0])
             return cls._concat_same_type(to_concat)
         else:
-            return np.concatenate(to_concat)
-
-    elif any(kind in ["m", "M"] for kind in kinds):
-        return _concat_datetime(to_concat, axis=axis)
+            return np.concatenate(to_concat, axis=axis)

     elif all_empty:
         # we have all empties, but may need to coerce the result dtype to
@@ -344,14 +345,5 @@ def _concat_datetime(to_concat, axis=0):
         # in Timestamp/Timedelta
         return _concatenate_2d([x.astype(object) for x in to_concat], axis=axis)

-    if axis == 1:
-        # TODO(EA2D): kludge not necessary with 2D EAs
-        to_concat = [x.reshape(1, -1) if x.ndim == 1 else x for x in to_concat]
-
     result = type(to_concat[0])._concat_same_type(to_concat, axis=axis)
-
-    if result.ndim == 2 and is_extension_array_dtype(result.dtype):
-        # TODO(EA2D): kludge not necessary with 2D EAs
-        assert result.shape[0] == 1
-        result = result[0]
     return result
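Because datetime-like arrays can now be 2D, the _concat_datetime hunk drops the reshape-then-unwrap kludge: every input already has the same number of dimensions, so a single concatenation along axis covers both cases. A pure-Python sketch of concatenating 2D list-of-rows blocks along either axis, assuming equal row widths for axis=0 and equal row counts for axis=1; concat_2d is a hypothetical helper, not pandas API:

```python
from typing import Any, List, Sequence

Block = List[List[Any]]


def concat_2d(blocks: Sequence[Block], axis: int = 0) -> Block:
    """Concatenate 2D blocks (lists of rows) along axis 0 or axis 1."""
    if axis == 0:
        # Stack rows: every row in every block must have the same width
        widths = {len(row) for block in blocks for row in block}
        if len(widths) > 1:
            raise ValueError("all rows must have the same length for axis=0")
        return [list(row) for block in blocks for row in block]
    if axis == 1:
        # Join side by side: every block must have the same number of rows
        n_rows = {len(block) for block in blocks}
        if len(n_rows) > 1:
            raise ValueError("all blocks must have the same number of rows for axis=1")
        n = n_rows.pop() if n_rows else 0
        return [[x for block in blocks for x in block[i]] for i in range(n)]
    raise ValueError("axis must be 0 or 1")
```

A single-row input such as [[1, 2, 3]] is handled like any other block, with no need for the old reshape(1, -1) step or for taking result[0] afterwards.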
45 changes: 43 additions & 2 deletions pandas/core/frame.py
@@ -110,6 +110,7 @@
     is_datetime64_any_dtype,
     is_dict_like,
     is_dtype_equal,
+    is_ea_dtype,
     is_extension_array_dtype,
     is_float,
     is_float_dtype,
@@ -121,6 +122,7 @@
     is_object_dtype,
     is_scalar,
     is_sequence,
+    is_strict_ea,
     pandas_dtype,
 )
 from pandas.core.dtypes.missing import (
@@ -756,7 +758,28 @@ def _can_fast_transpose(self) -> bool:
         if len(blocks) != 1:
             return False

-        return not self._mgr.any_extension_types
+        dtype = blocks[0].dtype
+        # TODO(EA2D) special case would be unnecessary with 2D EAs
+        return not is_ea_dtype(dtype)
+
+    @property
jreback marked this conversation as resolved.
+    def _values_compat(self) -> ArrayLike:
+        """
+        Analogue to ._values that may return a 2D ExtensionArray.
+        """
+        mgr = self._mgr
+        if isinstance(mgr, ArrayManager):
+            return self._values
+
+        blocks = mgr.blocks
+        if len(blocks) != 1:
+            return self._values
+
+        arr = blocks[0].values
+        if arr.ndim == 1:
+            # non-2D ExtensionArray
+            return self._values
+        return arr.T

     # ----------------------------------------------------------------------
     # Rendering Methods
@@ -3164,16 +3187,32 @@ def transpose(self, *args, copy: bool = False) -> DataFrame:
         # construct the args

         dtypes = list(self.dtypes)
-        if self._is_homogeneous_type and dtypes and is_extension_array_dtype(dtypes[0]):
+
+        if self._can_fast_transpose:
+            # Note: tests pass without this, but this improves perf quite a bit.
+            # error: "ArrayLike" has no attribute "T"
+            new_values = self._values_compat.T  # type:ignore[attr-defined]
+            if copy:
+                new_values = new_values.copy()
+
+            result = self._constructor(
+                new_values, index=self.columns, columns=self.index
+            )
+
+        elif (
+            self._is_homogeneous_type and dtypes and is_extension_array_dtype(dtypes[0])
+        ):
             # We have EAs with the same dtype. We can preserve that dtype in transpose.
             dtype = dtypes[0]
+
             arr_type = dtype.construct_array_type()
             values = self.values

             new_values = [arr_type._from_sequence(row, dtype=dtype) for row in values]
             result = self._constructor(
                 dict(zip(self.index, new_values)), index=self.columns
             )
+            # TODO: what if index is non-unique? (not specific to EA2D)

         else:
             new_values = self.values.T
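The transpose hunk now picks one of three branches, and _can_fast_transpose admits a single dt64tz block because is_ea_dtype excludes DatetimeTZDtype. A sketch of that branch selection, assuming plain strings as stand-ins for dtypes and the set of block dtypes as a rough proxy for _is_homogeneous_type; NUMPY_DTYPES, is_extension_dtype, and transpose_path are hypothetical, and only is_ea_dtype and can_fast_transpose mirror helpers in the diff:

```python
from typing import Sequence

# Hypothetical string stand-ins for numpy dtypes; anything else counts as an extension dtype
NUMPY_DTYPES = {"int64", "float64", "bool", "object"}


def is_extension_dtype(dtype: str) -> bool:
    return dtype not in NUMPY_DTYPES


def is_ea_dtype(dtype: str) -> bool:
    # Mirrors the PR's helper: extension dtype, but not tz-aware datetime
    return is_extension_dtype(dtype) and not dtype.startswith("datetime64[ns, ")


def can_fast_transpose(block_dtypes: Sequence[str]) -> bool:
    # A single block whose values can be held as one 2D array
    return len(block_dtypes) == 1 and not is_ea_dtype(block_dtypes[0])


def transpose_path(block_dtypes: Sequence[str]) -> str:
    """Name the branch of the new DataFrame.transpose a frame would take."""
    if can_fast_transpose(block_dtypes):
        return "fast"      # transpose the single 2D block's values directly
    if len(set(block_dtypes)) == 1 and is_extension_dtype(block_dtypes[0]):
        return "per-row"   # rebuild one extension array per original row
    return "values"        # fall back to self.values.T
```

Under these assumptions a frame backed by one "datetime64[ns, US/Pacific]" block takes the fast path, while a single "Period[s]" block still goes through the per-row reconstruction.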
@@ -9185,6 +9224,8 @@ def func(values: np.ndarray):

         def blk_func(values, axis=1):
             if isinstance(values, ExtensionArray):
+                if not is_strict_ea(values):
+                    return values._reduce(name, axis=1, skipna=skipna, **kwds)
                 return values._reduce(name, skipna=skipna, **kwds)
             else:
                 return op(values, axis=axis, skipna=skipna, **kwds)
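In the reduction hunk, 2D-capable extension arrays now receive axis=1, so a block holding several columns reduces to one value per column in a single call. A pure-Python sketch, assuming a block stored as one inner list per frame column (the block layout the axis=1 call implies), None as NA, and only min/max; reduce_2d and REDUCERS are hypothetical names:

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional

REDUCERS: Dict[str, Callable[[List[Any]], Any]] = {"min": min, "max": max}


def reduce_2d(
    block: List[List[Any]], name: str, axis: int = 1, skipna: bool = True
) -> List[Optional[Any]]:
    """Reduce a 2D block; axis=1 gives one result per inner list (per frame column)."""
    op = REDUCERS[name]
    slices = block if axis == 1 else [list(col) for col in zip(*block)]
    out: List[Optional[Any]] = []
    for values in slices:
        present = [v for v in values if v is not None]
        if not present or (not skipna and len(present) < len(values)):
            out.append(None)  # NA result
        else:
            out.append(op(present))
    return out


# Two tz-aware "columns" of three rows each
utc = timezone.utc
block = [
    [datetime(2021, 1, 3, tzinfo=utc), None, datetime(2021, 1, 1, tzinfo=utc)],
    [datetime(2021, 2, 1, tzinfo=utc), datetime(2021, 2, 2, tzinfo=utc), None],
]
column_mins = reduce_2d(block, "min")  # one value per column
```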
8 changes: 3 additions & 5 deletions pandas/core/groupby/generic.py
@@ -64,6 +64,7 @@
     is_interval_dtype,
     is_numeric_dtype,
     is_scalar,
+    is_strict_ea,
     needs_i8_conversion,
 )
 from pandas.core.dtypes.missing import (
@@ -81,10 +82,7 @@
     validate_func_kwargs,
 )
 from pandas.core.apply import GroupByApply
-from pandas.core.arrays import (
-    Categorical,
-    ExtensionArray,
-)
+from pandas.core.arrays import Categorical
 from pandas.core.base import (
     DataError,
     SpecificationError,
@@ -1128,7 +1126,7 @@ def py_fallback(values: ArrayLike) -> ArrayLike:
             obj: FrameOrSeriesUnion

             # call our grouper again with only this block
-            if isinstance(values, ExtensionArray) or values.ndim == 1:
+            if is_strict_ea(values) or values.ndim == 1:
                 # TODO(EA2D): special case not needed with 2D EAs
                 obj = Series(values)
             else:
5 changes: 4 additions & 1 deletion pandas/core/groupby/ops.py
@@ -71,6 +71,7 @@
 )

 import pandas.core.algorithms as algorithms
+from pandas.core.arrays import ExtensionArray
 from pandas.core.base import SelectionMixin
 import pandas.core.common as com
 from pandas.core.frame import DataFrame
@@ -209,7 +210,9 @@ def apply(self, f: F, data: FrameOrSeries, axis: int = 0):
         result_values = None

         sdata: FrameOrSeries = splitter._get_sorted_data()
-        if sdata.ndim == 2 and np.any(sdata.dtypes.apply(is_extension_array_dtype)):
+        if sdata.ndim == 2 and any(
+            isinstance(x, ExtensionArray) for x in sdata._iter_column_arrays()
+        ):
             # calling splitter.fast_apply will raise TypeError via apply_frame_axis0
             # if we pass EA instead of ndarray
             # TODO: can we have a workaround for EAs backed by ndarray?