-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: preserve EA dtype in transpose #30091
Changes from all commits
a2217e0
4fb44c5
e18a426
9be80b7
52d3b9c
bfdfccf
10d81bd
132472d
6aae8d6
5448bbb
9d703a2
8b8a464
79550bd
feecee8
dada1a6
1e58c4e
9abd6c2
f6b3c37
4675de6
6d9daa8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2485,7 +2485,7 @@ def memory_usage(self, index=True, deep=False): | |
) | ||
return result | ||
|
||
def transpose(self, *args, **kwargs): | ||
def transpose(self, *args, copy: bool = False): | ||
""" | ||
Transpose index and columns. | ||
|
||
|
@@ -2495,9 +2495,14 @@ def transpose(self, *args, **kwargs): | |
|
||
Parameters | ||
---------- | ||
*args, **kwargs | ||
Additional arguments and keywords have no effect but might be | ||
accepted for compatibility with numpy. | ||
*args : tuple, optional | ||
Accepted for compatibility with NumPy. | ||
copy : bool, default False | ||
Whether to copy the data after transposing, even for DataFrames | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
with a single dtype. | ||
|
||
Note that a copy is always required for mixed dtype DataFrames, | ||
or for DataFrames with any extension types. | ||
|
||
Returns | ||
------- | ||
|
@@ -2578,7 +2583,29 @@ def transpose(self, *args, **kwargs): | |
dtype: object | ||
""" | ||
nv.validate_transpose(args, dict()) | ||
return super().transpose(1, 0, **kwargs) | ||
# construct the args | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't you think this is better located in pandas/core/reshape ? (and called as a helper function here) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe a slight preference for keeping it here just for readability. The reshape part is essentially just a list comprehension. values = self.values
new_values = [arr_type._from_sequence(row, dtype=dtype) for row in values] which I don't think warrants its own function. I don't see anything places in But happy to move it if you want / if you see other places that could use parts of this. |
||
|
||
dtypes = list(self.dtypes) | ||
jbrockmendel marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if self._is_homogeneous_type and dtypes and is_extension_array_dtype(dtypes[0]): | ||
# We have EAs with the same dtype. We can preserve that dtype in transpose. | ||
dtype = dtypes[0] | ||
arr_type = dtype.construct_array_type() | ||
values = self.values | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIRC the single-column case could be done without this casting. think its worth special-casing? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think it can be done without casting in general. We'll still need to reshape the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. im pretty sure i did this in the other PR, something like:
(of course, if we had 2D EAs...) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, we can avoid casting to an ndarray by making My prior expectation is that converting to an ndarray and doing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Number of columns. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right, i was specifically referring to single-column There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm OK with not-doing this optimization here, just want to make sure we're on the same page about what the available optimization is |
||
|
||
new_values = [arr_type._from_sequence(row, dtype=dtype) for row in values] | ||
result = self._constructor( | ||
dict(zip(self.index, new_values)), index=self.columns | ||
) | ||
|
||
else: | ||
new_values = self.values.T | ||
if copy: | ||
new_values = new_values.copy() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if non-homogeneous, then There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we 100% sure about that? Or are there types distinct dtypes that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if non-homogeneous then we have multiple blocks, so multiple ndarrays that are going through |
||
result = self._constructor( | ||
new_values, index=self.columns, columns=self.index | ||
) | ||
|
||
return result.__finalize__(self) | ||
|
||
T = property(transpose) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -643,50 +643,6 @@ def _set_axis(self, axis, labels): | |
self._data.set_axis(axis, labels) | ||
self._clear_item_cache() | ||
|
||
def transpose(self, *args, **kwargs): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does Series have transpose for compat? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, via IndexOpsMixin. |
||
""" | ||
Permute the dimensions of the %(klass)s | ||
|
||
Parameters | ||
---------- | ||
args : %(args_transpose)s | ||
copy : bool, default False | ||
Make a copy of the underlying data. Mixed-dtype data will | ||
always result in a copy | ||
**kwargs | ||
Additional keyword arguments will be passed to the function. | ||
|
||
Returns | ||
------- | ||
y : same as input | ||
|
||
Examples | ||
-------- | ||
>>> p.transpose(2, 0, 1) | ||
>>> p.transpose(2, 0, 1, copy=True) | ||
""" | ||
|
||
# construct the args | ||
axes, kwargs = self._construct_axes_from_arguments( | ||
args, kwargs, require_all=True | ||
) | ||
axes_names = tuple(self._get_axis_name(axes[a]) for a in self._AXIS_ORDERS) | ||
axes_numbers = tuple(self._get_axis_number(axes[a]) for a in self._AXIS_ORDERS) | ||
|
||
# we must have unique axes | ||
if len(axes) != len(set(axes)): | ||
raise ValueError(f"Must specify {self._AXIS_LEN} unique axes") | ||
|
||
new_axes = self._construct_axes_dict_from( | ||
self, [self._get_axis(x) for x in axes_names] | ||
) | ||
new_values = self.values.transpose(axes_numbers) | ||
if kwargs.pop("copy", None) or (len(args) and args[-1]): | ||
new_values = new_values.copy() | ||
|
||
nv.validate_transpose(tuple(), kwargs) | ||
return self._constructor(new_values, **new_axes).__finalize__(self) | ||
|
||
def swapaxes(self, axis1, axis2, copy=True): | ||
""" | ||
Interchange axes and swap values axes appropriately. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -753,18 +753,18 @@ def test_pi_sub_isub_offset(self): | |
rng -= pd.offsets.MonthEnd(5) | ||
tm.assert_index_equal(rng, expected) | ||
|
||
def test_pi_add_offset_n_gt1(self, box_transpose_fail): | ||
@pytest.mark.parametrize("transpose", [True, False]) | ||
def test_pi_add_offset_n_gt1(self, box_with_array, transpose): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not a blocker, but transpose param is sub-optimal. For DataFrame case it will correctly test both cases, but for EA/Index/Series it will mean duplicate tests. I'll try to come up with something nicer in an upcoming arithmetic-test-specific pass |
||
# GH#23215 | ||
# add offset to PeriodIndex with freq.n > 1 | ||
box, transpose = box_transpose_fail | ||
|
||
per = pd.Period("2016-01", freq="2M") | ||
pi = pd.PeriodIndex([per]) | ||
|
||
expected = pd.PeriodIndex(["2016-03"], freq="2M") | ||
|
||
pi = tm.box_expected(pi, box, transpose=transpose) | ||
expected = tm.box_expected(expected, box, transpose=transpose) | ||
pi = tm.box_expected(pi, box_with_array, transpose=transpose) | ||
expected = tm.box_expected(expected, box_with_array, transpose=transpose) | ||
|
||
result = pi + per.freq | ||
tm.assert_equal(result, expected) | ||
|
@@ -982,16 +982,15 @@ def test_pi_add_sub_timedeltalike_freq_mismatch_monthly(self, mismatched_freq): | |
with pytest.raises(IncompatibleFrequency, match=msg): | ||
rng -= other | ||
|
||
def test_parr_add_sub_td64_nat(self, box_transpose_fail): | ||
@pytest.mark.parametrize("transpose", [True, False]) | ||
def test_parr_add_sub_td64_nat(self, box_with_array, transpose): | ||
# GH#23320 special handling for timedelta64("NaT") | ||
box, transpose = box_transpose_fail | ||
|
||
pi = pd.period_range("1994-04-01", periods=9, freq="19D") | ||
other = np.timedelta64("NaT") | ||
expected = pd.PeriodIndex(["NaT"] * 9, freq="19D") | ||
|
||
obj = tm.box_expected(pi, box, transpose=transpose) | ||
expected = tm.box_expected(expected, box, transpose=transpose) | ||
obj = tm.box_expected(pi, box_with_array, transpose=transpose) | ||
expected = tm.box_expected(expected, box_with_array, transpose=transpose) | ||
|
||
result = obj + other | ||
tm.assert_equal(result, expected) | ||
|
@@ -1009,16 +1008,12 @@ def test_parr_add_sub_td64_nat(self, box_transpose_fail): | |
TimedeltaArray._from_sequence(["NaT"] * 9), | ||
], | ||
) | ||
def test_parr_add_sub_tdt64_nat_array(self, box_df_fail, other): | ||
# FIXME: DataFrame fails because when when operating column-wise | ||
# timedelta64 entries become NaT and are treated like datetimes | ||
box = box_df_fail | ||
|
||
def test_parr_add_sub_tdt64_nat_array(self, box_with_array, other): | ||
pi = pd.period_range("1994-04-01", periods=9, freq="19D") | ||
expected = pd.PeriodIndex(["NaT"] * 9, freq="19D") | ||
|
||
obj = tm.box_expected(pi, box) | ||
expected = tm.box_expected(expected, box) | ||
obj = tm.box_expected(pi, box_with_array) | ||
expected = tm.box_expected(expected, box_with_array) | ||
|
||
result = obj + other | ||
tm.assert_equal(result, expected) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we still need kwargs for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, no. Neither
np.transpose
norndarray.transpose
take additional keyword arguments.