Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: retain extension dtypes in transpose #28048

Closed
wants to merge 8 commits into from

Conversation

jbrockmendel
Copy link
Member

I'll have to look through the issues to see what this closes.

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

pandas/core/generic.py Outdated Show resolved Hide resolved
pandas/core/generic.py Outdated Show resolved Hide resolved
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty edge-cases, but can you check the output of

In [15]: df = pd.DataFrame({"A": pd.Categorical([1, 2]), "B": pd.Categorical([2, 1])})

In [16]: df.dtypes
Out[16]:
A    category
B    category
dtype: object

especially check the values? Those are considered the same dtype since the categories are unordered.

pandas/core/generic.py Outdated Show resolved Hide resolved
@jbrockmendel
Copy link
Member Author

but can you check the output of

It looks like df.dtypes[0] == df.dtypes[1] == df.T.dtypes[0] == df.T.dtypes[1]. Is this the desired behavior?

@TomAugspurger
Copy link
Contributor

I believe so.

@TomAugspurger TomAugspurger added this to the 1.0 milestone Aug 30, 2019
@TomAugspurger
Copy link
Contributor

Could you add a test case for unordered categories with differing orders? #28048 (review)

@jbrockmendel
Copy link
Member Author

will do

# TODO: this can be made cleaner if/when (N, 1) EA are allowed
dtype = self.dtypes.iloc[0]
for col in result.columns:
result[col] = result[col].astype(dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pattern is very unusual, we normally don't use repeated setting like this. rather we create an array / dict in a comprehension then construct. for EA this might be ok (as we have a single EA per block), but would like not to use this patter anywhere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also override transpose in frame.py rather than do it here (and maybe just move the current one to Series) as they don't really share code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this is sub-optimal. Ideally this will be made unnecessary by 2D EAs before too long.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does result[col] = result[col].astype(dtype) not work? I don't recall what we do there when there are multiple blocks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont understand the question; result[col] = result[col].astype(dtype) is exactly what this PR does

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I meant result = result.astype(dtype)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. updated

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode ExtensionArray Extending pandas with custom dtypes or arrays. labels Sep 2, 2019
@jreback
Copy link
Contributor

jreback commented Sep 2, 2019

also see if you can find any issues this solves (e.g. likely Datetime w/timezone transpose we have a couple of issues IIRC) & needs a whatsnew note.

@jbrockmendel
Copy link
Member Author

Closing to clear the queue. The Right Way to do this will be to allow 2D EA.

@jbrockmendel
Copy link
Member Author

reopened and rebase

@jbrockmendel jbrockmendel reopened this Nov 28, 2019
return self._constructor(new_values, **new_axes).__finalize__(self)
result = self._constructor(new_values, **new_axes).__finalize__(self)

if self.ndim == 2 and self._is_homogeneous_type and len(self.columns):
Copy link
Contributor

@jreback jreback Dec 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its worthwhile to make a method to encapsulate this maybe

def _homogeneous_dtype(self):
   # return the single dtype if homogeneous, None if not

@jbrockmendel
Copy link
Member Author

closing in favor of #30091.

@jbrockmendel jbrockmendel deleted the Tdtype branch April 5, 2020 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants