BUG: Mixed DataFrame with Extension Array incorrect aggregation #35112

Closed

@simonjayhawkins added the Regression (functionality that used to work in a prior pandas version) and ExtensionArray (extending pandas with custom dtypes or arrays) labels Jul 3, 2020
@jorisvandenbossche
Member

cc @jbrockmendel

@jorisvandenbossche added this to the 1.1 milestone Jul 6, 2020
try:
    result = f(values)

except TypeError:
    # e.g. in nanops trying to convert strs to float

    # try by-column first
Member

why does this need to be moved? if it is moved, then the "try by-column first" comment is no longer accurate

Member Author

> the "try by-column first" comment is no longer accurate

This comment was there originally before the move in https://github.com/pandas-dev/pandas/pull/32950/files.

> why does this need to be moved?

This PR reverts a change that caused a regression. The PR that caused the regression is labelled as a clean-up. This PR is in response to #34730 (comment).
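For readers following the thread, the `try` / `except TypeError` fragment under review has the shape of a reduce-then-fall-back pattern. The plain-Python sketch below illustrates that shape only. It is not the pandas implementation: `reduce_with_fallback` is a hypothetical helper, columns are modelled as plain lists, and it assumes the fallback simply skips columns the reduction cannot handle, which may differ from what pandas does.

```python
from typing import Any, Callable, Dict, List


def reduce_with_fallback(
    columns: Dict[str, List[Any]], f: Callable[[List[Any]], Any]
) -> Dict[str, Any]:
    """Hypothetical helper: try reducing every column, else go column by column."""
    try:
        # First attempt: succeeds only if f accepts every column's values.
        return {name: f(values) for name, values in columns.items()}
    except TypeError:
        # Fallback: reduce column by column, skipping the ones f cannot handle.
        result: Dict[str, Any] = {}
        for name, values in columns.items():
            try:
                result[name] = f(values)
            except TypeError:
                continue
        return result


mixed = reduce_with_fallback({"a": ["x"], "b": [1, 2]}, sum)
numeric = reduce_with_fallback({"a": [1], "b": [2, 3]}, sum)
```

Here `sum(["x"])` raises `TypeError`, so the mixed input falls through to the by-column path and only column `b` survives.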

Contributor

@jreback left a comment

ok, here, @jbrockmendel ok with this? (certainly can follow up with a cleanup PR)

@jreback
Contributor

jreback commented Jul 9, 2020

ping @jbrockmendel

def test_mixed_frame_with_integer_sum():
    # https://github.com/pandas-dev/pandas/issues/34520
    df = pd.DataFrame([["a", 1]], columns=list("ab"))
    df.astype({"b": "Int64"})
Member

should this be df = df.astype?

Member Author

fixed
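The point behind this review exchange is that `DataFrame.astype` returns a new object rather than converting in place, so the result has to be assigned. A minimal sketch of that behaviour, assuming a pandas version with the nullable `Int64` dtype (the variable names are illustrative and not taken from the PR):

```python
import pandas as pd

df = pd.DataFrame([["a", 1]], columns=list("ab"))

# astype returns a new DataFrame; the original frame is left untouched.
converted = df.astype({"b": "Int64"})

original_dtype = str(df["b"].dtype)         # still the NumPy integer dtype
converted_dtype = str(converted["b"].dtype)  # the nullable extension dtype
```

Without the assignment, the test would exercise a plain NumPy-backed column and never reach the extension-array code path it was written for.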

@jbrockmendel
Member

I think there are better fixes available. The underlying problem is that we're calling nanops.nansum on an IntegerArray, and that raises ValueError. So we should either make nanops.nansum handle IntegerArray or change the definition of f to dispatch to EA._reduce. cc @jorisvandenbossche, who has been working on the latter approach.
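The second option mentioned here, dispatching to the array's own reduction, can be sketched in plain Python. Everything in this block is a toy stand-in under stated assumptions: `MaskedInts` is not pandas' IntegerArray, `generic_sum` is not `nanops.nansum`, and `reduce_column` is a hypothetical name. Only the idea of checking for a `_reduce` method before using the generic path comes from the comment above.

```python
from typing import Any, List, Optional


class MaskedInts:
    """Toy nullable integer array (hypothetical, not pandas' IntegerArray)."""

    def __init__(self, values: List[Optional[int]]) -> None:
        self.values = values

    def _reduce(self, name: str, skipna: bool = True) -> Optional[int]:
        # Only "sum" is implemented in this sketch.
        if name != "sum":
            raise NotImplementedError(name)
        present = [v for v in self.values if v is not None]
        if not skipna and len(present) != len(self.values):
            return None  # a missing value propagates when skipna is False
        return sum(present)


def generic_sum(values: Any) -> float:
    """Toy generic reduction that only understands plain lists of numbers."""
    if not isinstance(values, list):
        raise ValueError(f"cannot reduce {type(values).__name__}")
    return float(sum(values))


def reduce_column(values: Any, name: str = "sum") -> Any:
    """Use the array's own _reduce when it has one, else the generic path."""
    if hasattr(values, "_reduce"):
        return values._reduce(name)
    return generic_sum(values)


masked_total = reduce_column(MaskedInts([1, None, 2]))
plain_total = reduce_column([1.0, 2.0])
```

With this dispatch in place the generic function never sees the array type it cannot handle, so no exception-driven fallback is needed for that column.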

@simonjayhawkins
Member Author

closing in favour of #32867
