Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AttributeError when groupby as_index=False on empty DataFrame #35324

Merged
merged 3 commits into from
Jul 17, 2020

Conversation

salem3358
Copy link
Contributor

@salem3358 salem3358 commented Jul 17, 2020

@simonjayhawkins simonjayhawkins added Groupby Regression Functionality that used to work in a prior pandas version labels Jul 17, 2020
# select everything except for the last level, which is the one
# containing the name of the function(s), see GH 32040
result.columns = result.columns.rename(
[self._selected_obj.columns.name] * result.columns.nlevels
).droplevel(-1)

except ValueError as err:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you leave this unchanged, does it work to put another try/except in the else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I was trying to avoid that because nesting exception handling increase complexity.
  • But you are right. There is a chance that the self._aggregate_multiple_funcs also raise AttributeError and this will change the behavior in that case.
  • Will update the code as your suggestion. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that nesting exception handling is not ideal, maybe an isinstance check for a Series instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

between nesting exception handling and explicit type-checking (also nesting inside another exception handling), I think the first one is less evil :). what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from a static typing perspective, the isinstance is probably better. In this case, an AttributeError should be easily avoided. hasattr is not ideal from a static typing/mypy perspective either. However, leave it as a try/except for now and see what others think.

df = DataFrame(columns=["A", "B", "C"])
left = df.groupby(by="A", as_index=False)["B"].sum()
assert type(left) is DataFrame
assert left.to_dict() == {"A": {}, "B": {}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change this test to be more like (and closer to the issue op)

df = pd.DataFrame(columns=["a", "b"])
grp = df.groupby(by="a", as_index=False)["b"]
result = grp.sum()

expected = pd.DataFrame(index=pd.Int64Index([], dtype="int64"), columns=["a", "b"])
expected["b"] = expected["b"].astype("float64")

tm.assert_frame_equal(result, expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great. I was struggling with IndexType

@simonjayhawkins
Copy link
Member

Thanks @salem3358 for the PR. Can you add a release note to 1.1.0. even if this doesn't make the RC, since this fixes a regression we will be backporting if necessary. cc @TomAugspurger

@simonjayhawkins simonjayhawkins added this to the 1.1 milestone Jul 17, 2020
@TomAugspurger
Copy link
Contributor

Added a releases note. Merging on green.

@TomAugspurger TomAugspurger mentioned this pull request Jul 17, 2020
@TomAugspurger TomAugspurger merged commit ddc6442 into pandas-dev:master Jul 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: AttributeError when doing groupby with as_index=False on Empty DataFrame
3 participants