-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix AttributeError when groupby as_index=False on empty DataFrame #35324
Conversation
# select everything except for the last level, which is the one | ||
# containing the name of the function(s), see GH 32040 | ||
result.columns = result.columns.rename( | ||
[self._selected_obj.columns.name] * result.columns.nlevels | ||
).droplevel(-1) | ||
|
||
except ValueError as err: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you leave this unchanged, does it work to put another try/except in the else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I was trying to avoid that because nesting exception handling increase complexity.
- But you are right. There is a chance that the
self._aggregate_multiple_funcs
also raiseAttributeError
and this will change the behavior in that case. - Will update the code as your suggestion. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that nesting exception handling is not ideal, maybe an isinstance check for a Series instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
between nesting exception handling and explicit type-checking (also nesting inside another exception handling), I think the first one is less evil :). what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from a static typing perspective, the isinstance is probably better. In this case, an AttributeError should be easily avoided. hasattr is not ideal from a static typing/mypy perspective either. However, leave it as a try/except for now and see what others think.
df = DataFrame(columns=["A", "B", "C"]) | ||
left = df.groupby(by="A", as_index=False)["B"].sum() | ||
assert type(left) is DataFrame | ||
assert left.to_dict() == {"A": {}, "B": {}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you change this test to be more like (and closer to the issue op)
df = pd.DataFrame(columns=["a", "b"])
grp = df.groupby(by="a", as_index=False)["b"]
result = grp.sum()
expected = pd.DataFrame(index=pd.Int64Index([], dtype="int64"), columns=["a", "b"])
expected["b"] = expected["b"].astype("float64")
tm.assert_frame_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is great. I was struggling with IndexType
Thanks @salem3358 for the PR. Can you add a release note to 1.1.0. even if this doesn't make the RC, since this fixes a regression we will be backporting if necessary. cc @TomAugspurger |
Added a releases note. Merging on green. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff