API: DataFrame.agg has no partial failure #40211
Comments
Agreed.
This becomes a giant PITA bc we have a bunch of places where we pin the 'name' attribute to an object before passing it to a UDF. I'm not aware of any documentation for this. But it seems like the kind of thing that would mess with your examples.
Reopening because I missed the behavior difference when raising a TypeError instead of a ValueError (see below).

We deprecated any partial failure in #40238 for Series/DataFrame.transform, but we could have instead only deprecated when the exception raised is not a TypeError. While I personally prefer not allowing any partial failure (i.e. as a user, I think it is my responsibility to select the columns that I want an operation to work on), I think we should partially revert #40238 so that only TypeErrors can give rise to partial failure in Series/DataFrame.transform (any other exception type would instead raise in a future version). This would then be consistent with most of the groupby behavior. If we want to take it a step further and also not allow TypeErrors, that can be done later.

Summary with TypeError (code below):
DataFrame.agg with callable: raised TypeError()
Note:
Code:
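To make the proposed rule concrete, here is a minimal sketch of it. This is not the snippet referred to above and not how pandas implements `transform` internally. The helper `transform_columns`, its `tolerated` parameter, and the `add_one` UDF are hypothetical names introduced only for illustration. Columns whose UDF call raises a tolerated exception (TypeError by default) are dropped; any other exception propagates.

```python
import pandas as pd


def transform_columns(df, func, tolerated=(TypeError,)):
    """Apply func column by column, skipping columns that raise a tolerated error.

    Hypothetical helper for illustration; pandas does not expose this function.
    """
    results = {}
    for name, col in df.items():
        try:
            results[name] = func(col)
        except tolerated:
            # Partial failure: this column is silently dropped from the result.
            continue
    if not results:
        raise ValueError("func failed on every column")
    return pd.DataFrame(results)


def add_one(col):
    # Assumed example UDF: explicitly rejects non-numeric columns with a TypeError.
    if not pd.api.types.is_numeric_dtype(col):
        raise TypeError(f"cannot add to non-numeric column {col.name!r}")
    return col + 1


df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
partial = transform_columns(df, add_one)  # column "b" is dropped
```

Passing `tolerated=()` to the same helper gives the stricter "no partial failure" behavior: the TypeError from column `b` propagates to the caller.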
Synthetic example:
The transform call results in
whereas the agg call fails outright due to the `raise ValueError`. Series/DataFrame `apply` just calls `agg`. I also tried to find an example similar to transform's partial-failure behavior using `DataFrame.groupby` with apply/agg/transform, but wasn't able to (the code paths are a bit complex here, so I've resorted to blackbox testing).

My thinking here is that having `transform` fail outright in the example above is the right way to go. It would be simpler from a code perspective and avoids silent failure, although it may force users to drop nuisance columns themselves. I'll also mention that one of the things I'd like to work toward is having e.g.
have the same performance as
for which having transform allow partial failure like this means there would need to be a fallback.
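If partial failure is disallowed, the caller drops nuisance columns before the operation instead of relying on them being skipped. A minimal sketch of that pattern, assuming a made-up frame with one non-numeric column (the data and variable names are illustrative, not taken from the original example):

```python
import pandas as pd

# Assumed example data: "b" is the nuisance (non-numeric) column.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"], "c": [0.5, 1.5]})

# Select the columns the operation should apply to, then aggregate.
numeric = df.select_dtypes(include="number")
totals = numeric.agg("sum")

# Naming the columns explicitly works just as well.
totals_explicit = df[["a", "c"]].agg("sum")
```

Either way, the set of columns in the result is decided by the caller up front rather than by which columns happened to raise.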
cc @jorisvandenbossche @jreback @jbrockmendel