Behavior of new df.agg, df.transform and df.apply is very inconsistent #18103

Open
memeplex opened this issue Nov 4, 2017 · 3 comments
Labels: Apply, Bug, Groupby

memeplex commented Nov 4, 2017

This looks really messy; it's not difficult to find inconsistent behavior in all three methods. It seems to me that the DataFrame implementation blurs the distinction between agg, transform and apply too liberally.

In: df
Out: 
   x  y  z
0  1  1  4
1  1  2  5
2  2  3  6
3  2  1  4
4  3  2  5
5  3  3  6

In: gb = df.groupby('x')
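For reproduction, a frame matching the printed output can be built roughly as below. This is a minimal sketch: the constructor call is an assumption reconstructed from the output above, since the original post does not show how df was created.

```python
import pandas as pd

# Column values copied from the printed frame; the constructor itself is assumed
df = pd.DataFrame({
    "x": [1, 1, 2, 2, 3, 3],
    "y": [1, 2, 3, 1, 2, 3],
    "z": [4, 5, 6, 4, 5, 6],
})

# Group on x, as in the report
gb = df.groupby("x")
print(df)
```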
  1. dataframe.apply accepts lists and dicts while groupby.apply doesn't:
In: df.apply(['sum', 'mean'])
Out: 
         x     y     z
sum   12.0  12.0  30.0
mean   2.0   2.0   5.0

In: gb.apply(['sum', 'mean'])
TypeError    
  2. dataframe.transform disallows aggregations while groupby.transform broadcasts them to the original shape:
In: df.transform('sum')
ValueError: transforms cannot produce aggregated results

In: gb.transform('sum')
Out: 
   y   z
0  3   9
1  3   9
2  4  10
3  4  10
4  5  11
5  5  11
  3. dataframe.agg allows non-aggregations while groupby.agg doesn't:
In: df.agg(lambda x: x)
Out: 
   x  y  z
0  1  1  4
1  1  2  5
2  2  3  6
3  2  1  4
4  3  2  5
5  3  3  6

In: gb.agg(lambda x: x)
ValueError: cannot copy sequence with size 3 to array axis with dimension 2
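
The broadcasting in the groupby.transform case can be made explicit by comparing it with groupby.agg. The sketch below assumes the frame is constructed as shown (the construction is reconstructed from the printed output, not taken from the report) and rebuilds the transform result by hand from the per-group sums; the name `manual` is illustrative only.

```python
import pandas as pd

# Frame reconstructed from the printed output in the report (assumed construction)
df = pd.DataFrame({
    "x": [1, 1, 2, 2, 3, 3],
    "y": [1, 2, 3, 1, 2, 3],
    "z": [4, 5, 6, 4, 5, 6],
})
gb = df.groupby("x")

# agg reduces: one row per group, indexed by the group key x
agg_result = gb.agg("sum")

# transform broadcasts: one row per original row, same index as df
transform_result = gb.transform("sum")

# Broadcasting by hand: look up each row's group total, then restore the row index
manual = agg_result.loc[df["x"]].reset_index(drop=True)

print(agg_result)
print(transform_result)
```

Here `manual` holds the same values as `transform_result`, which is the sense in which groupby.transform "broadcasts" an aggregation back to the original shape.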

gfyoung commented Nov 4, 2017

@memeplex: Thanks for reporting this! Just for reference, did you notice this "inconsistent" behavior in previous versions of pandas, or has it surfaced only now (you use the word "new" in the title)?

gfyoung added the Apply and Groupby labels Nov 4, 2017

gfyoung commented Nov 4, 2017

Regarding df.apply, we don't actually say that we accept lists of functions, so that behavior is somewhat accidental. Nevertheless, I don't see why we couldn't implement it for groupby as well.
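
For comparison, agg is the method documented to take a list of functions, on both a DataFrame and a groupby. A minimal sketch, assuming a reasonably recent pandas and a frame reconstructed from the output in the report (the variable names are illustrative):

```python
import pandas as pd

# Frame reconstructed from the printed output in the report (assumed construction)
df = pd.DataFrame({
    "x": [1, 1, 2, 2, 3, 3],
    "y": [1, 2, 3, 1, 2, 3],
    "z": [4, 5, 6, 4, 5, 6],
})

# On the frame: one result row per function, labelled 'sum' and 'mean'
frame_stats = df.agg(["sum", "mean"])

# On the groupby: one row per group, columns keyed by (column, function)
group_stats = df.groupby("x").agg(["sum", "mean"])

print(frame_stats)
print(group_stats)
```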

Regarding df.transform, your groupby transform call should have worked according to documentation, so that looks like a bug to me.

Regarding df.agg, that looks like a bug to me as well. While lambda x: x is vacuously an aggregation, it should work for groupby as well AFAICT.

gfyoung added the Bug label Nov 4, 2017

memeplex commented Nov 4, 2017

I have tested it with 0.21 only, but df acquired transform and agg in 0.20, so the behavior can't be older than that. It's not surprising, given that the current implementation mostly delegates everything to _aggregate.

I'm not sure that allowing apply to receive a full mapper as part of its "public" signature is a good idea, since the difference between the three methods is becoming almost impossible to describe (e.g. I don't see how the "more flexible" role that the traditional groupby documentation assigns to apply is relevant anymore for dfs).
