-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: GH2121 groupby transform #3145
Conversation
@y-p had a brain freeze |
@y-p bigger issue is correctness.....I don't have a good intuitive feel for the groupby reindexing/sorting at the end....all the tests pass..but..... |
@y-p I made the wrapper change, but didn't really impact resuls, not sure if there are actually a lot of calls in this case (or maybe optimized away somewhere) |
I don't think large group counts are the common case. probably worth a vbench though |
@y-p in this bench there were 2000 groups, not a lot I guess |
yep. Actually, for 2x on a corner case, I'd peel off even the original lambda. |
@y-p unfortunately cannot peel off the 2nd level lamba because its needed in the apply (only in the slow path though)...fast path fixed up...good news is that the 2nd level lambda is sometimes optimized away in any event (in apply) |
Could you add a simple vbench (preferable that takes 300ms or less per iteration) that can be used to track the performance of this? |
done
|
should I take out the bench/bench_transform.py ? |
added to RELEASE.rst, issue GH2121
PERF: GH2121 groupby transform
closes #2121
Two items were causing slowness
the function on the group). the function that is testing is
fillna
which we have defined as a function of a data frame so its ok here to use the direct function callI create a slow_path/fast_path with the first group determining the path, not sure this is super robust, but it is a significant source of slowness
This is a comparision of the bench/bench_transform.py (supplied in #2121)
The apply_by_group DOES include the sort_index (which is necessary for correctness)