PERF: GH2121 groupby transform #3145

Merged
merged 2 commits into from
Mar 25, 2013

jreback
Contributor

@jreback jreback commented Mar 23, 2013

closes #2121

Two things were causing the slowness:

  1. Using apply for each group (which in this case is equivalent to calling
    the function directly on the group). The function being tested is fillna, which is defined for a DataFrame, so it is fine here to call the function directly.

I create a slow_path/fast_path, with the first group determining the path. I'm not sure this is super robust, but it is a significant source of slowness.

  2. At the end of the groupby, the concatenated object gets a reindex_like, which is very slow; replacing it with sort_index is much faster (this is a multi-index). Again, I'm not sure of the robustness, but all tests pass.
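As a rough illustration of the fast-path/slow-path idea described above, here is a minimal sketch. It is not the code in this PR: `choose_path` and `transform_sketch` are hypothetical names, and the "try the direct call on the first group, fall back to a column-by-column apply if it raises" rule is an assumption about how such a choice could be made.

```python
import numpy as np
import pandas as pd


def choose_path(func, first_group):
    """Pick a call style by trying func on the first group (hypothetical helper)."""
    try:
        func(first_group)  # fast path: func accepts the whole group frame
    except Exception:
        # slow path: fall back to applying func column by column
        return lambda group: group.apply(func, axis=0)
    return func


def transform_sketch(frame, keys, func):
    """Transform each group, then restore row order with sort_index."""
    path = None
    pieces = []
    for _, group in frame.groupby(keys):
        if path is None:
            path = choose_path(func, group)  # decided once, from the first group
        pieces.append(path(group))
    # Assumes the original index is sorted and unique; otherwise sort_index
    # would not reproduce the original row order.
    return pd.concat(pieces).sort_index()


# Illustrative data only: two numeric columns with gaps, two groups.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
    "b": [np.nan, 2.0, 3.0, np.nan, 5.0, 6.0],
})
keys = pd.Series(["x", "y", "x", "y", "x", "y"])

result = transform_sketch(df, keys, lambda g: g.fillna(g.mean()))
```

Deciding from the first group alone keeps the per-group overhead low, but it also means a function that behaves differently on later groups would not be detected, which is the robustness concern raised above.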

This is a comparison using bench/bench_transform.py (supplied in #2121)

The apply_by_group DOES include the sort_index (which is necessary for correctness)

In [2]: %timeit apply_by_group(grouped, f_fillna)
1 loops, best of 3: 2.11 s per loop

In [3]: %timeit grouped.transform(f_fillna)
1 loops, best of 3: 2.14 s per loop
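To show why the sort_index step matters for correctness, the following sketch uses a small, made-up frame with a sorted two-level MultiIndex. The data and variable names are assumptions for illustration and are not taken from bench/bench_transform.py. Concatenating the per-group results leaves the rows in group order; on a sorted, unique index both reindex_like and sort_index bring them back to the original order.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a sorted two-level MultiIndex, grouped on the second level.
index = pd.MultiIndex.from_product(
    [[2011, 2012], ["a", "b", "c"]], names=["year", "id"]
)
frame = pd.DataFrame({"value": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0]}, index=index)

# Fill each group's gaps with that group's mean, one group at a time.
pieces = [group.fillna(group.mean()) for _, group in frame.groupby(level="id")]
concatenated = pd.concat(pieces)  # rows are now in group order, not original order

via_reindex = concatenated.reindex_like(frame)  # align to the original labels
via_sort = concatenated.sort_index()            # lexicographic sort of the index
```

The two results agree here only because the original index is already sorted; on an unsorted index, sort_index would not reproduce the input order, while reindex_like would.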

@jreback
Contributor Author

jreback commented Mar 23, 2013

@y-p had a brain freeze

@jreback
Contributor Author

jreback commented Mar 23, 2013

@y-p bigger issue is correctness.....I don't have a good intuitive feel for the groupby reindexing/sorting at the end....all the tests pass..but.....

@jreback
Contributor Author

jreback commented Mar 23, 2013

@y-p I made the wrapper change, but it didn't really impact the results; not sure if there are actually a lot of calls in this case (or maybe they are optimized away somewhere)

@ghost

ghost commented Mar 23, 2013

I don't think large group counts are the common case. probably worth a vbench though

@jreback
Contributor Author

jreback commented Mar 23, 2013

@y-p in this bench there were 2000 groups, not a lot I guess

@ghost

ghost commented Mar 23, 2013

yep.

Actually, for 2x on a corner case, I'd peel off even the original lambda.

@jreback
Contributor Author

jreback commented Mar 23, 2013

@y-p unfortunately I cannot peel off the 2nd-level lambda because it's needed in the apply (only in the slow path though)...fast path fixed up...good news is that the 2nd-level lambda is sometimes optimized away in any event (in apply)

@wesm
Member

wesm commented Mar 23, 2013

Could you add a simple vbench (preferably one that takes 300ms or less per iteration) that can be used to track the performance of this?

@jreback
Contributor Author

jreback commented Mar 23, 2013

done

-----------------------------------------------------------
Test name                 | target[ms] | base[ms] |   ratio
-----------------------------------------------------------

groupby_transform             494.3950  1367.9831     0.3614

Target [89a0c5d] : PERF: added vb_suite test for groupby_transform 
Base   [05a737d] : TST: skip problematic xlrd test (essentially 0.10.1)

@jreback
Contributor Author

jreback commented Mar 23, 2013

should I take out the bench/bench_transform.py ?

jreback added a commit that referenced this pull request Mar 25, 2013
PERF: GH2121 groupby transform
@jreback jreback merged commit 1b7f070 into pandas-dev:master Mar 25, 2013
Labels
Groupby Performance Memory or execution speed performance
Development

Successfully merging this pull request may close these issues.

GroupBy transform() is surprisingly slow
2 participants