GroupBy shifting performance issue #2162

Closed
wesm opened this issue Nov 2, 2012 · 2 comments
Labels: Groupby, Performance (Memory or execution speed performance)

wesm commented Nov 2, 2012

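import numpy as np
import pandas as pd

# 48000 pools with random lengths (at least one row each)
ids = np.arange(48000)
lens = np.maximum(np.round(15+9.5*np.random.randn(48000)), 1.0).astype(int)
id_vec = np.repeat(ids, lens)
# month counter that restarts at 0 within each pool
lens_shift = np.concatenate(([0], lens[:-1]))
mon_vec = np.arange(lens.sum()) - np.repeat(np.cumsum(lens_shift), lens)
n = len(mon_vec)
# note: DataFrame.from_items was removed in pandas 1.0
df = pd.DataFrame.from_items([('pool', id_vec), ('month', mon_vec)] + [(c, np.random.rand(n)) for c in 'abcde'])
df = df.set_index(['pool', 'month'])
# one Python-level shift call per pool
%time df_shift = df.groupby(level=0).transform(lambda x: x.shift(-1))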
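For anyone reproducing this on a current pandas release, where `DataFrame.from_items` is no longer available, the same frame can be built from a dict. This is only a sketch of the setup above: the pool count is scaled down, and the seeded generator `rng` and the name `starts` are assumptions that are not in the original report.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator, an assumption for repeatability
n_pools = 1000  # scaled down from the 48000 pools in the report

ids = np.arange(n_pools)
lens = np.maximum(np.round(15 + 9.5 * rng.standard_normal(n_pools)), 1.0).astype(int)
id_vec = np.repeat(ids, lens)

# Offset of each pool's first row, so that month restarts at 0 in every pool.
starts = np.concatenate(([0], np.cumsum(lens)[:-1]))
mon_vec = np.arange(lens.sum()) - np.repeat(starts, lens)

n = len(mon_vec)
data = {"pool": id_vec, "month": mon_vec}
data.update({c: rng.random(n) for c in "abcde"})
df = pd.DataFrame(data).set_index(["pool", "month"])
```

Raising `n_pools` back to 48000 should give a frame of the same shape as the one timed in this issue.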

xref http://stackoverflow.com/questions/13180499/most-efficient-way-to-shift-multiindex-time-series

wesm added a commit that referenced this issue Nov 2, 2012

jreback commented Mar 27, 2013

This is obviously affected by the issues fixed in #3145,
but we should still probably add it to the vbenchs and see what else we can do

In [11]: %time df_shift = df.groupby(level=0).transform(lambda x: x.shift(-1))
CPU times: user 10.13 s, sys: 0.18 s, total: 10.31 s
Wall time: 10.35 s

In [12]: pd.__version__
Out[12]: '0.11.0.dev-e6140e9'

In [11]: %time df_shift = df.groupby(level=0).transform(lambda x: x.shift(-1))
CPU times: user 51.07 s, sys: 0.25 s, total: 51.32 s
Wall time: 51.48 s

In [13]: pd.__version__
Out[13]: '0.10.1'

  Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    96005    1.092    0.000    2.360    0.000 common.py:456(take_nd)
    48000    0.560    0.000    3.790    0.000 index.py:1536(values)
    96006    0.486    0.000    0.895    0.000 internals.py:762(make_block)
   144007    0.474    0.000    0.764    0.000 common.py:690(_maybe_promote)
    96006    0.450    0.000    1.504    0.000 internals.py:818(__init__)
    48000    0.445    0.000    4.000    0.000 frame.py:3975(shift)
    48000    0.426    0.000    0.958    0.000 index.py:1817(__getitem__)
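A table like the one above can be collected with the standard-library profiler. The snippet below is a sketch on a small stand-in frame, not the exact session used here; `df_small` and its sizes are made up for illustration. Sorting by `tottime` is what produces the "Ordered by: internal time" header.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd

# Small stand-in for the frame in the report: 200 pools of 5 rows each.
n_groups, rows_per_group = 200, 5
index = pd.MultiIndex.from_product(
    [range(n_groups), range(rows_per_group)], names=["pool", "month"]
)
df_small = pd.DataFrame(
    np.random.rand(n_groups * rows_per_group, 2), index=index, columns=["a", "b"]
)

# Profile only the group-wise shift.
profiler = cProfile.Profile()
profiler.enable()
df_small.groupby(level=0).transform(lambda x: x.shift(-1))
profiler.disable()

# Sort by time spent inside each function itself and keep the top 7 rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(7)
report = buffer.getvalue()
print(report)
```

The function names and line numbers in the report will differ between pandas versions.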

We should probably have a shift-like operator built into apply/transform rather than going through a generic apply.
The profile shows one shift call per group (48000 of them), so a built-in version would obviously be much faster.
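To make the idea concrete, here is a sketch comparing the generic per-group path with two vectorized alternatives. It assumes a recent pandas release, where `GroupBy.shift` exists; whether that method is what closed this issue is not stated in the thread. The small frame `df_small` is made up for illustration, and the manual variant assumes the rows of each pool are contiguous.

```python
import numpy as np
import pandas as pd

# Small stand-in frame: 3 pools with 3, 1 and 4 rows.
lens = np.array([3, 1, 4])
pool = np.repeat(np.arange(len(lens)), lens)
month = np.concatenate([np.arange(k) for k in lens])
index = pd.MultiIndex.from_arrays([pool, month], names=["pool", "month"])
df_small = pd.DataFrame({"a": np.arange(8.0), "b": np.arange(8.0) * 10}, index=index)

# Generic path from the report: one Python-level call per pool.
slow = df_small.groupby(level=0).transform(lambda x: x.shift(-1))

# Built-in group-aware shift: no per-pool Python callback.
fast = df_small.groupby(level=0).shift(-1)

# Manual equivalent: shift the whole frame once, then blank the last row
# of each pool, which would otherwise hold the next pool's first row.
pools = df_small.index.get_level_values("pool").to_numpy()
is_last_in_pool = np.append(pools[1:] != pools[:-1], True)
manual = df_small.shift(-1)
manual.loc[is_last_in_pool, :] = np.nan

pd.testing.assert_frame_equal(slow, fast)
pd.testing.assert_frame_equal(slow, manual)
```

All three results agree on this frame; the second and third avoid calling back into Python once per pool, which is where the profile shows the time going.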

jreback commented Sep 21, 2013

fixed as indicated above
