GroupBy shifting performance issue #2162

wesm · 2012-11-02T14:49:11Z

import numpy as np
import pandas as pd

# 48,000 pools, each with a random number of monthly rows (at least 1)
ids = np.arange(48000)
lens = np.maximum(np.round(15+9.5*np.random.randn(48000)), 1.0).astype(int)
id_vec = np.repeat(ids, lens)
# month counter that restarts at 0 within each pool
lens_shift = np.concatenate(([0], lens[:-1]))
mon_vec = np.arange(lens.sum()) - np.repeat(np.cumsum(lens_shift), lens)
n = len(mon_vec)
df = pd.DataFrame.from_items([('pool', id_vec), ('month', mon_vec)] + [(c, np.random.rand(n)) for c in 'abcde'])
df = df.set_index(['pool', 'month'])
# shift each pool's rows up by one through a generic transform (the slow path)
%time df_shift = df.groupby(level=0).transform(lambda x: x.shift(-1))

xref http://stackoverflow.com/questions/13180499/most-efficient-way-to-shift-multiindex-time-series

jreback · 2013-03-27T11:23:28Z

This is obviously affected by the issues fixed in #3145;
still, we should probably add it to the vbenchs and see what else we can do

In [11]: %time df_shift = df.groupby(level=0).transform(lambda x: x.shift(-1))
CPU times: user 10.13 s, sys: 0.18 s, total: 10.31 s
Wall time: 10.35 s

In [12]: pd.__version__
Out[12]: '0.11.0.dev-e6140e9'

In [11]: %time df_shift = df.groupby(level=0).transform(lambda x: x.shift(-1))
CPU times: user 51.07 s, sys: 0.25 s, total: 51.32 s
Wall time: 51.48 s

In [13]: pd.__version__
Out[13]: '0.10.1'

  Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    96005    1.092    0.000    2.360    0.000 common.py:456(take_nd)
    48000    0.560    0.000    3.790    0.000 index.py:1536(values)
    96006    0.486    0.000    0.895    0.000 internals.py:762(make_block)
   144007    0.474    0.000    0.764    0.000 common.py:690(_maybe_promote)
    96006    0.450    0.000    1.504    0.000 internals.py:818(__init__)
    48000    0.445    0.000    4.000    0.000 frame.py:3975(shift)
    48000    0.426    0.000    0.958    0.000 index.py:1817(__getitem__)

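We should probably have a shift-like operator built into apply/transform rather than going through a generic apply;
that would obviously be much faster, since the profile above shows roughly one shift call (plus block and index construction) per group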
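
To make the suggestion concrete, here is a minimal sketch comparing the generic transform with a built-in grouped shift. It assumes a recent pandas version in which `GroupBy.shift` is available; the thread does not say whether this is exactly what the eventual fix looked like. The tiny frame, the group sizes and the names `slow` and `fast` are illustrative only.

```python
import numpy as np
import pandas as pd

# Small stand-in for the frame in the report (sizes are illustrative only).
rng = np.random.default_rng(0)
lens = np.array([3, 1, 4])                              # rows per pool
pool = np.repeat(np.arange(len(lens)), lens)            # pool id per row
month = np.concatenate([np.arange(k) for k in lens])    # restarts at 0 per pool
index = pd.MultiIndex.from_arrays([pool, month], names=["pool", "month"])
df = pd.DataFrame({c: rng.random(len(pool)) for c in "abcde"}, index=index)

# Generic path from the report: one Python-level lambda call per group.
slow = df.groupby(level=0).transform(lambda x: x.shift(-1))

# Built-in grouped shift (assumes a pandas version that provides GroupBy.shift).
fast = df.groupby(level=0).shift(-1)

# Both approaches should give the same frame; the last row of each pool is NaN.
pd.testing.assert_frame_equal(slow, fast)
```

On the full 48,000-group frame, the two calls can be timed with `%time` in the same way as above to see how much the per-group overhead matters on a given pandas version.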

jreback · 2013-09-21T17:10:39Z

fixed as indicated above

wesm added a commit that referenced this issue Nov 2, 2012

ENH: revert Index mutability change. improve performance of dropna by using take. related to #2162 (669c606)

jreback closed this as completed Sep 21, 2013

mroeschke mentioned this issue Jun 25, 2018

pct change bug issue 21200 #21235 (merged)
