
ENH: Consolidation and further optimization of take functions in common #2867

Merged · 4 commits merged into pandas-dev:master on Feb 14, 2013

Conversation

stephenwlin (Contributor)

I've consolidated take_nd, ndtake, and take_fast in common into a single function, take_nd, which has cleaner semantics (documented in a docstring), at least preserves the existing performance properties in all cases, and improves them (sometimes significantly) in some. (In particular, computation of intermediate arrays like boolean masks is still short-circuited in the same way as before, in the appropriate situations.) The operation that used to be take_fast was also broken for non-NA fill_value; this is fixed too.
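For reference, the consolidated behavior amounts to a take along one axis where -1 entries in the indexer are filled with fill_value, skipping the mask work when filling isn't needed. The following is a minimal NumPy sketch of those semantics (take_nd_sketch is a hypothetical name, not the pandas implementation, which uses preallocated output buffers and dtype-specialized Cython paths):

    import numpy as np

    def take_nd_sketch(values, indexer, axis=0, fill_value=np.nan, allow_fill=True):
        # Hypothetical illustration only -- not pandas' take_nd.
        indexer = np.asarray(indexer, dtype=np.int64)
        out = np.take(values, indexer, axis=axis)
        if allow_fill:
            mask = indexer == -1              # intermediate boolean mask
            if mask.any():                    # short-circuit when nothing to fill
                out = out.astype(np.result_type(out.dtype, fill_value), copy=False)
                # set fill_value along the rows/columns indexed with -1
                out[(slice(None),) * axis + (mask,)] = fill_value
        return out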

In addition, I've optimized the Cython implementations of 2-D takes to use row-wise or column-wise memmoves automatically where appropriate (same input and output dtype, non-object dtype, both arrays C-contiguous if the take axis is 0, both F-contiguous if the take axis is 1, ...). In theory Cython and/or the C compiler could do this automatically, but apparently they don't, because performance does improve when this path is triggered, at least with gcc on 32-bit Linux.
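Roughly, the condition gating the block-copy path looks like the following (my own sketch of the checks described above, expressed with NumPy flags rather than the actual Cython template code):

    import numpy as np

    def can_block_copy(values, out, axis):
        # Hypothetical helper: same dtype on both sides, not object dtype,
        # and contiguity that matches the take axis, so each copied
        # row/column is one contiguous block of memory.
        if values.dtype != out.dtype or values.dtype == object:
            return False
        if axis == 0:
            return values.flags.c_contiguous and out.flags.c_contiguous
        if axis == 1:
            return values.flags.f_contiguous and out.flags.f_contiguous
        return False

When this holds, each out[i] = values[indexer[i]] assignment in the axis-0 case is effectively a single memmove of itemsize * ncols bytes instead of an inner element-by-element loop.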

Tests with measurably improved performance (on top of the performance improvements in #2819):

Results:
                                            t_head t_baseline      ratio
name
frame_reindex_axis0                         0.6845     3.2839     0.2084
frame_reindex_axis1                         4.0088     5.4203     0.7396
frame_boolean_row_select                    0.2932     0.3939     0.7443

All other ratios are apparently just statistical noise within the ±10% range that varies from run to run. (I was hoping it would help more, but I guess this is OK.)

jreback (Contributor) commented Feb 13, 2013

I don't think this needs the dtypes_bug branch; can you rebase off master?

I think that would make it easier for you.

stephenwlin (Contributor, Author)

Yeah, I rebased already; I didn't realize I had pulled that into this branch.

ghost commented Feb 13, 2013

Could you take a look at:
numpy/numpy#324
#2490

and confirm that a similar issue isn't a problem here?

Thank you for all your recent work, great stuff.

stephenwlin (Contributor, Author)

I'm pretty sure the functions already have undefined behavior if the buffers overlap, with or without the memcpy, because one buffer is being written while the other is being read and the relative order depends on the indexer passed in. But it's probably not good to pass memcpy overlapping buffers anyway, since that is undefined behavior and technically anything could happen, so I'll add a check.

stephenwlin (Contributor, Author)

Actually, I decided just to change memcpy to memmove, which should be fine: I'm sure the latter does a check internally that is as fast as or faster than one I could write myself.

It will still be undefined behavior if the two buffers overlap, which is true of all the Cython algos as far as I can tell, but at least we won't be calling a C function with invalid inputs.

ghost commented Feb 14, 2013

IIRC there's a performance penalty associated with memmove vs. memcpy; better make sure being cautious didn't nullify the perf gain.

stephenwlin (Contributor, Author)

Sure, I'll check, but I can't imagine it would make a noticeable difference with a sane implementation. All it has to do is some arithmetic on the pointers and the length before deciding what to do: if there's no overlap, it can do whatever memcpy does.

ghost commented Feb 14, 2013

I would think so too, but if that's all there was to it, I don't see why such a small overhead would be left out of memcpy in the first place. But I really don't know what [X]-libc does in practice.

ghost commented Feb 14, 2013

I do remember toliphant of numpy fame being concerned with the associated perf hit in a related discussion.

stephenwlin (Contributor, Author)

Well, that's surprising re: toliphant... I guess I can revert to memcpy if it makes any difference; it's not like any of the Cython code has well-defined behavior on overlapping buffers anyway. If we really wanted to be careful, we could check whether ((out if out.base is None else out.base) is (values if values.base is None else values.base)) in every single algo first, but that seems like overkill. (Btw, it's kind of odd that arr.base is None rather than arr when arr is not a view... it really complicates things for no reason.)

stephenwlin (Contributor, Author)

Actually, apparently that doesn't even work: arr.base isn't guaranteed to be the ultimate owner of the buffer (http://projects.scipy.org/numpy/ticket/1232), so you'd have to walk the base chain until you find something with flags.owndata set. That, or do pointer arithmetic on the strides, etc. (and the views may not be contiguous, so it wouldn't be easy). Yuck either way.
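For what it's worth, the base-walking idea would look something like this (buffer_owner is a hypothetical helper, not anything in pandas, and per the ticket above the chain isn't always well-behaved):

    import numpy as np

    def buffer_owner(arr):
        # Hypothetical helper: follow .base until reaching something that
        # owns its data buffer, since arr.base alone is not guaranteed to
        # be the ultimate owner.
        base = arr
        while isinstance(base, np.ndarray) and not base.flags.owndata and base.base is not None:
            base = base.base
        return base

    a = np.zeros((3, 4))
    v = a[1:, ::2]              # non-contiguous view of a
    assert buffer_owner(v) is a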

ghost commented Feb 14, 2013

I think so too; wes has the last word, though.

Not sure about the Cython code; I suppose in some places it delegates to numpy, for example, which does handle this, at least in the new release.

stephenwlin (Contributor, Author)

There's numpy.may_share_memory, apparently.
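For anyone following along, the check is conservative (it compares memory bounds, so it can report false positives but not false negatives), e.g.:

    import numpy as np

    a = np.zeros(10)
    b = a[2:5]                      # view into the same buffer
    c = np.zeros(10)                # completely separate buffer

    print(np.may_share_memory(a, b))   # True
    print(np.may_share_memory(a, c))   # False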

stephenwlin (Contributor, Author)

Pretty much the same results...

Results:
                                            t_head t_baseline      ratio
name
frame_reindex_axis0                         0.6879     3.2483     0.2118
frame_reindex_axis1                         4.0164     5.3709     0.7478
frame_boolean_row_select                    0.2907     0.3873     0.7505

jreback (Contributor) commented Feb 14, 2013

How does frame_reindex_cast do? I think parts of that DO NOT hit this optimization, because it's an int64 take on int16/32/64 data.

stephenwlin (Contributor, Author)

"frame_reindex_upcast" you mean? it seems to improve but the amount varies run to run so I don't know how much of it is noise:

frame_reindex_upcast                       21.4190    24.5636     0.8720
frame_reindex_upcast                       23.4064    24.2167     0.9665

jreback (Contributor) commented Feb 14, 2013

OK, that looks fine then.

jreback (Contributor) commented Feb 14, 2013

Can you put a mention in RELEASE.rst (and for your converts_branch as well)? Thanks!

jreback added a commit that referenced this pull request on Feb 14, 2013:
ENH: Consolidation and further optimization of take functions in common

thanks!

jreback merged commit 5b5e532 into pandas-dev:master on Feb 14, 2013