ENH: Consolidation and further optimization of take functions in common #2867
I don't think this needs the dtypes_bug branch it makes it easier for u I think |
yeah I rebased already I didn't realize I pulled that into this branch. |
Could you take a look at: and confirm that a similar issue isn't a problem here? Thank you for all your recent work, great stuff. |
i'm pretty sure the functions already have undefined behavior with or without the memcpy if the buffers overlap, because one is being written while the other is being read and the relative order depends on the indexer passed in. but I guess it's probably not good to pass memcpy overlapping buffers just in case, since it's undefined behavior and technically could mean anything could happen, so I'll add a check. |
actually decided just to change it will still be undefined behavior if the two buffers overlap, which is the case for all the Cython algos as far as I can tell, but at least it won't be calling a C function with invalid inputs. |
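As a small illustration of the point above (that `memmove`, unlike `memcpy`, is defined for overlapping buffers), here is a minimal sketch using Python's `ctypes.memmove`, which forwards to the C library function. The buffer and variable names are made up for the example; this is not code from the PR.

```python
import ctypes

# An 8-byte buffer holding 0..7 (illustrative data only)
buf = (ctypes.c_ubyte * 8)(*range(8))
base = ctypes.addressof(buf)

# Source range [0, 6) and destination range [2, 8) overlap.
# memmove is specified to behave as if the source were copied
# to a temporary first, so the overlap is handled correctly.
ctypes.memmove(base + 2, base, 6)

shifted = list(buf)
print(shifted)  # [0, 1, 0, 1, 2, 3, 4, 5]
```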
IIRC there's a performance penalty associated with memmove vs memcpy, |
sure, i'll check, but i can't imagine it would make a noticeable difference with a sane implementation. all it has to do is some arithmetic on the pointers and the length first before deciding what to do: if there's no overlap it can do whatever |
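The pointer arithmetic described in the comment above could look roughly like this. It is a pure-Python model over offsets into a single `bytearray`, written only to show the direction check; `memmove_sketch` is a hypothetical name and real C library implementations are considerably more elaborate.

```python
def memmove_sketch(buf, dst, src, n):
    """Copy n bytes inside buf from offset src to offset dst, overlap-safe.

    Hypothetical model of the direction check, not a real libc routine.
    """
    if dst == src or n == 0:
        return
    if dst < src or dst >= src + n:
        # Destination starts before the source, or the ranges do not
        # overlap at all: a plain forward copy never clobbers unread bytes.
        for i in range(n):
            buf[dst + i] = buf[src + i]
    else:
        # Destination starts inside the source range: copy backward so
        # each source byte is read before it is overwritten.
        for i in range(n - 1, -1, -1):
            buf[dst + i] = buf[src + i]


data = bytearray(range(8))
memmove_sketch(data, dst=2, src=0, n=6)
print(list(data))  # [0, 1, 0, 1, 2, 3, 4, 5]
```

The only extra work compared with an unconditional forward copy is the two comparisons at the top, which is why the overhead is expected to be small.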
I would think so too, but if that's all there was to it I don't see why such little overhead |
I do remember toliphant of numpy fame being concerned with the associated |
well, that's surprising re: toliphant...I guess I can revert to |
Actually apparently that doesn't even work, |
I think so too. wes has the last word though. Not sure about the cython code, I suppose in some places it delegates to numpy |
There's |
Pretty much the same results...
|
how does frame_reindex_cast do? I think parts of that DO NOT hit this optimization, because int64 taking int 16,32,64 |
"frame_reindex_upcast" you mean? it seems to improve but the amount varies run to run so I don't know how much of it is noise:
|
ok |
can you put a mention in RELEASE.rst (and for your converts_branch as well)...thxs |
ENH: Consolidation and further optimization of take functions in common thanks!
I've consolidated `take_nd`, `ndtake`, and `take_fast` in `common` into a single signature `take_nd` which has cleaner semantics (which is documented in a docstring), at least preserves the existing performance properties in all cases, and improves it (sometimes significantly) in some. (In particular, computation of intermediate arrays like boolean masks are still being short-circuited in the same way they were before, in the appropriate situations.) The operation that used to be `take_fast` was also broken for non-NA `fill_value`: this is fixed too.
In addition, I've optimized the Cython implementations of 2-D takes to use row-wise or column-wise `memmove`s automatically in places where it is appropriate (same type input and output, non-object type, both c-contiguous if slice axis is 0, both f-contiguous if slice axis is 1...). In theory Cython and/or the C compiler could do this automatically, but apparently it doesn't because the performance does improve when this is triggered, at least with gcc on 32-bit linux.
Tests with measurably improved performance (this is on top of the performance improvements in #2819)
All other ratios are apparently just statistical noise within the +- 10% range that vary from run to run. (Was hoping it would help more, but I guess this is OK)
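To make the row-wise copy idea from the description concrete, here is a minimal stdlib-only sketch. It is not the Cython code from this PR: `take_rows` is a hypothetical helper, the 2-D data is modelled as a flat row-major `array`, and treating `-1` in the indexer as "missing" is an assumption made for the example. With same-typed, row-major input and output, each taken row is one contiguous block, so it can be moved with a single block copy instead of an element-by-element loop.

```python
from array import array


def take_rows(values, ncols, indexer, fill_value):
    """Take rows from a flat row-major buffer (hypothetical helper).

    An indexer entry of -1 is assumed to mean "missing"; that row is
    left filled with fill_value.
    """
    # Pre-fill the whole output with fill_value (may be non-NA, e.g. 0.0)
    out = array(values.typecode, [fill_value]) * (len(indexer) * ncols)
    for i, j in enumerate(indexer):
        if j == -1:
            continue  # row keeps the fill value
        # Same element type and row-major layout on both sides, so one
        # slice assignment copies the whole row as a contiguous block.
        out[i * ncols:(i + 1) * ncols] = values[j * ncols:(j + 1) * ncols]
    return out


values = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # 3 rows x 2 columns
taken = take_rows(values, 2, [2, -1, 0], fill_value=0.0)
print(list(taken))  # [5.0, 6.0, 0.0, 0.0, 1.0, 2.0]
```

If the input and output types differed, or the rows were not contiguous, the block copy would not apply and a per-element loop with conversion would be needed, which matches the conditions listed in the description.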