-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Optimize take_*; improve non-NA fill_value support #2819
Conversation
I updated dtypes a couple of times today good news is it looks like changes are for the most part independent so u can prob merge easily |
was after you updated - amazing we actually worked on a small overlap at the same time! |
@jreback, rebased and resolved conflicts EDIT: actually, need to add int8 and int16 support to |
I think that the new way that u did the take dictionaries in order to specify the cython functions - need to add for int8/int16 I think the merge took your versions (which is correct), but need to add in those other functions |
hah! again typed the same thing at the same time! |
? how are u running test_perf to get the vbench ratios - I can never get it to work? |
ok, merged with int8/int16 support... i just run ./test_perf.sh -b {BASE_SHA} ... not sure why it doesn't work for you? also, I installed vbench from https://github.com/pydata/vbench |
thanks I give it a try (I was putting too many options I think) |
FYI - not sure if u r setup for Travis - it picks up on a lot of dtype stuff and also shows how py3 react |
no, i'm not...how do i set that up? |
http://about.travis-ci.org/docs/user/getting-started |
thanks for the travis info, will take a look tomorrow (hopefully I didn't break too much in 3.3) |
has someone been playing with |
i squashed all the commits in #2708 and somehow it showed up as yours, not sure why. if I do it again i'll try to update it to jreback. (EDIT: ok, I went ahead and amended) |
I think you should squash my PR as well...ultimately yours will be merge on top of master (after mine) @y-p or is that right, since he is useing my PR as a starting point (instead of master), I assume that updates/sync are all ok, but is there a difference in how you push/present? |
unless i'm mistaken about how things work, i think it's easier to keep it separate for now in case you update again: what i do to resolve is to reset back to master, merge your dtypes branch, squash everything into one commit, then cherry-pick the last version of my commit on top of that--it makes it easier to tell where the conflicts are that way. after your PR is merged i will rebase and everything will be a single commit off the new master; should work fine, I think? |
yep, exactly right. stephen keeps rebasing his commit on top of a squashed version of your rolling PR, and |
ahh...makes sense..thanks! |
see you got travis running! great! |
do you think we should add take support for uints? (and prob pad/fill and cases where I add ints now)? |
your generic takes handle this of course....question is should we include the cython specializations? |
i don't have an opinion, myself--i don't know enough about use cases. i'm fine updating the branch with whatever anyone else decides makes sense, though. |
ok....minor push...removed support for float16...apparently cython really doesn't support these (and I was doing a conversion)...showed up in dff_int8 -> float16 (now goes to float32)... also...you might want to add a test for _maybe_promote...something like:
|
I rebased....hopefully didn't screw u up.....(and done unless doesn;'t build properly) |
|
yes, of course.....that looks great.....its tricky to see diff's between the commits because of the dtypes changes.....but found it! |
Ahh, turns out |
rebased off master after #2708 merged |
if %(raise_on_na)s and _checknan(fill_value): | ||
for i in range(n): | ||
for i from 0 <= i < n: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI Cython optimizes range(n)
so this won't have any effect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i know...but there was a mix of the two between the templates so I decided to make it consistent...
@stephenwlin if you have a chance....can you take a quick look at #3089 |
Optimizes/improves
common.take_1d
,common.take_2d
,common.take_2d_multi
, andcommon.take_fast
in the following ways:fill_value
parameters other than NaN: previously, all upcasting was done assuming NaN as thefill_value
: now, the correct upcast will be chosen based on the type of thefill_value
. (Ex. integers stay as integers with integerfill_value
if the value can fit in the existing size without overflow, integers and floats upcast to complex with complexfill_value
, all numerics upcast to objects with boolean or stringfill_value
, etc.)Added option offill_value
ofNone
, which means that the corresponding entries in the output are left unchanged (only really makes sense whenout
is also provided)The last option is used to optimizeinternals._merge_blocks
(i.e. the function internally called to do block consolidation): instead ofvstack
-ing the arrays together in an intermediate buffer and then rearranging into the output with a separate operation, the output buffer is allocated uninitialized and directly filled from the inputs using one call tocommon.take_fast
per input block, even when they are not contiguous. (However, the special case of contiguous and already properly ordered blocks is still handed withvstack
, since no other operation is required after that)EDIT: took out
fill_value
ofNone
option and revertedinternals._merge_blocks
: it wasn't actually helping. Upcasting within take functions is definitely faster though.The existing vb_suite tests were not really testing upcasting, so I added one for it. Here are the results vs. master:
and against jrebacks/dtypes: