ENH: safety bounds checks for take funcs #3029
Hmm, I'm not sure this is going to pass muster with the vbench suite. That's a lot of extra comparisons. Those take functions were never intended to be exposed to arbitrary user input.
@wesm are you aware of any user-facing possibility for negative indices anywhere except
I don't think so. Easiest would be sanitizing take input to Series/DataFrame.take etc.
done #3027
i'll test it and see...right now it looks like i have non-optimal bounds checking in the 2d_multi case so I'm fixing that, but in the 2d_axis0 and 2d_axis1 cases the overhead doesn't seem like much; if we decide to skip merging that's ok (i had this lying around for a while anyway)
@jreback it seems like your fix will solve the handling of in-range negative indices, but (unless you add more checks) it won't help if someone provides a wildly out-of-bounds index? anyway, if you add those checks and we can guarantee that's the only user-facing code that allows indexes to pass into the cythonic takes, then i'm fine not merging this, but I'm really just not sure how confident we are about that
@stephenwlin hmm...that's easy to add, just raise an IndexError.....since user-facing code goes thru here then (unless there are other cases), should be good.....let me revise
ok, well, I'll leave this PR here just in case, but if someone goes through the effort of verifying that we're ok without this, i'm fine with that too
ok.. #3027 is updated to handle neg and out-of-bounds for
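For readers following along, the sanitizing discussed above (wrap in-range negative indices, raise an IndexError for anything else) could look roughly like the sketch below. It is pure Python for illustration only; `sanitize_take_indices` is a hypothetical name and this is not the code from #3027.

```python
def sanitize_take_indices(indices, n):
    """Return a new list of indices that all lie in [0, n).

    Hypothetical helper, not pandas API. In-range negative indices
    (-n <= i < 0) are wrapped to i + n; anything still outside
    [0, n) raises IndexError instead of reaching the fast take code.
    """
    result = []
    for i in indices:
        j = i + n if i < 0 else i  # wrap negatives, e.g. -1 -> n - 1
        if j < 0 or j >= n:
            raise IndexError(
                f"index {i} is out of bounds for axis of size {n}"
            )
        result.append(j)
    return result
```

Doing this once at the user-facing entry point is what would let the low-level take functions keep skipping per-element checks.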
here's the vb_suite impact (benchmarks within ±10% removed)
not quite sure yet why reindex_daterange_pad and reindex_daterange_backfill are impacted so much
I think we should leave the bounds checks out; we probably just want to have one set of "fast but unsafe" take functions and one set of "safe" take functions that can accept arbitrary user input (or something).
@wesm ok, I have no objection to that actually
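A rough sketch of the "fast but unsafe" / "safe" split suggested above, assuming a plain-Python stand-in for the Cython routines. Both function names are hypothetical. Python list indexing still checks bounds itself, so `take_unchecked` only models the contract; in Cython with bounds checking disabled, a bad index would read arbitrary memory.

```python
def take_unchecked(values, indexer):
    """Fast path (hypothetical): performs no validation of its own.

    Callers must guarantee every index is already in [0, len(values)).
    """
    return [values[i] for i in indexer]


def take_checked(values, indexer):
    """Safe path (hypothetical): validate user input, then delegate."""
    n = len(values)
    # Wrap in-range negative indices so the fast path never sees them.
    normalized = [i + n if i < 0 else i for i in indexer]
    bad = [i for i, j in zip(indexer, normalized) if not 0 <= j < n]
    if bad:
        raise IndexError(f"indices out of bounds for size {n}: {bad}")
    return take_unchecked(values, normalized)
```

Internal callers that build their own indexers would use the unchecked variant, while anything fed by user input would go through the checked one, so the validation cost is paid only at the boundary.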
vetoed by wes.
as per #3028; haven't done vb_suite yet but presumably we're ok paying the price for safety; will run to see what the hit is though