Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: safety bounds checks for take funcs #3029

Closed
wants to merge 1 commit into from

Conversation

stephenwlin
Copy link
Contributor

as per #3028; haven't done vb_suite yet but presumably we're ok paying the price for safety; will run to see what the hit is though

@wesm
Copy link
Member

wesm commented Mar 12, 2013

Hmm, I'm not sure this is going to pass muster with the vbench suite. That's a lot of extra comparisons. Those take functions were never intended to be exposed to arbitrary user input

@jreback
Copy link
Contributor

jreback commented Mar 12, 2013

@wesm are you aware of user facing possibily for neg indicies anywhere except Dataframe.take, generic.take and _ixs (old icol) (which .ix and friends calls)?

@wesm
Copy link
Member

wesm commented Mar 12, 2013

I don't think so. Easiest would be sanitizing take input to Series/DataFrame.take etc.

@jreback
Copy link
Contributor

jreback commented Mar 12, 2013

done #3027

@stephenwlin
Copy link
Contributor Author

i'll test it and see...right now it looks like i have a non-optimal bounds checking in the 2d_multi case so I'm fixing that, but in the 2d_axis0 and 2d_axis1 cases the overhead doesn't seem very much; if we decide to skip merging that's ok (i had this lying around for awhile anyway)

@stephenwlin
Copy link
Contributor Author

@jreback it seems like your fix will solve the handling of properly in-range negative indices, but (unless you add more checks) it won't help if someone provides a wildly out of bounds index? anyway, if you add those checks and we can guarantee that's the only user-facing code that allows indexes to pass into the cythonic takes, then i'm fine not merging this, but I'm really just not sure how sure we are about that

@jreback
Copy link
Contributor

jreback commented Mar 12, 2013

@stephenwlin hmm...that's easy to add, just raise an IndexError.....since user-facing code goes thru here then (unless there are other cases), should be good.....let me revise

@stephenwlin
Copy link
Contributor Author

ok, well, I'll leave this PR here just in case, but if someone goes through the effort of verifying that we're ok without this, i'm fine with that too

@jreback
Copy link
Contributor

jreback commented Mar 12, 2013

ok.. #3027 is updated to handle neg and out-of-bounds for take, just waiting for travis to build...(so wait a while!)

@stephenwlin
Copy link
Contributor Author

here's the vb_suite impact (+- 10% removed)

timeseries_asof_single                      0.0566     0.0513     1.1029
groupby_multi_different_numpy_functions    24.9161    22.1570     1.1245
groupby_multi_different_functions          24.9543    22.0737     1.1305
unstack_sparse_keyspace                     2.1081     1.8639     1.1310
groupby_multi_python                       78.9485    69.8002     1.1311
groupby_multi_series_op                    28.0711    24.8115     1.1314
series_align_int64_index                  211.3378   182.9379     1.1552
stat_ops_level_frame_sum_multiple          19.1902    16.4143     1.1691
stat_ops_level_series_sum_multiple         17.5334    14.8450     1.1811
concat_series_axis1                        90.0609    75.5053     1.1928
series_align_left_monotonic                39.2561    32.7489     1.1987
timeseries_asof_nan                        17.9400    14.8801     1.2056
timeseries_asof                            18.9608    15.6736     1.2097
reshape_pivot_time_series                 357.1150   292.3658     1.2215
reindex_multiindex                          1.9434     1.5629     1.2435
reshape_unstack_simple                      6.0946     4.8971     1.2445
sparse_series_to_frame                    344.4140   273.0908     1.2612
reindex_daterange_pad                       0.5872     0.3029     1.9386
reindex_daterange_backfill                  0.5915     0.3024     1.9558

not quite sure yet why reindex_daterange_pad and reindex_daterange_backfill are impacted so much

@wesm
Copy link
Member

wesm commented Mar 15, 2013

I think we should leave the bounds checks out-- probably just want to have one set of "fast but unsafe" take functions and "safe" take functions that can accept arbitrary user input (or something).

@stephenwlin
Copy link
Contributor Author

@wesm ok, I have no objection to that actually

@ghost
Copy link

ghost commented Jul 29, 2013

vetoed by wes.

@ghost ghost closed this Jul 29, 2013
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants