Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fortify cython functions against segfaults (without killing perf) #3028

Closed
ghost opened this issue Mar 12, 2013 · 15 comments
Closed

Fortify cython functions against segfaults (without killing perf) #3028

ghost opened this issue Mar 12, 2013 · 15 comments

Comments

@ghost
Copy link

ghost commented Mar 12, 2013

Following discussion in #2892, and segfaults in #3011, #2775, #2778 and others.

manually insert needed checks before inner loop, leaving @cython.boundscheck(False)
inplace.

@stephenwlin
Copy link
Contributor

i'll add the comment here that i added at #2892: this can probably be done within the outer loop but outside the inner most loops, instead of before the entire thing, so we avoid double-iteration

@ghost
Copy link
Author

ghost commented Mar 12, 2013

In at least some cases, a single len() assertion before any loop is all that's required.

Slightly punting by just opening an issue, but the next segfault will kick
me into action, no doubt.

@stephenwlin
Copy link
Contributor

oh actually, hah, i forgot that i actually already did this in a work-in-progress branch somewhere, i'll just isolate it, test it again, and submit a PR; also, I'm not sure how to avoid a loop in take_* functions, since the indices are provided as a list and any one of them could be bad?

@jreback
Copy link
Contributor

jreback commented Mar 12, 2013

AFAICT take_nd was only getting possibily getting negative inputs from frame/take.py....if that is indeed the case (and I can't prove a negative), then I would think that you could take bounds checking off ENTIRELY for the takes? (of course at the risk of a seg fault)......

@stephenwlin
Copy link
Contributor

@jreback can't be sure, but i'm going to guess there's more cases than that, actually; anyway, the performance hit should not be that bad, if the checks are written to avoid redundancy

@jreback
Copy link
Contributor

jreback commented Mar 12, 2013

@stephenwlin maybe put in a debug print (temporarily) to SEE where any of the test cases hit it with neg indicies....

@ghost
Copy link
Author

ghost commented Mar 12, 2013

I had a specific case in mind which is the groupby_x functions in generated.pyx,
in those cases len(index) == len(labels) should be checked prior to the loop.
That's what #3011 was about, and I'll open a PR for those.

Other cases are probably not that simple, and up to your proven good judgment.

in any case - measure first, then decide.

@stephenwlin
Copy link
Contributor

@jreback negative indices are somewhat of a different issue though; the bounds checking is necessary unless we're sure we're not passing user-supplied indices to take_*, which we'll never know with 100% certainty with our test coverage...anyway, I'll vb_suite the branch I just PRed to see what the hit is (just wanted to put it up in case anyone wanted to review)

@jreback
Copy link
Contributor

jreback commented Mar 12, 2013

yes....suppose if it doesn't hurt perf then should check (rather than seg fault)

@ghost
Copy link
Author

ghost commented Mar 17, 2013

#3031 closes all the gaps I've seen in groupby_x and agg functions, and wes
was -1 on the perf hit of touching take_X. Closing until other segfaults are reported...

@ghost
Copy link
Author

ghost commented Mar 26, 2013

Hi again. #3168

@ghost ghost reopened this Mar 26, 2013
@cpcloud
Copy link
Member

cpcloud commented Apr 26, 2013

I can't repro this...is it still an issue?

@ghost
Copy link
Author

ghost commented Apr 27, 2013

Not sure what exactly you can't repro. A number of segfaults have been fixed, I repopened
because from time to time it becomes clear that that are still segfaults hiding in the cython
code. I suggested making the cython less reckless, but perf reasons won out repeatedly.

I guess this won't happen after all.

@ghost ghost closed this as completed Apr 27, 2013
@cpcloud
Copy link
Member

cpcloud commented Apr 27, 2013

Tried to repro #3168. Raised an indexerror which seems reasonable.

@ghost
Copy link
Author

ghost commented Apr 27, 2013

yeah, the issue page cites #3191

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants