-
Notifications
You must be signed in to change notification settings - Fork 769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Close TLS-based soundness holes by running closure on a newly created thread. #3646
base: main
Are you sure you want to change the base?
Conversation
3b55a4b
to
8df3961
Compare
7a1c30f
to
924ab8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Overall, this looks like a promising approach to avoid the tls related issues and the auto trait.
The only downside I can see about this is that it brings a whole new set of thread locals into play, for the worker thread. I think there are interactions here which we'd need to at least document:
- Callers of
allow_threads
will have to treat allthread_local
values as in an undefined state at the beginning of the execution of the closure. I say "undefined" and not "uninitialized" because with the thread pool, reuse of the threads means that it's possible to observe effects from one closure in another. I wonder iftokio
has had prior discussion of this with theirspawn_blocking
API, which I think works similarly. - Inside the closure, if a user then called
Python::with_gil
, this will create a new Python thread state for the worker thread. Is this expected?- Do we make
with_gil
run back on the original thread which is blocked waiting for the result? (e.g. by using a similar mechanism toTask
in reverse?). - Or do we just document that this will happen, i.e. put it in the same bucket as all the rest of the "thread locals are undefined" statement of above?
- Do we make
I understand the desire to keep the safe allow_threads
name, but I still have a feeling that the semantic change here might mean that it's better to rename and break existing code.
Indeed. I think we should certainly document this and of course, what I would really want to have from the Rust standard library here would be a function like fn with_isolated_tls<F,T>(f: F) where F: Send + FnOnce() -> T, T: Send; which would just wipe the slate clean w.r.t. the system-level TLS implementation if that is actually possible? Do you have an educated guess on that @steffahn? It did not find any directly addressing this in the
I would prefer to not trying to call back and forth, mainly to keep
Still torn because I think we should keep the safe way the obvious and effortless way. Otherwise, people like me will start looking into the alternatives and see that the unsafe one is more efficient and wonder "Maybe that little bit of unsafe is worth avoid the HUGE COST of spawning a thread?!" and be tempted by the dark side of Rust. If the more efficient but unsafe way must be looked for and opted in explicit, this will more likely happen only if there is an actually diagnosed/measured performance issue. At least for scientific code for example, I find that unlikely because the closure in allow threads will usually start/call upon a whole bunch of threads doing something using e.g. Rayon. It is not something I would like to fight about, but I would be glad if we could get some more input from other maintainers and/or users. (And this stays a draft until we decided one way or the other. I am not trying to decided this having this PR which hijacks the name and is still missing the escape hatch version in any case.) |
(Will take care of the MSRV build separately... Uff.) |
Time to bump to 1.62? 😈 |
I take your point here, my concern can probably be mitigated by a clear entry in the migration guide. |
One option which might work is to move the Python thread state into the worker thread using some kind of |
Just noticed that this would only close the |
fea8df7
to
d419a21
Compare
d419a21
to
5728cf4
Compare
d5826b4
to
5eb7aa7
Compare
5eb7aa7
to
84a868a
Compare
Note that I think the main difference here (no chance for contention, better locality, but higher chance of cold starts and more threads overall) will not be reflected by the micro benchmark which measures only the uncontended and warmed-up case and is unable to quantify system-level effects like the cost of more threads overall. I did rerun the micro benchmark and the word-count benchmarks and in the first case, any changes were lost within the noise of switching to another thread (i.e. basically still 5µs remote versus 30ns local) and in the second case, the difference between local and remote was dwarfed by the cost of Python's So while I have little numbers to back this, I do think the improved locality and code simplicity of the TLS variant is the better choice. |
First of all I very much dislike having a custom threadpool in pyo3, let alone one written with unsafe. Personally I don't feel qualified to review it and convince myself there are no soundness holes or gotchas in it - let alone make any changes or adjustments to it myself.
I think we shouldn't do this, see #3640 (comment) for my reasoning. tl;dr: these issues are so niche, hard to accidentally encounter and unlikely to actually "work" that it's not worth the overhead and complexity of this solution. Also, what is more likely: someone hitting the current soundness issue, or an issue with that custom written threadpool?
|
Finally I do want to thank you for all the effort you and the reviewers have put into this. I really appreciate the commitment whether or not we end up merging this 👍 |
Note that we are not using a global thread pool any more but rather a sort of "companion thread", i.e. each user thread will have at most one worker thread associated and there is no cross communication between these pairs of user and worker thread which simplifies the concurrency structure significantly. The unsafety is also limited to implementing "scoped execution" for the closures, i.e. the actual communication with the worker thread is entirely safe including passing the closure and its result between user and worker thread. Hence, I would argue that this is actually somewhat simpler than say the GIL pool which we are currently maintaining. That said, ...
... I don't think that us making a mistake in the implementation of dispatching those closures is the main problem here (We will probably make mistakes as elsewhere but we can fix those centrally here.), but rather the change in observable behaviour, e.g. user code which actually relies on thread-local storage to do something useful. This why we need to have the three options on the builder for remote/safe or local/unsafe or even unconstrained/unsafe operation, so that code which does rely on TLS can continue to work (of course with the punted responsibility of only using the right kind of TLS). Furthermore, I'd say that the status quo involving the stable-this-nightly-that But in the end, the crux of the issue IMHO is
i.e. how much do we value soundness in the sense that safe API cannot be abused. Personally, I'd say the whole story of Rust is investing additional engineering effort to allow safe code to be written without having to worry what some other code in the code base or its dependency closure does. So from my perspective, us taking on additional complexity and maintenance effort is yet another step in the basic direction that Rust embodies in the programming language domain. |
As per #3640 (comment) I feel quite strongly that we cannot do nothing here, and in particular I don't think we can leave the existing
I agree with this statement, I think a key part of the point of our work here is to build a watertight solution for Rust/Python interop, so that downstream code can worry about their application choices without worrying about whether our implementation is sound.
While I agree with this too, I also think that the existing
I think this argument can be made about any @mejrs - I think we all hope to have maintainer consensus before moving forward, so it's doubly painful given that I feel doing nothing isn't an option. Is there a reduced form of this PR or an alternative proposal that you would find more palatable? |
So, following up on this a month later, my feeling is that we proceed to merge this for 0.21. It doesn't seem to me that we have any real option other than to try to make this API sound. While I don't love having three forms of what's more or less the same API this seems like the lesser of the evils we need to pick from. |
My preferred alternative proposal is the debug mode check that the gil is held. But, given that there are no real life cases of people running into these issues (as far as I am aware) I think just doing nothing is also acceptable. |
cc @alex; as you are a security expert, any opinion you have on what we should be trying to do here would be valuable. I definitely want to be cautious but if the consensus is a less drastic option is a better choice then I can be overridden. |
(Adding to my TODO list to review tonight) |
I don't think doing nothing is a good approach. We could decide to not lose the soundness loophole, but I think in that case we should remove I also think that we should involve the community pro-actively if we do not close it as it does run against the general consensus established for the Rust ecosystem. Personally, I would add that I think this should be done by those advocating for not closing it as I see it as part of the necessary work to implement that decision. |
On performance (and, honestly, aesthetic) grounds I hate having to run closures on a different thread. I think it takes what should be a simple operation and makes it very complex and gives unexpected performance implications. But soundness is kind of important, it's why we like Rust :-) So my big question is: Are there any alternative designs that might provide soundness without the performance cost? @mejrs mentioned a gil held check, but I don't think I saw the description of that. Assuming there's not a better solution to providing soundness (and I guess we're talking here because we don't have one), I think we should do this. Yes, it's ugly. But it provides safety. Note: I haven't reviewed the PR itself, I'm not vouching for the implementation, just the concept :-) A major motivator in my thinking is that once there's a nogil python, it obviates this question entirely: if there's no GIL, there's no need to call |
This would have to happen on call into the interpreter and hence performance would suffer everywhere which is not palatable. The proposal was to use a
If we had the option to control thread identity, i.e. just stash thread-local storage instead of actually switching threads, this could be faster. But this is purely hypothetical as this would need to be implemented in the compiler and standard library. |
It's worth noting here that on the surface this seemed to me a little bit like how integer overflow panics only in dev builds, but rather than being UB in release integer overflow is well-defined as two's complement. So I think the choice to have soundness only in dev would be somewhat more extreme. (We also haven't measured how much this performance overhead would impact dev builds, nor how many places in the code we'd have to introduce this check.) |
An unexpected but IMHO salient data point on how theoretical this problem is: #3788 (reply in thread) |
So, I think we're quickly approaching the 0.21 release, and we have yet to make a decision here. I still strongly believe that despite no option being appealing, moving forward with this is the most practical way to close the soundness hole and that we should do this. I am tempted to just rebase this and see it over the line, but I'm also mindful about the amount of churn already going into 0.21 with the Bound API. So, how about as a compromise, we let this slip one release until 0.22? That gives folks with other opinions time to gather support, and if there hasn't been a compelling alternative presented by the time we're ready to push 0.22 then we default to this option. |
I'm personally ok with that. |
@adamreichold are you ok if I take this PR over and push the finishing work tomorrow while I'm at the PyCon sprints? |
I would be rather grateful if find the time to do this! I have been dreading rebasing this for some time now... |
Well in the end time shot past me but I'll aim to rebase this in the very near future so we can ship 0.22 |
Ok, first conflict resolution done, I think there will be some finishing work which I'll try later. |
7b427ec
to
92efc97
Compare
92efc97
to
64a9e65
Compare
Ok, this now has all conflicts / tests fixed. Just needs some docs, which I'll aim to do in the very near future. |
To be done before this is a candidate for merging:
unsafe_allow_threads
to unsafe token approachallow_threads
nameCloses #2141
Closes #3640
Closes #3658