Stack overflow not caught in Drop for TLS data #111272
cc @m-ou-se, this may be of interest to you as you clean up the thread local impl.
@ridiculousfish Do you believe there is potential for UB from safe code here? If you can explain the circumstances under which UB can arise, I'll tag this with I-unsound (the tag used for the ability to invoke UB from safe code).
@bstrie There is no unsafe code in the repro case. I don't know if a stack overflow which is not caught by Rust's handler is considered UB, but I imagine it should be, else there would be no reason for the handler. So (conservatively) yes.
I don't think it has ever been definitively answered whether the stack protector is considered a soundness constraint or a quality-of-implementation detail.
In general, whether we implement it is target-specific (and even depends on whether Rust is in control of main), so I think it's considered quality of implementation (if for no other reason than pragmatism). That said, I think this is worth fixing. On most targets (not all, I think, but it's been a while), we already have enough control over the order in which dtors are run (since we run them manually most of the time) that we should be able to ensure the guard teardown happens last.
I can look at this this weekend. |
Duplicate of #109785. In short, the problem is that on platforms without native TLS, the guard page location is stored in an allocation that is itself freed by a TLS destructor, and that the signal stack is freed when the thread main function returns.
The first problem is quite easily handled by putting the guard page location in a second TLS variable, but the second problem is more complicated. I tried to resolve these by eagerly running the destructors in the thread itself (see #109858), but that appears to interact badly with the platform.
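The first idea can be sketched roughly like this, assuming a hypothetical `GUARD_PAGE_ADDR` thread-local (not the actual `std` internals): since a `Cell<usize>` has no `Drop` impl, no TLS destructor is registered for it, so the stored address stays readable even while other TLS destructors are running.

```rust
use std::cell::Cell;

// Hypothetical sketch: keep the guard page address in a plain,
// destructor-free thread-local. No TLS destructor ever runs for it,
// so it remains valid for the whole lifetime of the thread.
thread_local! {
    static GUARD_PAGE_ADDR: Cell<usize> = const { Cell::new(0) };
}

fn set_guard_page_addr(addr: usize) {
    GUARD_PAGE_ADDR.with(|a| a.set(addr));
}

fn guard_page_addr() -> usize {
    GUARD_PAGE_ADDR.with(|a| a.get())
}

fn main() {
    // Illustrative address only; a real implementation would record
    // the actual page computed during thread setup.
    set_guard_page_addr(0x7fff_f000);
    assert_eq!(guard_page_addr(), 0x7fff_f000);
}
```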
Despite being a duplicate, I am going to close the other issue instead, because this one has more useful data. |
It is not UB. The reason we have this handler at all, AFAIK, is to avoid people thinking they managed to violate memory safety, even though a stack overflow is guaranteed to result in an abort through SIGSEGV on every platform where we or the OS set up a guard page and stack probing is used (currently only x86/x86_64, but ideally every arch). The guard page doesn't get removed when dropping TLS data.
To clarify, there are potentially two guard pages at play here:
- the guard page placed by the operating system at the end of the thread's stack, and
- a guard page created by Rust itself (on some platforms, to protect the main thread).
Whether this second page is created is platform dependent.
Didn't think about the unmapping of the Rust-created guard page. Looks like it is dropped right between returning from the thread main function and dropping TLS: rust/library/std/src/sys/unix/thread.rs, lines 104 to 109 at ad23942.
AFAIK this is just a soundness bug for some targets and environments (like when
So that does sound like a soundness issue then, if the guard page no longer exists when TLS dtors run? |
It's not, because the guard page is provided by the system and available during TLS destruction. What's not available is our signal stack, which we unregister when the thread main function returns. Therefore, the signal handler will be run on the overflowing stack (more specifically: on the guard page), resulting in an immediate, system-caused second SIGSEGV.
There seems to be contradictory information here, as above @bjorn3 said that the guard page is unmapped. That was the Rust guard page, which I presume is not the same thing as the system guard page; but if there is guaranteed to be a system guard page, then why do we even have a Rust guard page?
There is no Rust guard page, except on some platforms where we protect the main thread (but that one we never reset). We do need a stack for the signal handler, though, and we free that stack at thread exit; otherwise we'd leak quite a bit of memory.
std: make `thread::current` available in all `thread_local!` destructors ... and thereby allow the panic runtime to always print the right thread name. This works by modifying the TLS destructor system to schedule a runtime cleanup function after all other TLS destructors registered by `std` have run. Unfortunately, this doesn't affect foreign TLS destructors; `thread::current` will still panic there. Additionally, the thread ID returned by `current_id` will now always be available, even inside the global allocator, and will not change during the lifetime of one thread (this was previously the case with key-based TLS). The mechanisms I added for this (`local_pointer` and `thread_cleanup`) will also allow finally fixing rust-lang#111272 by moving the signal stack to a similar runtime-cleanup TLS variable.
Fixes rust-lang#111272. With rust-lang#127912 merged, we now have all the infrastructure in place to support stack overflow detection in TLS destructors. This was not possible before because the signal stack was freed in the thread main function, so a SIGSEGV afterwards would immediately crash the process. And on platforms without native TLS, the guard page address was stored in an allocation freed by a TLS destructor, so it would not be available. rust-lang#127912 introduced the `local_pointer` macro, which allows storing a pointer-sized TLS variable without allocation, and the `thread_cleanup` runtime function, which is called after all other code managed by the Rust runtime. This PR simply moves the signal stack cleanup to the end of `thread_cleanup` and uses `local_pointer` to store all necessary variables. With that, everything run under the Rust runtime is now properly protected against stack overflows.
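The resulting thread-exit ordering can be illustrated with a small sketch (hypothetical function names, not the actual `std` code): user-registered TLS destructors run first, and only afterwards does the runtime cleanup free the signal stack, so overflow detection stays armed while user `Drop` impls execute.

```rust
// Hypothetical sketch of the thread-exit ordering described above.
// `events` records the order in which the cleanup steps run.
fn thread_exit(events: &mut Vec<&'static str>) {
    // 1. User `thread_local!` destructors run first. If one of them
    //    overflows the stack, the signal stack is still installed and
    //    Rust's handler can report the overflow.
    events.push("user TLS destructors");

    // 2. The runtime cleanup (`thread_cleanup` in the description
    //    above) runs last and frees the signal stack.
    events.push("free signal stack");
}

fn main() {
    let mut events = Vec::new();
    thread_exit(&mut events);
    assert_eq!(events, ["user TLS destructors", "free signal stack"]);
}
```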
Rust attempts to catch stack overflow by installing a guard page at the end of the valid stack range. The guard page's location is stored in TLS, and the page is torn down when a thread exits (including the main thread).
However, other thread-local data may run `drop` after this guard page is torn down. Stack overflows occurring in these `drop` implementations are not detected by Rust. (It may be backstopped by the OS, but this is system dependent.)

To reproduce:
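The report's reproduction code is not included above; a minimal sketch of such a repro might look like the following. The overflow is gated behind a hypothetical `TRIGGER_OVERFLOW` environment variable (my addition, not part of the original report) so the program can also be run without crashing.

```rust
use std::cell::RefCell;
use std::hint::black_box;

// A type whose Drop impl overflows the stack. When stored in a
// thread-local, this Drop runs during TLS destruction, after Rust has
// already torn down its stack overflow handling for the thread.
struct OverflowOnDrop;

impl Drop for OverflowOnDrop {
    fn drop(&mut self) {
        // Gated so the example only crashes when explicitly requested.
        if std::env::var_os("TRIGGER_OVERFLOW").is_some() {
            recurse(0);
        }
    }
}

#[allow(unconditional_recursion)]
fn recurse(depth: u64) -> u64 {
    // Keep a large buffer alive in every frame so the stack fills fast;
    // black_box prevents the recursion from being optimized away.
    let buf = black_box([0u8; 4096]);
    recurse(depth + 1) + u64::from(buf[0])
}

thread_local! {
    static TLS: RefCell<Option<OverflowOnDrop>> = RefCell::new(None);
}

fn main() {
    std::thread::spawn(|| {
        TLS.with(|t| *t.borrow_mut() = Some(OverflowOnDrop));
        // The thread exits here; Drop for OverflowOnDrop runs
        // afterwards, during TLS teardown.
    })
    .join()
    .unwrap();
}
```

Run with `TRIGGER_OVERFLOW` set to observe the uncaught overflow during TLS destruction.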
This causes a `SIGILL` on macOS and a `SIGSEGV` on Linux. In both cases I confirmed that Rust's stack overflow signal handler is not run.

Reproduced on Rust stable and the master branch.