-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix paltest_pal_sxs_test1 on illumos #105207
Conversation
I am not sure what is the best way to disable alternate stacks for signal handlers. Possibilities include:
The last two might be handy for cases where other platforms or emulators have the same problem with alternate stacks as illumos. The specific case I'm thinking of is the Linux emulation support included in some distributions of illumos. It has the same problem last I checked. |
Could you please open a bug at illumos.org/issues for the sigaltstack problem? |
10d7551
to
e7a547a
Compare
I updated how RPATH is set to be unconditionally turned on. This is consistent with how RPATH is set in a couple other places in this repo: https://github.com/dotnet/runtime/blob/main/src/coreclr/dlls/mscordbi/CMakeLists.txt#L4 https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/createdump/CMakeLists.txt#L52 This fixes this test for FreeBSD as well. |
Awesome! @Thefrank, another happy news for you. ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
Confirming this resolves the test failure on FreeBSD as well! Thank you!
|
I filed a bug asking for advice: https://www.illumos.org/issues/16682 I took the relevant pieces of the PAL and made a C program that illustrates the difference in behaviors on different systems. Linux and FreeBSD always use the alternate stack, illumos and macOS won't use the alternate stack again if the handler does not return. https://github.com/AustinWise/CrashRepro/tree/master/c I also an open bug for LX zones on SmartOS, where they are not emulating the Linux behavior correctly: |
Got a helpful response on the illumos issue. If I understand correctly, what dotnet is doing today with contexts and sigaltstack is highly system-dependent (non-portable). I wonder if it might be possible to let dotnet have what it needs using only getcontext/setcontext etc? (basically adhere to the interface descriptions pointed to in the comments on that illumos bug.). I assume that would be a fair bit of work. |
One of the options Joshua mentioned:
As an existence proof that this is possible for .NET, NativeAOT takes this approach: CoreCLR currently is in a transition phase where it supports its original exception dispatch code and a new model based on NativeAOT's system (see references to |
I hacked together a proof of concept to show that it probably possible to change the CoreCLR hardware exception handling to return from the signal handler before dispatching the exception: There is probably a lot more work to make this production quality. For now I'd prefer to go with the approach in this PR (disable alternate stacks) and revisiting changing how signal handlers work after the old exception handling mode is removed. |
f692901
to
2622c0b
Compare
Now that .NET 9 has branched, I've rebased this PR and resolved the merge conflicts. |
@AustinWise thank you for looking into the alternate stack vs return from signal handler issue. I've hit the same thing in the past when I was considering moving macOS to handle hardware exceptions via signals to unify it with other Unix platforms. macOS also keeps track of the alt stack and when we don't return from the handler and redirect execution instead, next time we hit a hardware exception, it calls the handler on the original stack. |
Just after posting the message, I've remembered what the problem I've hit was. The issue was that in the stack overflow case (which is the actual reason why we use the alternate stack at all), the stack unwinding was broken. We use yet another stack for stack overflow handling so that we can use as small alternate stack as possible. For the function we would return to from the signal handler, I was unable to come up with a CFI annotation that would allow the unwinder to properly unwind to the original failure point location (due to no fixed relation between the original SP and the SP in the helper stack). It is not a problem for .NET stack walking, as we already have a mechanism to unwind from the EH code to the actual hardware exception location, a debugger and other similar tools would not be able to walk the stack past the EH code in the stack overflow case. |
When .NET translates SIGSEV to NullReferenceException, it does not return from the signal handler. Instead it resumes execution at the catch handler for the exception. This is not recommend by the manpage for sigaction(2): > It is not recommended that [the ucontext] arg be used by the handler to > restore the context from before the signal delivery. The practical effect of resuming execution without returning from a handler is that the alternate stack will not be used for subsequent signal delivery. This is in contrast to the behavior on linux, which will always use the alternate stack if the stack pointer at the time of fault does not fall on the alternate stack. Since the alternate stack is only usable for a single exception, don't bother using it for any exceptions.
2622c0b
to
85b5c51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thank you!
If the runtime is "just trying to unwind" to some deeper frame on the stack, then it should be possible to use getcontext (at the deeper frame to which we might unwind) and then in the exception handler, call setcontext to restore that context. That too does the OS-specific stuff about altstacks that returning from a signal handler does. Is "just trying to unwind" an accurate summary of what the runtime is doing? That was not clear to me after a quick look at the code. Should we have a separate issue for possibly improving the runtime exception handling? |
Yes, we should. I've created it here: #106645
The SIGSEGV handler allows .NET runtime to handle NullReferenceException that can occur at any arbitrary location in the code that our JIT produces (and also in a few handwritten assembler helpers). The exception handling code walks the stack until it finds a frame that handles the exception (it has a catch handler for it). Then it calls that catch handler and after it returns, it restores context to the code location after the catch handler to continue normal execution. It is also possible that the catch handler never returns, because it ends up throwing some exception again (could be any software / hardware exception). So we cannot reasonably use the getcontext as we would essentially have to call it before every call (as we don't know which method can end up throwing an exception). And even if we could, it would add unnecessary and non-trivial overhead to the non-exceptional code execution. I believe that returning from the signal handler with instruction pointer in the context modified to point to a helper function that then invokes the exception handling code is the right thing to do here. It will actually be a bit more involved so that the failing frame is still visible on the stack when running the exception handling code. |
Done
I bet |
These changes taken together make
paltest_pal_sxs_test1
pass. Along the way it also fixes building the rootfs for illumos. Each commit description contains details about the changes; here is a summary:paltest_pal_sxs_test1
executable so that it can find the dynamic libraries used in the test.Tested on:
I did the testing using the updated cross rootfs from dotnet/arcade#14978.
Fixes #35362