Abort on stack overflow instead of re-raising SIGSEGV #31333

lambda · 2016-01-31T23:56:46Z

Abort on stack overflow instead of re-raising SIGSEGV

We use guard pages that cause the process to abort to protect against
undefined behavior in the event of stack overflow. We have a handler
that catches segfaults, prints out an error message if the segfault was
due to a stack overflow, then unregisters itself and returns to allow
the signal to be re-raised and kill the process.

This caused some confusion, as it was unexpected that safe code would be
able to cause a segfault, while it's easy to overflow the stack in safe
code. To avoid this confusion, when we detect a segfault in the guard
page, abort instead of the previous behavior of re-raising SIGSEGV.

To test this, we need to adapt the tests for segfault to actually check
the exit status. Doing so revealed that the existing test for segfault
behavior was actually invalid; LLVM optimizes the explicit null pointer
reference down to an illegal instruction, so the program aborts with
SIGILL instead of SIGSEGV and the test didn't actually trigger the
signal handler at all. Use a C helper function to get a null pointer
that LLVM can't optimize away, so we get our segfault instead.

This is a [breaking-change] if anyone is relying on the exact signal
raised to kill a process on stack overflow.

Closes #31273
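For readers following along, the decision the handler makes can be sketched roughly as below. This is an illustration only, not the code in `stack_overflow.rs`: `GuardRange`, `FaultAction` and `classify_fault` are hypothetical names, and the real handler works with the fault address delivered by the signal rather than plain integers.

```rust
/// Hypothetical description of a thread's guard page as a half-open
/// address range [start, end). Not a type from libstd.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GuardRange {
    start: usize,
    end: usize,
}

/// What the SIGSEGV/SIGBUS handler should do for a given fault.
#[derive(Debug, PartialEq, Eq)]
enum FaultAction {
    /// The fault hit the guard page: report the stack overflow and abort.
    AbortStackOverflow,
    /// Unrelated fault: restore the default disposition and return, so the
    /// signal is re-raised and kills the process as before.
    RestoreDefaultAndReturn,
}

/// Classify a fault by checking whether its address lies in the guard page.
/// A thread with no known guard page never reports a stack overflow.
fn classify_fault(guard: Option<GuardRange>, fault_addr: usize) -> FaultAction {
    match guard {
        Some(g) if g.start <= fault_addr && fault_addr < g.end => {
            FaultAction::AbortStackOverflow
        }
        _ => FaultAction::RestoreDefaultAndReturn,
    }
}
```

Only the first arm changes behavior in this PR; faults outside the guard page are still left to the default signal disposition.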

rust-highfive · 2016-01-31T23:57:00Z

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

lambda · 2016-02-01T00:09:30Z

I've only tested this on Mac OS X, but I'm pretty sure that everything it does is kosher according to POSIX, and everything that's Unix-specific has been protected with #[cfg(unix)], so I don't think this should break any platforms. The one thing that isn't in POSIX is MAP_ANON in the mmap that's used to map and unmap a page in the tests, to check that our processes are killed with the appropriate signals; however, every platform I checked in a quick search (Linux, FreeBSD, NetBSD, OpenBSD, Solaris, Mac OS X) supports that.

nagisa · 2016-02-01T00:46:29Z

src/test/run-pass/segfault-no-out-of-stack.rs

+
+    let mut buf = libc::mmap(0 as *mut libc::c_void,
+                             pagesize as libc::size_t,
+                             libc::PROT_WRITE | libc::PROT_READ,


nagisa: Perhaps just do not pass in PROT_WRITE here, to avoid the potential race condition where the pointer could become valid between the unmap and the dereference?

Not that it matters much.

lambda: Yeah, I tried that, but then I was getting SIGBUS instead of SIGSEGV, like with mprotect. I figured that just unmapping would be a better way to get a guaranteed SIGSEGV. I don't know of any circumstances in which the kernel would re-map that memory for us without anything going on in our process to cause it to do so, so I don't think we need to worry about such a race here; also, POSIX requires that after calling munmap, further references to those pages will produce SIGSEGV.

lambda · 2016-02-01T03:19:45Z

cc @geofft @Zoxc @brson

alexcrichton · 2016-02-01T04:32:20Z

src/libstd/sys/unix/stack_overflow.rs


-        // See comment above for why this function returns.
+            libc::raise(libc::SIGABRT);


alexcrichton: To behave more like rtabort! this may want to use intrinsics::abort(), as it mimics other runtime-abort behavior and means we don't need to mess around with signals or anything.

lambda: Yeah, but intrinsics::abort() does so via an illegal instruction, giving us SIGILL, which doesn't seem ideal if we're trying to clear up confusion caused by the particular signal received.

alexcrichton: Ah yeah, I was just going off @brson's desire that stack overflow "should abort like any other fatal error". Our other fatal errors today use rtabort!, which ends up translating to intrinsics::abort.

It would indeed do so via an illegal instruction, resulting in SIGILL, and likely resulting in a core dump as well.

lambda: Would it make more sense to change rtabort! to call libc::abort instead of intrinsics::abort, now that we have libc::abort? libc::abort is supposed to handle all of the edge cases like this: either unregistering the signal handler, or killing itself another way if the signal handler returns, or just infinitely looping if all else fails.

alexcrichton: I'd be down with that

alexcrichton: Er, to be clear, on Unix that seems fine, but on Windows we're trying to avoid the CRT wherever possible, so in that sense it may be a platform-specific abort process. I guess it makes less sense in that case :(

alexcrichton · 2016-02-01T04:34:06Z

r? @brson

You seem more opinionated than I, but both the current strategy and the proposed strategy are fine by me!

brson · 2016-02-02T01:33:12Z

Yeah, but intrinsics::abort() does so via an illegal instruction, giving us SIGILL, which doesn't seem ideal if we're trying to clear up confusion caused by the particular signal received.

My preference is for all aborts to be the same, and I would prefer this to be an rtabort!. If it would be more proper to terminate with SIGABRT than SIGILL, let's do that for all aborts, not just this one (but in a separate PR). Please do turn this into an rtabort!.

brson · 2016-02-02T01:33:49Z

This is a breaking change, but probably one nobody is relying on.

lambda · 2016-02-02T05:22:14Z

Switched to rtabort! instead of manually raising SIGABRT, updated the commit message to indicate possible breakage.

brson · 2016-02-05T22:21:44Z

src/test/run-pass/out-of-stack.rs

+    use std::os::unix::process::ExitStatusExt;
+
+    assert!(!status.success());
+    assert!(status.signal() != Some(libc::SIGSEGV));


Presumably this is also true on Windows?

brson · 2016-02-05T22:22:03Z

@bors r+

bors · 2016-02-05T22:22:04Z

📌 Commit f608abb has been approved by brson

bors · 2016-02-06T00:27:14Z

⌛ Testing commit f608abb with merge 8a0874b...

bors · 2016-02-06T01:24:21Z

💔 Test failed - auto-mac-32-opt


lambda · 2016-02-06T01:43:17Z

Whoops, some architectures throw SIGBUS on a null pointer dereference rather than SIGSEGV. Pushed a fix that will check for either of those in the tests.
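The kind of exit-status check this fix implies might look roughly like the following. This is a sketch, not the test from the PR: it stays std-only by hard-coding signal numbers instead of using the libc constants (the values are assumptions that hold on common x86/ARM Linux and on macOS/BSD), and it uses `sh` to produce a child that dies from a chosen signal. `killed_by_memory_fault` and `run_shell` are hypothetical helper names.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, ExitStatus};

// Assumed signal numbers; a real test would use libc::SIGSEGV / libc::SIGBUS.
const SIGSEGV: i32 = 11;
#[cfg(any(target_os = "linux", target_os = "android"))]
const SIGBUS: i32 = 7;
#[cfg(not(any(target_os = "linux", target_os = "android")))]
const SIGBUS: i32 = 10;

/// True if the child was killed by either signal that an invalid memory
/// access may produce, depending on the architecture.
fn killed_by_memory_fault(status: &ExitStatus) -> bool {
    matches!(status.signal(), Some(SIGSEGV) | Some(SIGBUS))
}

/// Run a small shell script and wait for it to finish (hypothetical helper
/// standing in for re-executing the test binary).
fn run_shell(script: &str) -> ExitStatus {
    Command::new("sh")
        .arg("-c")
        .arg(script)
        .status()
        .expect("failed to run sh")
}
```

A test for an ordinary segfault would assert `killed_by_memory_fault(&status)`, while a test for stack overflow would assert that the child failed and that this check is false.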

alexcrichton · 2016-02-06T02:16:03Z

@bors: r=brson ee79bfa

bors · 2016-02-06T02:59:43Z

⌛ Testing commit ee79bfa with merge 3be9ca1...


bors · 2016-02-06T05:04:47Z

💔 Test failed - auto-win-msvc-32-opt

lambda · 2016-02-06T05:23:13Z

Failure looks spurious.

brson · 2016-02-06T06:19:30Z

@bors retry

brson · 2016-02-06T06:33:13Z

cc @rust-lang/lang In some ways this is a pretty significant change to how processes terminate on stack overflow. It makes stack overflow terminate the process with rtabort! (which produces SIGILL), instead of the SIGSEGV / SIGBUS signal delivered by the OS. The rationale is that the segfault is an implementation detail of how std catches the stack overflow, we may want to change stack overflow behavior, etc.

bors · 2016-02-06T09:24:05Z

⌛ Testing commit ee79bfa with merge 35635ae...


bors · 2016-02-06T12:13:49Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-debug-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-linux-cross-opt, auto-linux-musl-64-opt, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-mac-ios-opt, auto-win-gnu-32-nopt-t, auto-win-gnu-32-opt, auto-win-gnu-64-nopt-t, auto-win-gnu-64-opt, auto-win-msvc-32-opt, auto-win-msvc-64-opt


Use libc::abort, not intrinsics::abort, in rtabort! intrinsics::abort compiles down to an illegal instruction, which on Unix-like platforms causes the process to be killed with SIGILL. A more appropriate way to kill the process would be SIGABRT; this indicates better that the runtime has explicitly aborted, rather than some kind of compiler bug or architecture mismatch that SIGILL might indicate. For rtassert!, replace this with libc::abort. libc::abort raises SIGABRT, but is defined to do so in such a way that it will terminate the process even if SIGABRT is currently masked or caught by a signal handler that returns. On non-Unix platforms, retain the existing behavior. On Windows we prefer to avoid depending on the C runtime, and we need a fallback for any other platforms that may be defined. An alternative on Windows would be to call TerminateProcess, but this seems less essential than switching to using SIGABRT on Unix-like platforms, where it is common for the process-killing signal to be printed out or logged. This is a [breaking-change] for any code that depends on the exact signal raised to abort a process via rtabort! cc #31273 cc #31333
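To illustrate why the choice of signal matters where it gets printed or logged, here is a rough sketch of how a supervising process might describe a child's death. `describe_termination` is a hypothetical helper, and the signal numbers are hard-coded assumptions (they match common Linux and macOS targets) rather than libc constants.

```rust
// Assumed signal numbers; real code would use the libc crate's constants.
const SIGILL: i32 = 4;
const SIGABRT: i32 = 6;
const SIGSEGV: i32 = 11;

/// Map the signal that killed a child (if any) to a log-friendly description.
/// The argument is what `ExitStatusExt::signal()` returns on Unix.
fn describe_termination(signal: Option<i32>) -> &'static str {
    match signal {
        // What an explicit abort via libc::abort looks like from outside.
        Some(SIGABRT) => "aborted (SIGABRT): the program terminated itself deliberately",
        // What intrinsics::abort looks like; easily mistaken for a bad build.
        Some(SIGILL) => "illegal instruction (SIGILL): possible compiler bug or CPU mismatch",
        Some(SIGSEGV) => "segmentation fault (SIGSEGV): invalid memory access",
        Some(_) => "killed by another signal",
        None => "not killed by a signal",
    }
}
```

With SIGABRT, the first message is the one an operator sees after a runtime abort, which matches what actually happened.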

rust-highfive assigned alexcrichton Jan 31, 2016

nagisa reviewed Feb 1, 2016

lambda force-pushed the 31273-abort-on-stack-overflow branch from 49cb4e1 to bba4272 Compare February 1, 2016 02:43

alexcrichton reviewed Feb 1, 2016

rust-highfive assigned brson and unassigned alexcrichton Feb 1, 2016

lambda force-pushed the 31273-abort-on-stack-overflow branch from bba4272 to 563904e Compare February 1, 2016 05:38

brson added the relnotes label Feb 2, 2016

lambda force-pushed the 31273-abort-on-stack-overflow branch from 563904e to 557313b Compare February 2, 2016 05:18

lambda force-pushed the 31273-abort-on-stack-overflow branch from 557313b to f608abb Compare February 2, 2016 05:28

brson reviewed Feb 5, 2016

lambda force-pushed the 31273-abort-on-stack-overflow branch from f608abb to ee79bfa Compare February 6, 2016 01:42

bors merged commit ee79bfa into rust-lang:master Feb 6, 2016

This was referenced Feb 6, 2016

Use libc::abort, not intrinsics::abort, in rtabort! #31457

Merged

Better way of discussing memory safety than "segfaults" #30963

Closed

mitaa mentioned this pull request Feb 18, 2016

Process exited with signal 11 when mutable array of usize created #31748

Closed

steveklabnik mentioned this pull request Mar 4, 2016

Stack overflow should not abort, it should fail!() #11011

Closed

lionel- mentioned this pull request Jun 8, 2023

Log backtrace on sigsegv and sigbus signals posit-dev/ark#22

Merged
