Signal handler for SIGBUS, SIGSEGV #1070
Could we get away with just printing a hex dump of the siginfo struct? Something like this maybe.
Hmmm, actually, if we write the handler in C and just call it from Rust, shouldn't it work on different versions of Linux, provided that the names of the struct members remain the same? Because the definition of the struct will be imported directly from
vmm/src/signal.rs
@@ -0,0 +1,230 @@ | |||
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. |
I'm not really keen on having Linux signal-related code split across different crates like this. For example, there is also `sys_util/src/signal.rs`.
The logic looks solid 👍
Just dropping by with my 2 cents (I spent a fair amount of time messing with
@serban300 If the user built the binaries themselves, yes. But distributed versions would need to have runtime kernel version detection to pick the correct structure layout to use, or be built to target specific kernel versions.
@aghecenco Your
@nbsdx Thanks for that, good call! I followed your issue - I first stumbled across it when I added the signal handler for seccomp, and again now for the more general purpose one. I got a bit lost in the details and would need to backtrack and fix the alignment issues, but first I need to know if there's value for Firecracker in doing this in Rust. The problem is with kernel compatibility, as Firecracker isn't strongly tied to a specific kernel version, and I find it would be overkill to add kernel-specific functionality to work around known bugs in older kernels that we support (plus runtime version checks).
I don't understand why the solution @serban300 proposes wouldn't work. The C API also exposes
    4.14: #define si_pkey _sifields._sigfault._pkey
    4.16: #define si_pkey _sifields._sigfault._addr_pkey._pkey
Using the getters instead of accessing the struct fields themselves should protect against compatibility issues.
That said, I'm wary about adding raw C in Firecracker. @firecracker-microvm/compute-capsule thoughts?
Sorry, I wasn't particularly clear on that. The issue would be with binary releases. User-built binaries would generally be fine, as they would be built against the correct headers. But if the Firecracker build server builds against 4.14 and a user tries to run the prebuilt binaries against 4.16, the field access will be incorrect, since the kernel/userspace interface (i.e., the structure layout) changed.
For what it's worth, Linux 4.14 LTS support ends in Jan 2020. But since Firecracker claims to support 4.14 onwards, there should be proper handling of kernel interface changes (probably at minimum across Linux LTS versions).
Again - just my 2 cents - I obviously don't have any control over decisions :)
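As a rough illustration of the "runtime kernel version detection" mentioned in this thread, here is a minimal sketch that parses the running kernel's release string and picks between the two `si_pkey` paths quoted above. The names `PkeyLayout`, `parse_release`, `layout_for`, and `running_kernel` are hypothetical, and the thread only confirms the 4.14 and 4.16 definitions, so treating everything newer than 4.16 like 4.16 is an assumption and 4.15 is deliberately left undecided.

```rust
use std::fs;

/// Which `si_pkey` path applies, named after the two `#define`s quoted in the thread.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum PkeyLayout {
    /// 4.14: `_sifields._sigfault._pkey`
    Direct,
    /// 4.16: `_sifields._sigfault._addr_pkey._pkey`
    AddrPkey,
}

/// Extracts (major, minor) from a release string such as "4.14.77-foo".
fn parse_release(release: &str) -> Option<(u32, u32)> {
    // Split on every non-digit character; the first two fields are major and minor.
    let mut parts = release.trim().split(|c: char| !c.is_ascii_digit());
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Picks a layout for a kernel version, or `None` when this sketch cannot tell.
fn layout_for(major: u32, minor: u32) -> Option<PkeyLayout> {
    match (major, minor) {
        (4, 14) => Some(PkeyLayout::Direct),
        // Assumption: kernels from 4.16 onwards keep the 4.16 definition.
        (4, m) if m >= 16 => Some(PkeyLayout::AddrPkey),
        (maj, _) if maj >= 5 => Some(PkeyLayout::AddrPkey),
        // 4.15 and anything older than 4.14 are not covered by the thread.
        _ => None,
    }
}

/// Reads the running kernel's version from procfs (Linux only).
fn running_kernel() -> Option<(u32, u32)> {
    fs::read_to_string("/proc/sys/kernel/osrelease")
        .ok()
        .and_then(|s| parse_release(&s))
}
```

A prebuilt binary could call `running_kernel()` once at startup and pass the result to `layout_for`, refusing to decode the field when the answer is `None`. Whether that extra machinery is worth it is exactly the trade-off being debated here.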
@nbsdx You're right again, I'd missed that, it won't work. For now we resolved to restrict handling to
@acatangiu yup there's already signal handling functionality split between the
What do you think?
747fd20
to
eaf30fd
Compare
@@ -144,8 +170,8 @@ mod tests {
assert!(filter.apply().is_ok());
assert_eq!(METRICS.seccomp.num_faults.count(), 0);

// Calls the blacklisted SYS_getpid.
let _pid = process::id();
// Call the blacklisted `SYS_mkdir`.
TLDR: I allowed `getpid` and `kill` and changed the blacklisted syscall from `getpid` to `mkdir` to make the GNU test pass consistently.
Long version:
- The glibc `SYS_getpid` caches its result, so it doesn't always do the syscall. Therefore, `process::id` didn't always break the seccomp filter on GNU - depending on whether `SYS_getpid` had been called before.
- I switched to:
      syscall(libc::SYS_getpid); // -> seccomp violation, handled
      ...
      raise(SIGBUS); // -> signal handler called
  That didn't work because glibc's `raise` calls both `getpid` and `gettid` (in some implementations - for instance, on the test machine it does, on mine it doesn't), so `getpid` had to be whitelisted. That's when I switched to whitelisting `getpid` and `gettid` for GNU - but:
  - the code became overly complicated with `#[cfg]`s for this simple test;
  - the test still failed on the test machine - despite `gettid` being whitelisted, it still caused a `SIGSYS` which, even more weirdly, did not get handled by the signal handler and terminated the test.
- At this point I decided to eliminate `#[cfg]`s and workarounds and:
  - cause a `SIGSYS` with `mkdir` (instead of `getpid`);
  - emit a `SIGBUS` with `kill` (instead of `raise`);
  - whitelist `kill` and `getpid` (for the `kill` to go through).
  This solution works on both GNU and musl.
Leaving this clarification here:
> whitelist kill and getpid (for the kill to go through).

is done just in a particular unit test, not whitelisted in general.
634a0c5
to
140fc56
Compare
Love the overall look and documentation 👍
msg = 'Shutting down VM after intercepting signal {}'.format(signum)
lines = log_fifo.sequential_reader(3)
assert msg in lines[1]
Are you sure it will always be on line `1`?
Umm... no 😄 I modified the test to leave the logger some room to wriggle.
/// }
/// ```
///
pub unsafe fn register_vcpu_signal_handler(
👍
return Err(io::Error::last_os_error());
// Other signals which might do async unsafe things incompatible with the rest of this
// function are blocked due to the sa_mask used when registering the signal handler.
let syscall = unsafe { *(info as *const i32).offset(SI_OFF_SYSCALL) as usize };
Hmm... magic? Is this the standard way of getting the signum?
The signum is fortunately always at the beginning of the `siginfo` struct - the syscall number is trickier to get to because it's embedded in one of those unions in the struct. I found it easier to just navigate to its offset than to unpack the whole thing, like I (incompletely) tried previously.
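To make the "navigate to its offset" idea concrete, here is a small sketch that treats an opaque siginfo-like pointer as an array of `i32` and reads one element at a fixed index. The value `6` for `SI_OFF_SYSCALL` is an assumption for the x86_64 Linux `SIGSYS` layout and is not taken from this PR; `syscall_from_siginfo` and `mock_siginfo` are hypothetical helpers that operate on a mock buffer rather than a real `siginfo_t`.

```rust
use std::os::raw::c_void;

// Assumed index (in i32 units) of the syscall number for the x86_64 Linux
// SIGSYS layout. Hypothetical value for illustration; verify per target.
const SI_OFF_SYSCALL: isize = 6;

/// Reads the `i32` at `SI_OFF_SYSCALL` from an opaque siginfo-like pointer.
///
/// # Safety
/// `info` must point to at least `SI_OFF_SYSCALL + 1` readable, properly
/// aligned `i32` values.
unsafe fn syscall_from_siginfo(info: *const c_void) -> usize {
    // Same expression shape as in the diff above: cast, offset, dereference.
    unsafe { *(info as *const i32).offset(SI_OFF_SYSCALL) as usize }
}

/// Builds a mock buffer with the signal number first and the syscall number
/// at the assumed offset. Stands in for a real `siginfo_t` in this sketch.
fn mock_siginfo(signum: i32, syscall: i32) -> [i32; 32] {
    let mut raw = [0i32; 32];
    raw[0] = signum;
    raw[SI_OFF_SYSCALL as usize] = syscall;
    raw
}
```

Calling `syscall_from_siginfo(mock_siginfo(31, 83).as_ptr() as *const c_void)` inside an `unsafe` block reads back `83`; on x86_64 Linux, 31 is `SIGSYS` and 83 is `mkdir`. The fixed index is also what ties this approach to a particular struct layout, which is the compatibility concern raised earlier in the conversation.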
// function are blocked due to the sa_mask used when registering the signal handler.
let syscall = unsafe { *(info as *const i32).offset(SI_OFF_SYSCALL) as usize };
METRICS.seccomp.num_faults.inc();
error!(
Can we log stuff in signal handler?
I think so. `write` is safe, and the formatting doesn't use any of the unsafe `sprintf` family.
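For comparison, a deliberately conservative way to emit a message from a handler is to format into a stack buffer and hand the bytes straight to the file descriptor. The sketch below is not what this PR does (it uses the `error!` macro); `format_u32` and `log_signal` are hypothetical helpers, and the assumption that `File::write` on Unix amounts to a plain `write(2)` with no locking or allocation should be checked against the standard library in use.

```rust
use std::fs::File;
use std::io::Write;
use std::mem::ManuallyDrop;
use std::os::unix::io::FromRawFd;

/// Writes the decimal digits of `n` into the tail of `buf` and returns them.
/// No heap allocation and no `format!` machinery is involved.
fn format_u32(mut n: u32, buf: &mut [u8; 10]) -> &[u8] {
    let mut i = buf.len();
    loop {
        i -= 1;
        buf[i] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    &buf[i..]
}

/// Emits "signal <n>\n" on file descriptor 2 (stderr).
fn log_signal(signum: u32) {
    // Wrap fd 2 without taking ownership, so it is not closed when dropped.
    let mut stderr = ManuallyDrop::new(unsafe { File::from_raw_fd(2) });
    let mut digits = [0u8; 10];
    // Write errors are ignored: there is little a handler could do about them.
    let _ = stderr.write(b"signal ");
    let _ = stderr.write(format_u32(signum, &mut digits));
    let _ = stderr.write(b"\n");
}
```

The ten-byte buffer is enough for any `u32`, so the formatter never needs to grow or allocate.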
This test fails intermittently when linked against glibc. Glibc's getpid() caches the result, so it doesn't always do the syscall. To avoid GNU-related complications, the test now forces a different syscall (mkdir) - as getpid will be needed in a later commit. Signed-off-by: Alexandra Iordache <aghecen@amazon.com>
This step is in preparation for a refactoring that moves generic signal handling utilities to the sys_util crate, leaving only Firecracker specifics (read: custom signal handlers themselves) in vmm. Signed-off-by: Alexandra Iordache <aghecen@amazon.com>
* added a single function in vmm that installs all relevant signal handlers;
* left signal handler functions in vmm (currently only for SIGSYS);
* moved signal handling logic to sys_util;
* split sys_util's register_signal_handler into 2 separate functions, one to be called exclusively for vCPUs and a second general purpose one.
The vCPU-specific one doesn't change the signal mask and alters the signal number, forcing it between SIGRTMIN and SIGRTMAX. The general purpose one is setup_sigsys_handler renamed, and will morph into a generic one in a following commit. Signed-off-by: Alexandra Iordache <aghecen@amazon.com>
Renamed register_sigsys_handler to register_signal_handler, enabling the installation of custom signal handlers for any signal. Signed-off-by: Alexandra Iordache <aghecen@amazon.com>
Log a message and exit with a specific exit code upon intercepting SIGBUS/SIGSEGV. Signed-off-by: Alexandra Iordache <aghecen@amazon.com>
Signed-off-by: Alexandra Iordache <aghecen@amazon.com>
The license checker now accepts files licensed in 2018 and 2019. Signed-off-by: Alexandra Iordache <aghecen@amazon.com>
The test checks that Firecracker logs the appropriate message when intercepting the signal. The exit code is not checked because, as the jailer clones into a new pid namespace, it's no longer in the test process' tree and its pid cannot be waited on. Signed-off-by: Alexandra Iordache <aghecen@amazon.com>
Signed-off-by: Alexandra Iordache <aghecen@amazon.com>
Added a custom signal handler that terminates the Firecracker process (with `libc::_exit`) when a `SIGSEGV` or `SIGBUS` is intercepted. See #1064 for details on the current handling of these signals.
Issue #, if available: #1064
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.