Validate faulting addresses are valid to fault on #6028
Conversation
Subscribe to Label Action: cc @peterhuene
This issue or pull request has been labeled: "wasmtime:api"

Subscribe to Label Action: cc @fitzgen
This issue or pull request has been labeled: "fuzzing"
This looks reasonable; most of my comments are nits/requests for clarified comments. Overall this is a very nice backstop to ensure we are more likely to catch codegen bugs; so, thanks!
A high-level question as well: a pagefault from within a hostcall won't trigger the warning, right? Or do we enter the handler if an instance in the store is on the stack at all?
// In the original Gecko sources this was copied from, the structure here
// is `pragma(pack(4))`, which should be the same as an unaligned i64
// here which doesn't increase the overall alignment of the structure.
Can we link the Gecko source (e.g. on searchfox) for reference?

Is `repr(packed)` equivalent to align=4 here because the current struct offset is already 4-aligned? If so, could we say this?
`repr(packed)` reduces alignment to 1 I think, but it turns out that Rust supports `repr(packed(4))`, which exactly matches what C does, so I've switched to that instead of the `packed` newtype wrapper.
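As a quick illustration of the distinction discussed above (these structs are hypothetical examples, not the actual structure from the PR): `repr(packed)` lowers every field's alignment to 1, while `repr(packed(4))` only caps alignment at 4, matching C's `#pragma pack(4)`.

```rust
use std::mem::{align_of, size_of};

// With bare `packed`, every field's alignment drops to 1.
#[repr(C, packed)]
struct Packed1 {
    a: u32,
    b: u64,
}

// With `packed(4)`, alignment is capped at 4, matching `#pragma pack(4)`:
// the u64 field is 4-aligned rather than its natural 8.
#[repr(C, packed(4))]
struct Packed4 {
    a: u32,
    b: u64,
}

fn main() {
    // `packed`: struct alignment 1, no padding, 4 + 8 = 12 bytes.
    assert_eq!(align_of::<Packed1>(), 1);
    assert_eq!(size_of::<Packed1>(), 12);

    // `packed(4)`: struct alignment 4, and still 12 bytes, because the
    // u32 already leaves the u64 at offset 4, which is 4-aligned.
    assert_eq!(align_of::<Packed4>(), 4);
    assert_eq!(size_of::<Packed4>(), 12);
}
```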
```rust
    return None;
}
let mut fault = None;
for instance in self.instances.iter() {
```
I suppose we aren't worried about linear-time behavior here because this is only within a store, not across all live instances in the engine; and we may have some number of instances (due to a component instantiation, perhaps) but not an extremely large number. Can we note that here? And, are there circumstances in the future where this may be a problem?
Yeah, that's the rough idea: the total number of linear memories and instances is likely quite small per-store. I'm also sort of banking on everyone being of the mindset "well, it's ok if traps are a bit slower". If this becomes an issue, though, I think we can push more maps outward into the store to more efficiently (probably log(n)-style) calculate which instance/memory a linear memory address belongs to.
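A minimal sketch of the linear scan being discussed, using made-up types rather than Wasmtime's actual internals: given a faulting address, walk every linear memory reachable from a store and see which one could contain it.

```rust
/// Hypothetical descriptor for one linear memory: its base address and
/// the maximum extent it could ever fault within (guard pages included).
struct MemoryRange {
    base: usize,
    max_len: usize,
}

/// Stand-in for a store; one entry per linear memory across all instances.
struct FakeStore {
    memories: Vec<MemoryRange>,
}

impl FakeStore {
    /// Returns `Some(wasm_address)` if `faulting_addr` lands inside one of
    /// the store's linear memories, or `None` if no memory could have
    /// produced the fault (the "scary abort" case described in the PR).
    fn wasm_fault(&self, faulting_addr: usize) -> Option<usize> {
        // O(n) in the number of memories; fine because a store typically
        // holds a handful of instances. A map keyed by base address would
        // give the log(n)-style lookup mentioned above if this got hot.
        for mem in &self.memories {
            if (mem.base..mem.base + mem.max_len).contains(&faulting_addr) {
                return Some(faulting_addr - mem.base);
            }
        }
        None
    }
}

fn main() {
    let store = FakeStore {
        memories: vec![MemoryRange { base: 0x1000_0000, max_len: 0x1_0000 }],
    };
    // An address 4 bytes into the memory maps back to wasm address 4.
    assert_eq!(store.wasm_fault(0x1000_0004), Some(4));
    // An address outside every memory is unexplainable: abort territory.
    assert_eq!(store.wasm_fault(0x2000_0000), None);
}
```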
The only traps that are semi-hot are wasi exits, which don't have faulting addresses and so should never enter this function anyways, right?
Right, yeah, wasm exits won't execute this. I don't know of any case where the loop would be hot other than a synthetic "instantiate 10000 things and then repeatedly trap", but I'll definitely bolster the comments here.
Correct, yeah. We've actually got tests which ensure that a host-triggered segfault isn't caught by Wasmtime and aborts the process (as you'd expect a segfault in a language like Rust to do). The first thing the signal handlers do is test whether the faulting pc is a wasm pc, and they defer to the default behavior of each signal if it's not a wasm pc. If it is a wasm pc, though, we fully catch the trap, on the assumption that the wasm instruction was expected to have that kind of fault.
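The dispatch described above can be modeled roughly like this (names and types here are illustrative stand-ins, not Wasmtime's real signal-handling code): classify the faulting pc against the known ranges of jit-compiled wasm code, and only treat the fault as a wasm trap when it falls inside one.

```rust
#[derive(Debug, PartialEq)]
enum FaultDisposition {
    /// Faulting pc is inside wasm code: unwind and report a trap.
    WasmTrap,
    /// Faulting pc is in host code: restore the default signal
    /// disposition so the process aborts like a normal segfault.
    DeferToDefault,
}

/// Decide what to do with a fault at `pc`, given the (start, end) ranges
/// of all jit-compiled wasm code. This mirrors the "is this a wasm pc?"
/// check described in the comment above.
fn classify_fault(pc: usize, wasm_code_ranges: &[(usize, usize)]) -> FaultDisposition {
    let is_wasm_pc = wasm_code_ranges
        .iter()
        .any(|&(start, end)| (start..end).contains(&pc));
    if is_wasm_pc {
        FaultDisposition::WasmTrap
    } else {
        FaultDisposition::DeferToDefault
    }
}

fn main() {
    let ranges = [(0x7f00_0000, 0x7f10_0000)];
    // A pc inside jit code is treated as a wasm trap...
    assert_eq!(classify_fault(0x7f00_1234, &ranges), FaultDisposition::WasmTrap);
    // ...while a host-code pc falls through to the default handler.
    assert_eq!(classify_fault(0x1234, &ranges), FaultDisposition::DeferToDefault);
}
```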
Modulo nitpicks that Chris pointed out and my tiny suggestion for the fatal error message, this looks great! Really excited to have this extra backstop in place, and getting better trap error messages on top of that is icing on the cake. Well done!
This commit adds a defense-in-depth measure to Wasmtime which is intended to mitigate the impact of CVEs such as GHSA-ff4p-7xrq-q5r8. Currently Wasmtime will catch `SIGSEGV` signals for WebAssembly code so long as the instruction which faulted is an allow-listed instruction (aka has a trap code listed for it). With the recent security issue, however, the problem was that a wasm guest could exploit a compiler bug to access memory outside of its sandbox. If the access was successful there's no real way to detect that, but if the access was unsuccessful then Wasmtime would happily swallow the `SIGSEGV` and report a nominal trap. To embedders, this might look like nothing is going awry.

The new strategy implemented in this commit is to be more robust towards these sorts of failures. When a `SIGSEGV` is raised, the faulting pc is recorded, and additionally the address of the inaccessible location is recorded. After the WebAssembly stack is unwound and control returns to Wasmtime, which has access to a `Store`, Wasmtime will now translate this inaccessible faulting address to a wasm address. This process should be guaranteed to succeed, as WebAssembly should only be able to access a well-defined region of memory for all linear memories in a `Store`.

If no linear memory in a `Store` could contain the faulting address, then Wasmtime now prints a scary message and aborts the process. The purpose of this is to catch these sorts of bugs, make them very loud errors, and hopefully mitigate impact. This would continue to not mitigate the impact of a guest successfully loading data outside of its sandbox, but if a guest was using a probing strategy to find valid addresses, then any invalid access would turn into a process crash which would immediately be noticed by embedders.

While I was here I went ahead and additionally took a stab at bytecodealliance#3120. Traps due to `SIGSEGV` will now report the size of linear memory and the address that was being accessed in addition to the bland "access out of bounds" error. While this is still somewhat bland in the context of a high-level source language, it's hopefully at least a little bit more actionable for some. I'll note, though, that this isn't a guaranteed contextual message since only the default configuration for Wasmtime generates `SIGSEGV` on out-of-bounds memory accesses; dynamically bounds-checked configurations, for example, don't do this.

Testing-wise I unfortunately am not aware of a great way to test this. The closest equivalent would be something like an `unsafe` method `Config::allow_wasm_sandbox_escape`. In lieu of adding tests, though, I can confirm that during development the crashing messages work just fine, as it took a while on macOS to figure out where the faulting address was recorded in the exception information, which meant I had lots of instances of recording an address of a trap not accessible from wasm.
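The improved trap message described in the commit message might be assembled along these lines (a sketch only; the function name and parameters here are invented for illustration, assuming the memory's base address and current byte length are already known):

```rust
/// Translate a native faulting address into a wasm-relative address and
/// render the richer out-of-bounds message described in the commit.
fn format_memory_fault(faulting_addr: usize, mem_base: usize, mem_len: usize) -> String {
    // The wasm address is simply the offset from the memory's base.
    let wasm_addr = faulting_addr.wrapping_sub(mem_base);
    format!(
        "memory fault at wasm address {wasm_addr:#x} in linear memory of size {mem_len:#x}"
    )
}

fn main() {
    // A one-page (0x10000-byte) memory, with a fault 0xdeadbeef bytes past
    // its base (i.e. in the guard region for a default static memory).
    let msg = format_memory_fault(0x1_0000_0000 + 0xdeadbeef, 0x1_0000_0000, 0x10000);
    assert_eq!(
        msg,
        "memory fault at wasm address 0xdeadbeef in linear memory of size 0x10000"
    );
}
```

This matches the message string that the s390x test failure later in the thread asserts on.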
Force-pushed from a02c62f to 5ef712a
@uweigand I was wondering if I could enlist your help to debug a failure here. This has failed in CI with this error:
where this is the module being executed:

```wat
(module
  (memory 1)
  (func $start
    i32.const 0xdeadbeef
    i32.load
    drop)
  (start $start)
)
```

and the test is asserting:

```rust
let err = format!("{err:?}");
assert!(
    err.contains("memory fault at wasm address 0xdeadbeef in linear memory of size 0x10000"),
    "bad error: {err}"
);
```

so it seems like s390x is reporting the faulting address of the load instruction as something page-rounded (or something like that?). I don't know how to hook up gdb to qemu, but in theory this means that the address coming out of

Do you know if this is an s390x-specific behavior? Are faulting addresses reported at their page-aligned boundary instead of the address itself?
Yes, this is correct. Faulting addresses are rounded to their 4k page boundary on s390x.
Ok, good to know it's expected! I'll disable this particular test for s390x in that case.
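For concreteness, the s390x behavior described above means the reported address is the faulting address with its low 12 bits cleared, so an exact-match assertion on `0xdeadbeef` cannot pass there. A tiny illustration of the rounding:

```rust
/// Round an address down to its 4 KiB page boundary, as s390x does when
/// reporting faulting addresses.
fn round_to_4k(addr: u64) -> u64 {
    addr & !0xfff
}

fn main() {
    // The test's faulting address 0xdeadbeef would be reported as 0xdeadb000,
    // which is why the exact-string assertion fails on s390x.
    assert_eq!(round_to_4k(0xdead_beef), 0xdead_b000);
}
```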
s390x rounds faulting addresses to 4k boundaries.