Validate faulting addresses are valid to fault on #6028

alexcrichton · 2023-03-15T20:58:56Z

This commit adds a defense-in-depth measure to Wasmtime which is intended to mitigate the impact of CVEs such as GHSA-ff4p-7xrq-q5r8. Currently Wasmtime will catch SIGSEGV signals for WebAssembly code so long as the instruction which faulted is an allow-listed instruction (aka has a trap code listed for it). With the recent security issue, however, the problem was that a wasm guest could exploit a compiler bug to access memory outside of its sandbox. If the access was successful there's no real way to detect that, but if the access was unsuccessful then Wasmtime would happily swallow the SIGSEGV and report a nominal trap. To embedders, this might look like nothing is going awry.

The new strategy implemented here in this commit is to attempt to be more robust towards these sorts of failures. When a SIGSEGV is raised the faulting pc is recorded but additionally the address of the inaccessible location is also record. After the WebAssembly stack is unwound and control returns to Wasmtime which has access to a Store Wasmtime will now use this inaccessible faulting address to translate it to a wasm address. This process should be guaranteed to succeed as WebAssembly should only be able to access a well-defined region of memory for all linear memories in a Store.

If no linear memory in a Store could contain the faulting address, then Wasmtime now prints a scary message and aborts the process. The purpose of this is to catch these sorts of bugs, make them very loud errors, and hopefully mitigate impact. This would continue to not mitigate the impact of a guest successfully loading data outside of its sandbox, but if a guest was doing a sort of probing strategy trying to find valid addresses then any invalid access would turn into a process crash which would immediately be noticed by embedders.

While I was here I went ahead and additionally took a stab at #3120. Traps due to SIGSEGV will now report the size of linear memory and the address that was being accessed in addition to the bland "access out of bounds" error. While this is still somewhat bland in the context of a high level source language it's hopefully at least a little bit more actionable for some. I'll note though that this isn't a guaranteed contextual message since only the default configuration for Wasmtime generates SIGSEGV on out-of-bounds memory accesses. Dynamically bounds-checked configurations, for example, don't do this.

Testing-wise I unfortunately am not aware of a great way to test this. The closet equivalent would be something like an unsafe method Config::allow_wasm_sandbox_escape. In lieu of adding tests, though, I can confirm that during development the crashing messages works just fine as it took awhile on macOS to figure out where the faulting address was recorded in the exception information which meant I had lots of instances of recording an address of a trap not accessible from wasm.

github-actions · 2023-03-15T21:45:03Z

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

github-actions · 2023-03-15T23:44:49Z

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "fuzzing"

Thus the following users have been cc'd because of the following labels:

fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

cfallin

This looks reasonable; most of my comments are nits/requests for clarified comments. Overall this is a very nice backstop to ensure we are more likely to catch codegen bugs; so, thanks!

A high-level question as well: a pagefault from within a hostcall won't trigger the warning, right? Or do we enter the handler if an instance in the store is on the stack at all?

crates/runtime/src/lib.rs

crates/runtime/src/memory.rs

crates/runtime/src/traphandlers.rs

cfallin · 2023-03-15T23:53:00Z

crates/runtime/src/traphandlers/macos.rs

+        //
+        // In the original Gecko sources this was copied from the structure here
+        // is `pragma(pack(4))` which should be the same as an unaligned i64
+        // here which doesn't increase the overall alignment of the structure.


Can we link the Gecko source (e.g. on searchfox) for reference?

Is repr(packed) equivalent to align=4 here because the current struct offset is already 4-aligned? If so, could we say this?

repr(packed) reduces alignment to 1 I think, but it turns out that Rust supports repr(packed(4)) which exactly matches what C does so I've switched to that instead of the packed newtype wrapper.

crates/wasmtime/src/store.rs

cfallin · 2023-03-15T23:57:54Z

crates/wasmtime/src/store.rs

+            return None;
+        }
+        let mut fault = None;
+        for instance in self.instances.iter() {


I suppose we aren't worried about linear-time behavior here because this is only within a store, not across all live instances in the engine; and way may have some number of instances (due to a component instantiation perhaps) but not an extremely large number. Can we note that here? And, are there circumstances in the future where this may be a problem?

Yeah that's the rough idea, that the total number of linear memories and instances is likely quite small per-store. I'm also sort of banking on everything being of the mindset "well it's ok if traps are a bit slower". If this becomes an issue though I think we can push more maps outward into the store to more efficiently (probably log(n) style) calculate which instance/memory a linear memory address belongs to.

The only traps that are semi-hot are wasi exits, which don't have faulting addresses and so should never enter this function anyways, right?

Right yeah wasm exits won't execute this. I don't know of any where the loop would be hot other than synthetic "instantiate 10000 things and then repeatedly trap", but I'll definitely bolster the comments here.

alexcrichton · 2023-03-16T14:42:57Z

A high-level question as well: a pagefault from within a hostcall won't trigger the warning, right? Or do we enter the handler if an instance in the store is on the stack at all?

Correct, yeah. We've actually got tests which ensure that a host-triggered segfault isn't caught by Wasmtime and aborts the process (as you'd expect a segfault in a language like Rust to do so). The first thing the signal handlers do is test whether the faulting pc is a wasm pc, and it defers to the default behavior of each signal if it's not a wasm pc. If it's a wasm pc though we fully catch the trap assuming it's expected that the wasm instruction had the kind of fault.

crates/wasmtime/src/store.rs

fitzgen

Modulo nitpicks that Chris pointed out and my tiny suggestion for the fatal error message, this looks great! Really excited to have this extra backstop in place, and getting better trap error messages on top of that is icing on the cake. Well done!

This commit adds a defense-in-depth measure to Wasmtime which is intended to mitigate the impact of CVEs such as GHSA-ff4p-7xrq-q5r8. Currently Wasmtime will catch `SIGSEGV` signals for WebAssembly code so long as the instruction which faulted is an allow-listed instruction (aka has a trap code listed for it). With the recent security issue, however, the problem was that a wasm guest could exploit a compiler bug to access memory outside of its sandbox. If the access was successful there's no real way to detect that, but if the access was unsuccessful then Wasmtime would happily swallow the `SIGSEGV` and report a nominal trap. To embedders, this might look like nothing is going awry. The new strategy implemented here in this commit is to attempt to be more robust towards these sorts of failures. When a `SIGSEGV` is raised the faulting pc is recorded but additionally the address of the inaccessible location is also record. After the WebAssembly stack is unwound and control returns to Wasmtime which has access to a `Store` Wasmtime will now use this inaccessible faulting address to translate it to a wasm address. This process should be guaranteed to succeed as WebAssembly should only be able to access a well-defined region of memory for all linear memories in a `Store`. If no linear memory in a `Store` could contain the faulting address, then Wasmtime now prints a scary message and aborts the process. The purpose of this is to catch these sorts of bugs, make them very loud errors, and hopefully mitigate impact. This would continue to not mitigate the impact of a guest successfully loading data outside of its sandbox, but if a guest was doing a sort of probing strategy trying to find valid addresses then any invalid access would turn into a process crash which would immediately be noticed by embedders. While I was here I went ahead and additionally took a stab at bytecodealliance#3120. Traps due to `SIGSEGV` will now report the size of linear memory and the address that was being accessed in addition to the bland "access out of bounds" error. While this is still somewhat bland in the context of a high level source language it's hopefully at least a little bit more actionable for some. I'll note though that this isn't a guaranteed contextual message since only the default configuration for Wasmtime generates `SIGSEGV` on out-of-bounds memory accesses. Dynamically bounds-checked configurations, for example, don't do this. Testing-wise I unfortunately am not aware of a great way to test this. The closet equivalent would be something like an `unsafe` method `Config::allow_wasm_sandbox_escape`. In lieu of adding tests, though, I can confirm that during development the crashing messages works just fine as it took awhile on macOS to figure out where the faulting address was recorded in the exception information which meant I had lots of instances of recording an address of a trap not accessible from wasm.

alexcrichton · 2023-03-16T21:44:50Z

@uweigand I was wondering if I could enlist your help to debug a failure here. This has failed in CI with this error:

---- traps::wasm_fault_address_reported_by_default stdout ----
thread 'traps::wasm_fault_address_reported_by_default' panicked at 'bad error: error while executing at wasm backtrace:
    0:   0x25 - <unknown>!start

Caused by:
    0: memory fault at wasm address 0xdeadb000 in linear memory of size 0x10000
    1: wasm trap: out of bounds memory access', tests/all/traps.rs:1308:5

where this is the module being executed:

            (module
                (memory 1)
                (func $start
                    i32.const 0xdeadbeef
                    i32.load
                    drop)
                (start $start)
            )

and the test is asserting:

    let err = format!("{err:?}");
    assert!(
        err.contains("memory fault at wasm address 0xdeadbeef in linear memory of size 0x10000"),
        "bad error: {err}"
    );

so it seems like s390x is reporting the faulting address of the load instruction is something page-rounded (or something like that?). I don't know how to hook up gdb to qemu but in theory this means that the address coming out of siginfo_t's si_addr field in the signal handler doesn't have the 0xeef at the end and instead ends in 0x000 (I think).

Do you know if this is an s390x-specific behavior? Are faulting addresses reported at their page aligned boundary instead of the address itself?

uweigand · 2023-03-17T13:27:47Z

Do you know if this is an s390x-specific behavior? Are faulting addresses reported at their page aligned boundary instead of the address itself?

Yes, this is correct. Faulting addresses are rounded to their 4k page boundary on s390x.

alexcrichton · 2023-03-17T14:28:32Z

Ok good to know it's expected! I'll disable this particular test for s390x in that case.

s390x rounds faulting addresses to 4k boundaries.

github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Mar 15, 2023

github-actions bot added the fuzzing Issues related to our fuzzing infrastructure label Mar 15, 2023

cfallin approved these changes Mar 16, 2023

View reviewed changes

fitzgen reviewed Mar 16, 2023

View reviewed changes

crates/wasmtime/src/store.rs Outdated Show resolved Hide resolved

fitzgen approved these changes Mar 16, 2023

View reviewed changes

alexcrichton added 3 commits March 16, 2023 11:30

Fix tests

8f14867

Review comments

5ef712a

alexcrichton force-pushed the validate-segfault branch from a02c62f to 5ef712a Compare March 16, 2023 18:42

alexcrichton enabled auto-merge March 16, 2023 18:42

Fix compile after refactor

8195dbe

alexcrichton added this pull request to the merge queue Mar 16, 2023

Fix compile on macOS

fa6035a

alexcrichton removed this pull request from the merge queue due to a manual request Mar 16, 2023

alexcrichton enabled auto-merge March 16, 2023 20:27

alexcrichton added this pull request to the merge queue Mar 16, 2023

alexcrichton removed this pull request from the merge queue due to a manual request Mar 16, 2023

Fix trap test for s390x

ed52558

s390x rounds faulting addresses to 4k boundaries.

alexcrichton enabled auto-merge March 17, 2023 14:32

alexcrichton added this pull request to the merge queue Mar 17, 2023

alexcrichton merged commit 28371bf into bytecodealliance:main Mar 17, 2023

alexcrichton deleted the validate-segfault branch March 17, 2023 15:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate faulting addresses are valid to fault on #6028

Validate faulting addresses are valid to fault on #6028

alexcrichton commented Mar 15, 2023

github-actions bot commented Mar 15, 2023

github-actions bot commented Mar 15, 2023

cfallin left a comment

cfallin Mar 15, 2023

alexcrichton Mar 16, 2023

cfallin Mar 15, 2023

alexcrichton Mar 16, 2023

fitzgen Mar 16, 2023

alexcrichton Mar 16, 2023

alexcrichton commented Mar 16, 2023

fitzgen left a comment

alexcrichton commented Mar 16, 2023

uweigand commented Mar 17, 2023

alexcrichton commented Mar 17, 2023

Validate faulting addresses are valid to fault on #6028

Validate faulting addresses are valid to fault on #6028

Conversation

alexcrichton commented Mar 15, 2023

github-actions bot commented Mar 15, 2023

Subscribe to Label Action

github-actions bot commented Mar 15, 2023

Subscribe to Label Action

cfallin left a comment

Choose a reason for hiding this comment

cfallin Mar 15, 2023

Choose a reason for hiding this comment

alexcrichton Mar 16, 2023

Choose a reason for hiding this comment

cfallin Mar 15, 2023

Choose a reason for hiding this comment

alexcrichton Mar 16, 2023

Choose a reason for hiding this comment

fitzgen Mar 16, 2023

Choose a reason for hiding this comment

alexcrichton Mar 16, 2023

Choose a reason for hiding this comment

alexcrichton commented Mar 16, 2023

fitzgen left a comment

Choose a reason for hiding this comment

alexcrichton commented Mar 16, 2023

uweigand commented Mar 17, 2023

alexcrichton commented Mar 17, 2023