Implement limiting WebAssembly execution with fuel #2611

alexcrichton · 2021-01-27T22:32:56Z

This PR lifts a feature from Lucet to wasmtime where generated code can count instructions or account for "fuel" during execution. The purpose of this feature is similar to the interrupt support via InterruptHandle, but it mainly allows deterministically interrupting a wasm module instead of relying on a timer. Additionally a future goal of this PR is to extend the async support of wasmtime to leverage fuel to periodically "interrupt" executing wasm code to yield back to the host. This would enable wasmtime futures to never take "too long" in Future::poll since if they would otherwise take awhile they'd yield back to the host and allow preemption and/or other things like future timeouts.

Thee implementation here is nearly copied verbatim from Lucet itself, with tweaks as appropriate for the different vmctx representation in Wasmtime. The main difference is that Wasmtime's fuel counter is two levels of indirection away from the vmctx rather than one in Lucet. To help with this a new Variable stores the VMInterrupts pointer value to avoid reloading the same value each time from the vmctx.

Support for this feature is exposed through a few new APIs:

Config::consume_fuel - enables codegen options for wasm to consume fuel, and behaves similar to interruptable.
Store::set_fuel_remaining - this is how fuel is injected into a Store for execution of wasm. Note that stores always start with 0 fuel so this is required to be called.
Store::fuel_consumed - this can be used to check how much fuel has been consumed so far.

The current behavior, which cannot be changed, is that when fuel runs out a wasm trap is generated. I hope to make this configurable in the future so that for async stores when fuel runs out it's automatically re-injected with fuel but only after a yield back to the host happens.

I've done a bit of benchmarking with this using criterion and the benchmarks here -- https://github.com/bytecodealliance/sightglass/tree/main/benchmarks-next. The benchmarks are relatively limited at this time but were able to produce some useful data in the meantime. This shows to be a 35-45% slowdown on my personal laptop for the runtime execution of the benchmarked porttion of the code for blake3-scalar and shootout-ackermann. At least for ackermann this is somewhat expected because the loops/function calls are all tiny, so the overhead is quite noticeable. For blake3-scalar I assume it's similar but haven't dug in yet. Note that these numbers were with the new backend since the old x86 backend seems significantly worse than the x64 one.

I do think there might be some relatively low-hanging fruit with respect to performance, but further tweaks would require changes to cranelift itself to optimize instruction selection. For example one optimization might be to not have a fuel_var and instead periodically do addq $fuel_consumed, offset(%vminterrupts_ptr) which avoids consuming extra registers. Similarly cmpq $0, offset(%vminterrupts_ptr) could be generated as well. I couldn't get the x64 backend to emit those forms of instructions at this time though. I'm also not 100% certain that it'll be faster.

Note that for now this doesn't depend on the async PR, but I plan on having a future PR after these two land which implements the periodically-yield option.

cfallin · 2021-01-27T22:44:24Z

I'll take a look -- thanks for putting this together so quickly! A few quick notes on the isel thoughts:

For example one optimization might be to not have a fuel_var and instead periodically do addq $fuel_consumed, offset(%vminterrupts_ptr) which avoids consuming extra registers. Similarly cmpq $0, offset(%vminterrupts_ptr) could be generated as well

The former is an RMW compound op and could be done pretty reasonably with the current load-op merging framework; we haven't gotten around to it yet, but it's clear how to do so if we decide it's important. The latter is actually a bit more interesting: I recently fixed a bug (#2576) that arose because of certain combinations of load-and-compare that would sink a load beyond its first use by simply disallowing compare-from-memory. That too can be re-optimized but would take more machinery.

Anyway, all that to say that there are reasons you couldn't get either of those instructions to come out the backend :-)

alexcrichton · 2021-01-27T22:53:31Z

Oh also to clarify, I don't think it's just isel that can be improved, but rather I think the frontend codegen could also be improved if we decide the time should be invested. For example an empty function (func nop) generates this x86 code with the new backend (annotated by me):

0000000000000000 _wasm_function_4:
       ;; prologue stack-check to ensure we're not out of stack space
       ;; note that this is only here because function calls are made internally
       0: 55                           	pushq	%rbp
       1: 48 89 e5                     	movq	%rsp, %rbp
       4: 4c 8b 17                     	movq	(%rdi), %r10
       7: 4d 8b 12                     	movq	(%r10), %r10
       a: 49 39 e2                     	cmpq	%rsp, %r10
       d: 0f 86 02 00 00 00            	jbe	2 <_wasm_function_4+0x15>
      13: 0f 0b                        	ud2

      ;; allocating stack space, saving registers
      15: 48 83 ec 10                  	subq	$16, %rsp
      19: 4c 89 24 24                  	movq	%r12, (%rsp)

      ;; load vminterrupts_ptr into r12
      1d: 4c 8b 27                     	movq	(%rdi), %r12

      ;; increment the fuel consumed by 1 where 8(%r12) is where the previous count is
      20: be 01 00 00 00               	movl	$1, %esi
      25: 49 03 74 24 08               	addq	8(%r12), %rsi

      ;; check if fuel is gerater than zero
      2a: 48 31 c0                     	xorq	%rax, %rax
      2d: 48 39 c6                     	cmpq	%rax, %rsi
      30: 0f 8c 13 00 00 00            	jl	19 <_wasm_function_4+0x49>

      ;; save the fuel count, load the intrinsic address, call it
      36: 49 89 74 24 08               	movq	%rsi, 8(%r12)
      3b: 48 8b b7 f0 06 00 00         	movq	1776(%rdi), %rsi
      42: ff d6                        	callq	*%rsi

      ;; restore the fuel_var for this function (unnecessary instruction, happens uncondititonally after call)
      44: 49 8b 74 24 08               	movq	8(%r12), %rsi
    
      ;; save fuel_var at the end of the function (unnecessary instruction, happens unconditionally at function end)
      49: 49 89 74 24 08               	movq	%rsi, 8(%r12)

      ;; function epilogue
      4e: 4c 8b 24 24                  	movq	(%rsp), %r12
      52: 48 83 c4 10                  	addq	$16, %rsp
      56: 48 89 ec                     	movq	%rbp, %rsp
      59: 5d                           	popq	%rbp
      5a: c3                           	retq

Larger functions like https://github.com/bytecodealliance/sightglass/blob/faf101e50f7af6243fe3e910cf9455eecd9e1368/benchmarks-next/shootout-ackermann/benchmark.wat#L25-L49 end up having a lot more code generated with instrumentation enabled. I haven't yet gone through instruction-by-instruction yet, though, to see what could be improved.

github-actions · 2021-01-27T23:54:39Z

Subscribe to Label Action

cc @fitzgen, @peterhuene

This issue or pull request has been labeled: "cranelift", "cranelift:wasm", "fuzzing", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

fitzgen: fuzzing
peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

crates/cranelift/src/func_environ.rs

cfallin

Looks very nice overall!

My main question below has to do with the way in which we actually handle out-of-fuel conditions, and whether this implementation as-is allows us to resume to do timeslicing. It's possible that more will come later, in which case, I'm perfectly happy to see this landed now; just want to understand how things will (eventually) work.

cfallin · 2021-01-27T23:53:56Z

crates/wasmtime/src/store.rs

+    pub fn set_fuel_remaining(&self, remaining: u64) {
+        assert!(self.engine().config().tunables.consume_fuel);
+        let remaining = i64::try_from(remaining).unwrap();
+        self.inner.fuel_adj.set(remaining);


Is there any reason we can't add to the existing fuel_adj value here, in order to continue accumulating the consumed-fuel count and return the true total from fuel_consumed()?

(In that case I might also call this add_fuel(), and adjust existing rather than overwrite the fuel-consumed counter...)

I was a tiny bit worried that a long-lived store might overflow the i64 counter but that may not be too realistic. Do you think that'd be rare enough that we should just switch this to an add instead of a set?

IMHO it's nicer to not have the "fuel-used value that we return is only since the last set" property -- it has the potential to become a subtle stats bug later.

Doing some quick math, a 2^63 max count, at 1B Wasm ops per second, gives us 2^33 or 8B seconds of runtime before overflow, which is ~250 years. Sometime before the year 2270 we can come back and upgrade to an i128 :-)

Heh good point!

crates/wasmtime/src/store.rs

This commit adds codegen infrastructure necessary to instrument wasm code to consume fuel as it executes. Currently nothing is really done with the fuel, but that'll come in later commits. The focus of this commit is to implement the codegen infrastructure necessary to consume fuel and account for fuel consumed correctly.

This commit enables wasm code to periodically check to see if fuel has run out. When fuel runs out an intrinsic is called which can do what it needs to do in the result of fuel running out. For now a trap is thrown to have at least some semantics in synchronous stores, but another planned use for this feature is for asynchronous stores to periodically yield back to the host based on fuel running out. Checks for remaining fuel happen in the same locations as interrupt checks, which is to say the start of the function as well as loop headers.

The location of the shared interrupt value and fuel value is through a double-indirection on the vmctx (load through the vmctx and then load through that pointer). The second pointer in this chain, however, never changes, so we can alter codegen to account for this and remove some extraneous load instructions and hopefully reduce some register pressure even maybe.

Use fuel to time out modules in addition to time, using fuzz input to figure out which.

alexcrichton · 2021-01-28T21:47:01Z

Ok this should be rebased and updated to have add_fuel

cfallin

LGTM!

alexcrichton requested a review from cfallin January 27, 2021 22:34

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:wasm fuzzing Issues related to our fuzzing infrastructure wasmtime:api Related to the API of the `wasmtime` crate itself labels Jan 27, 2021

fitzgen reviewed Jan 28, 2021

View reviewed changes

crates/cranelift/src/func_environ.rs Show resolved Hide resolved

cfallin reviewed Jan 28, 2021

View reviewed changes

alexcrichton added 9 commits January 28, 2021 13:25

Add tests fuel can abort infinite loops

e07963a

More fuzzing with fuel

72a2b79

Use fuel to time out modules in addition to time, using fuzz input to figure out which.

Update docs on trapping instructions

66b2cb0

Fix doc links

cb9e8c1

Fix a fuzz test

f078778

Change setting fuel to adding fuel

b892ff9

alexcrichton force-pushed the fuel branch from 2b8ee53 to b892ff9 Compare January 28, 2021 21:46

cfallin approved these changes Jan 28, 2021

View reviewed changes

alexcrichton added 2 commits January 28, 2021 14:08

Fix a doc link

a8a2ad6

Squelch some rustdoc warnings

c4d7f38

alexcrichton merged commit 0e41861 into bytecodealliance:main Jan 29, 2021

alexcrichton deleted the fuel branch January 29, 2021 14:57

ShuP1 mentioned this pull request Feb 6, 2021

Add fuel related functions to c-api #2643

Merged

ryanpbrewster mentioned this pull request May 2, 2021

Add example of execution limits using fuel consumption #2869

Merged

codefromthecrypt mentioned this pull request Mar 31, 2022

Add option to limit CPU usage tetratelabs/wazero#422

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement limiting WebAssembly execution with fuel #2611

Implement limiting WebAssembly execution with fuel #2611

alexcrichton commented Jan 27, 2021

cfallin commented Jan 27, 2021

alexcrichton commented Jan 27, 2021

github-actions bot commented Jan 27, 2021

cfallin left a comment

cfallin Jan 27, 2021

alexcrichton Jan 28, 2021

cfallin Jan 28, 2021

alexcrichton Jan 28, 2021

alexcrichton commented Jan 28, 2021

cfallin left a comment

Implement limiting WebAssembly execution with fuel #2611

Implement limiting WebAssembly execution with fuel #2611

Conversation

alexcrichton commented Jan 27, 2021

cfallin commented Jan 27, 2021

alexcrichton commented Jan 27, 2021

github-actions bot commented Jan 27, 2021

Subscribe to Label Action

cfallin left a comment

Choose a reason for hiding this comment

cfallin Jan 27, 2021

Choose a reason for hiding this comment

alexcrichton Jan 28, 2021

Choose a reason for hiding this comment

cfallin Jan 28, 2021

Choose a reason for hiding this comment

alexcrichton Jan 28, 2021

Choose a reason for hiding this comment

alexcrichton commented Jan 28, 2021

cfallin left a comment

Choose a reason for hiding this comment