More optimizations for calling into WebAssembly (#2759)
* Combine stack-based cleanups for faster wasm calls

This commit is an extension of #2757, where the goal is to optimize entry into WebAssembly. Currently wasmtime has two stack-based cleanups when entering wasm: one for the externref activation table and another for resetting stack limits. This commit fuses these two cleanups into one and moves some code around, which allows fewer captures and fewer closures and speeds up calls into wasm a bit more. Overall this drops the execution time from 88ns to 80ns locally for me.

This also changes the atomic ordering used when updating the stack limit from `SeqCst` to `Relaxed`. While `SeqCst` is a reasonable starting point, `Relaxed` should be safe here since we're not using the atomics to protect any memory; they simply receive signals from other threads.

* Determine whether a pc is wasm via a global map

The macOS implementation of traps recently changed to use mach ports for handlers instead of signal handlers. This broke a previously relied-upon invariant: each thread fixes its own trap. The macOS implementation worked around this by maintaining a global map from thread id to thread-local information.

This global map is quite slow, though. It involves taking a lock and updating a hash map on every call into WebAssembly. In my local testing this accounts for >70% of the overhead of calling into WebAssembly on macOS. Naturally it'd be great to remove this!

This commit fixes the issue by removing the global lock/map that is updated on every call into WebAssembly. The fix is to maintain a global map of wasm modules and their trap addresses in the `wasmtime` crate. Doing so is relatively simple since we're already tracking this information at the `Store` level. Once we have a global map, the macOS implementation can use it from a foreign thread and everything works out.

Locally this brings the overhead of calling into wasm, on macOS specifically, from 80ns to ~20ns.

* Fix compiles

* Review comments
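The global-map idea described above can be illustrated with a minimal sketch. This is not the code from this commit: the names `CodeRange`, `GLOBAL_CODE`, `register`, `unregister`, and `is_wasm_pc` are hypothetical, and the sketch assumes each module's code occupies one contiguous, non-overlapping address range. The point it shows is that the map is only written when a module is registered or dropped, while a lookup from any thread (including a foreign handler thread) is a read-only range query.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;
use std::sync::RwLock;

/// Hypothetical record for one registered module's code range.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CodeRange {
    pub start: usize, // inclusive
    pub end: usize,   // exclusive
}

// Process-wide map keyed by each range's exclusive end address.
// Keying by the end lets a lookup find the only candidate range
// with a single ordered query.
static GLOBAL_CODE: RwLock<BTreeMap<usize, CodeRange>> = RwLock::new(BTreeMap::new());

/// Register a module's code range (assumed not to overlap existing ranges).
/// This happens once per module, not once per call into wasm.
pub fn register(start: usize, end: usize) {
    assert!(start < end, "empty or inverted range");
    let prev = GLOBAL_CODE
        .write()
        .unwrap()
        .insert(end, CodeRange { start, end });
    assert!(prev.is_none(), "range already registered");
}

/// Remove a previously registered range, identified by its end address.
pub fn unregister(end: usize) -> Option<CodeRange> {
    GLOBAL_CODE.write().unwrap().remove(&end)
}

/// Return true if `pc` falls inside any registered code range.
pub fn is_wasm_pc(pc: usize) -> bool {
    let map = GLOBAL_CODE.read().unwrap();
    // The first range whose end is strictly greater than `pc` is the
    // only one that can contain `pc`; check its start as well.
    match map.range((Bound::Excluded(pc), Bound::Unbounded)).next() {
        Some((_, range)) => range.start <= pc,
        None => false,
    }
}
```

With this shape, the per-call path into wasm never touches the lock; only module registration and trap handling do. A real implementation would also need to consider what is safe to do inside a trap handler, which this sketch does not address.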
1 parent 6b2da3d, commit d4b54ee
Showing 13 changed files with 323 additions and 308 deletions.