-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wasi-parallel: implement CPU parallelism #4949
base: main
Are you sure you want to change the base?
Conversation
51a11e1
to
2e68353
Compare
For certain WITX interfaces, it may be helpful to implement parts of the exposed API using raw code instead of Wiggle-generated code. This change adds a new macro configuration option, `skip: ["<func name>", ...]`, that tells Wiggle to ignore the given function names completely when generating code.
`wiggle` looks for an exported `Memory` named `"memory"` to use for its guest slices. This change allows it to use a `SharedMemory` if this is the kind of memory used for the export.
4e9fb2c
to
8fe3cae
Compare
This change implements the `cpu` and `sequential` devices from `wasi-parallel`. In both cases, the WebAssembly-encoded bytes of a kernel module are passed to the `parallel_exec` function which instantiates the kernel, imports the shared module exported by the host module, and invokes the exported `kernel` function of the kernel the desired number of times. The key difference between `cpu` and `sequential` is that `sequential` performs the invocations sequentially, in a `for` loop, whereas `cpu` performs each invocation in a separate thread of a pre-initialized thread pool (sized to the number of cores on the system).
These tests exercise the `wasi-parallel` API in basic ways. In order to simplify the setup of such tests, a `TestCase` structure does the necessary configuration. Unfortunately, toolchain support for emitting `wasi-parallel` programs is lacking: both the Rust and C (`wasi-sdk`) toolchains have trouble emitting shared memory programs and neither has any simple way to compile a kernel file separately and include it in another compilation step. Because of this the tests here are (mostly) implemented as hand-written WAT; if toolchain support improves, these should be replaced.
In keeping with the theme from the "testing" commit, this change adds benchmarks for `wasi-parallel` based on hand-written WAT files. This reuses some of the same WAT files because of the difficulty of crafting them.
This change updates the Wasmtime CLI application with the necessary flags for running `wasi-parallel` programs.
Since it is not clear to me yet who can sign off on vetted crates, here I exempt several crates, mainly due to the upgraded version of `criterion` used by `wasmtime-wasi-parallel`. This is very much a TODO commit--please feel free to tell me what to do here instead of this!
8fe3cae
to
0524ddd
Compare
I think it is important to note that this PR does not depend on toolchain - implementing an API in consumer should not require a particular producer to be on board; multiple memories are a a different example of this. Also keep in mind that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two minor comments below, but otherwise I am not certain that we'll want to land this in Wasmtime at this time. My main rationale for this is that wiggle is more-or-less "on its way out" with our transition to the component model and wit-bindgen. In the component model the hack you've implemented here with getting the shared memory to work will not work and has no way of working, there's no escape hatch to implement as you've done here.
At this time there's no concrete timeline of when wit-bindgen will become "official" or when we'll switch over to it. Additionally the existing wasi-common
, wasi-nn
, and wasi-crypto
will likely all need to get removed or refactored or something with a transition away from wiggle
. Despite all that though all these prior interfaces have what's at least a path forward to move into the component model rather than the wiggle
-level of abstraction it has today. That's in contrast with this WASI proposal which I don't think has any path forward into the component model.
I'd be curious to see how others feel about this though. I am not sure how useful it's been to have wasi-nn
and wasi-crypto
in this repository, I am not aware of how much usage they're getting and feedback they're providing to the upstream proposals. In that sense I'm not sure how beneficial it will be to land this here vs have it externally.
Overall I personally feel that the multithreading story has a lot of baking to still do and while I'd like to see it get its baking time I'm not sure if it's this repository is the right place for that right now at this time.
keywords = ["webassembly", "wasm", "parallel"] | ||
repository = "https://github.com/bytecodealliance/wasmtime" | ||
readme = "README.md" | ||
edition = "2018" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind bumping this to 2021 with edition.workspace = true
?
@@ -42,6 +42,10 @@ criteria = "safe-to-deploy" | |||
version = "0.0.1" | |||
criteria = "safe-to-deploy" | |||
|
|||
[[exemptions.anes]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'm wary of continuing to take on large unvetted dependencies, even through the optional wasi proposals. The previous wasi proposals we have implemented were largely grandfathered in but I would personally prefer to take this opportunity to not continue to try to grandfather in new features.
Is it possible to slim down the proposal's dependency tree to not rqeuire so many exemptions? Or otherwise are these sorts of crates absolutely required? If so I would personally prefer to have an official vet on-the-record for the inclusion here.
Probably worth re-iterating the point @abrown made above, that this would block potential implementation of |
That's true, yes, and I feel pretty similarly to the |
Just out of curiosity, as I haven't been following wasi meetings very closely, what is the background of the requirement that wasi interfaces only use component model? And does this only apply to wasmtime? |
I, too, don't follow wasi meetings closely so I won't pretend to speak authoritatively on this, but I believe that the intention is to define WASI APIs with I'm not sure what you mean though by if this only applies to wasmtime. The component model should look the same irrespective of where it's implemented, so it's not a wasmtime-specific limitation that If you're instead asking if |
My understanding of how wasi-parallel is expected to fit into the component model is that it could be packaged as an import of a core-module, with the core-module importing the linear memory and exporting functions that operate on it directly. That isn't supported in wit or wit-bindgen today, so we'll need interim approaches, but eventually we should be able to pull these into the wit framework. |
@alexcrichton thank you for the detailed answers.
I meant to ask whether this is the approach adopted specifically by wasmtime or by component model, but you have answered that.
Does component model allow this yet? What can the interim approaches be? Maybe we should take it the component-model repo though. |
The component model does support importing a core wasm module, but that's not reflected in the |
@alexcrichton, I think there are several points you should reconsider:
Then there's a bug in the component model specification. If there is no way to support multiple WASI proposals (
Sure, but that should not block progress. The issue you point out is not with those WASI contributions but rather with the unstable state of
You should talk to @geekbeast, also a Fastly employee. He was attempting to use wasi-nn in Viceroy and was able to do so based on the reference implementation here.
I am not involved in component model design... but you are 😁. This PR — along with WebAssembly/wasi-parallel#4, bytecodealliance/wit-bindgen#130, bytecodealliance/wit-bindgen#313 — should be great feedback for a component model issue that you would know how to craft but I would not. I don't understand all of the component model requirements, history, etc. so my ability to contribute there is limited. Why not just take this PR as feedback to the component model and propose a solution so that these WASI proposals can have a future? |
Agreed with @sunfishcode above that the right way to support various kinds of shared-memory multi-threading in WASI in a component-model world is by expressing the imported threading functionality as core module imports, which are then instantiated by the component to explicitly share core memory with the other core modules inside the component. When core wasm eventually gets a This does raise the question of how to write the interface of a core module in Wit, since there currently isn't any syntactic support for this. But it's easy to imagine a Wit-like syntax for I don't have enough understanding (particularly around Wiggle) to say what we should do in the right-now term before the above Wit functionality gets implemented. But switching out Wiggle for Wit does seem like it should be our goal state. |
Thanks @sunfishcode and @lukewagner for pointing out that the shared memory import/export will be expressible with WIT core modules. I still have questions on this (When and how can the core module syntax be implemented? Is help needed there and, if so, what? Should that block this PR?) but it seems like long term there should not be a problem with But, to take up what I think is @alexcrichton's point, the issue is actually about host access. Proposals like this one and I actually see several possibilities along a continuum: a. dead in the water: the WIT syntax never has knowledge of "host access," the tools don't support it, and this class of proposal is dead in the water (from @alexcrichton's argument that engines like Wasmtime will not want to support non-WIT-able proposals) I've implemented "b" and I believe @alexcrichton was suggesting "a." I would prefer "c." @lukewagner, @sunfishcode: I can't tell from your comments but I think you are not saying "a"? |
(Also, please prefix every sentence above with "IIUC" since I am not too familiar with the details of the component model and I'm reading into some of @alexcrichton's comments). |
To some degree I'm trying really hard here to be an impartial messenger, along the lines of "please don't shoot the messenger". I'm pretty intimately aware of all the constraints here ranging from the
I'm pointing out the systems that are in play and their various designs, and all of these systems have intentional design behind them. I am not personally going to go to the component model repository and open a "please add a small escape hatch" issue because I'm aware of why one hasn't been added yet. Furthermore I'm not going to go to the That being said I naturally have a lot of opinions about how this should all work. If a parallelism story for wasm requires an "escape hatch" that seems like a fundamental problem because parallelism is such a foundational part of any runtime environment that if it's not designed for from day one it seems to me like it has little chance of gaining any amount of larger traction. For example just because Wasmtime is capable of implementing escape hatches like this in no way means other runtimes can do so. The Personally I even have hesitation about the whole kernel model in Again though I'm trying hard to not insert myself into the design here. These are some thoughts of mine but honestly I don't consider them relevant since I'm not the one championing or being involved in the proposal. At the same time though I feel like someone has to point out the actual limitations of the systems that are being built at some point as otherwise it feels like things are carried on as everyone's just blindly merging everyone else's work without thought to the consequences.
My hope is that design issues like running Sorry for the long post, I have a lot of thoughts on this but now's probably not the time or the place. I'll reiterate that I personally would like to see threads-on-wasm via a standard like WASI. I don't think that the component model and WASI are in a state where things can be plucked off the shelf, mashed together, and a workable standard created. I feel there are deeper designs that need to be implemented and accounted for before moving forward with this iteration of |
Is there a good mechanism to track usage of various features in Wasmtime? Usually I have no knowledge of certain features being used by a customer until they contact us with some issues, or they present/publish their usage somewhere. And these are probably a subset of total user base. For example, I know a big device manufacturer is in the process of embedding wasi-nn into their products. For wasi-crypto, it is an essential function for a Wasm runtime to be selected for a SASE project we are involved. Here are a few links of wasi-nn usages from googling (not all on Wasmtime): |
Just as an update: I realized my comment above was forgetting that we're talking about the core-instance-per-thread approach to multi-threading, for which the core module import approach I suggested above would need the imported core module to itself have a module import (the module to be instantiated for each new thread), which is not yet possible in core wasm. So that might not be the right approach here. It does really seem like the best approach to exposing multi-threading to core wasm in a standard way is as proposed by Watt & Rossberg, as a core wasm feature. Although what we do to allow short-term experimentation and prototyping in Wasmtime is a different question. |
Web engines support this via a JavaScript callback which creates another copy of the instance. Web polyfill was part of the motivation for |
@mingqiusun for sure yeah, I don't mean to say definitively whether or not those proposal should be removed from the repository, and it's great to hear that @penzn indeed you are correct. There are a huge number of caveats with the support on the web for threading, however, including:
One point that's probably worth clarifying is that it's not like I think Wasmtime shouldn't be able to implement a web-style threading scenario. In fact you can, already today, implement "threading" with Wasmtime since you as the host can spawn threads and figure out how to instantiate modules on new threads and share host state. There's nothing inherent to Wasmtime preventing this with Wasmtime's core wasm embedding API. I talked with @abrown at great length yesterday about this and we're going to try to allocate time next week to discuss what it might look like for |
@abrown, @lukewagner, @sunfishcode, and I talked last week at great length about this PR and the various issues here, zooming out mostly to threads in the component model. We concluded a number of things that I hope to summarize here, and please correct me y'all if I miss anything. Our general topic was what to do with threads, the component model, and having the two work in harmony with each other. To that end we broke down a possible plan of action in to three "stages" ordered from shorest-time-to-implement to longest-time-to-impelment. Stage 1The first thing for Wasmtime and threads is to get anything working all. The goal here would effectively, through some wasm-defined API using Wasmtime's embedding API, achieving the functionality proposed by
This is all just the bare bones of getting anything working, but should hopefully provide valuable implementation feedback to Wasmtime, the component model, and the This local maximum, however, is not immediately compatible with the component model. Some of the fundamental abstractions here, such as importing a shared linear memory, do not work the same way in the component model (components can't import a linear memory). This means that this stage does not have access to host APIs defined by the component model, such as a hypothetical Stage 2This stage is intended to be sequenced after the prior stage with the goal of hooking up component-model-level host definitions into a core wasm module that wants to use threads. As a base assumption the idea here is that the user-supplied "thing" is still a core wasm module. The high-level idea is that host-defined component-model definitions are "automatically lowered" into the corresponding core wasm functions to get imported into a core wasm module. This sort of means that a The main body of work in this stage would be to figure out how to robustly take a component-model level API and translate it into a core wasm API. This is roughly already done with the canonical ABI but I personally believe there's a lot of fiddly bits to work out such as how to reuse the lifting/lowering in Wasmtime without duplicating it. For example somehow a In discussing this though we wondered but weren't sure of how important such feature would be for migrating from wasi preview1 to preview2. There will, for a long time, be binaries that import The downside of this stage is that there still isn't full integration with the component model. Everything is still working at the level of abstraction of a core wasm module and inherently continues to be incompatible with components. The main distinction from stage 1 is that component-model-level functionality, which is the presumed path that WASI is going, can still be imported into the module. Stage 3The next stage of evolution for support here would be to integrate threads into the component model proposal itself. This is a far-in-the-future vision though where there's a difference between a Given the possible staging here I concluded that my original stance of "we probably don't want to merge this" is no longer true. The staging here would explicitly continue to work at the core wasm layer and not the component layer. Furthermore for the time being there would be no path forward to implement this in the component layer but there would be a path to taking some of the major benefits of the component layer and bringing them to core wasm. I'd revise my previous stance, then, to it's ok to merge this into the main Wasmtime repository along the lines of all other WASI proposals. That's not to discount the technical review here, however. This is an example of a fundamental issue in |
Thanks @alexcrichton for the detailed write-up; I think it's a pretty accurate reflection of what we discussed and a decent high-level vision for how to move forward. I agree that The current issues to get to stage 1 that @alexcrichton describes above are many and span multiple repositories (this one, |
This change implements
wasi-parallel
in Wasmtime. It addresses only the CPU parallelism (not GPU), since this would introduce additional complexity to an already difficult review (if you're interested about the GPU progress, talk to @egalli). Each commit implements a separate piece to the puzzle. Though most of the changed lines are not the core implementation (seetests
,LICENSE
, etc.), due to the size of the PR, it may be convenient to look at this commit by commit.I see several issues highlighted by this change:
THREAD_MODEL=posix
builds? WebAssembly/wasi-libc#326. For Rust programs: Fail to compile with WebAssembly atomics and shared memory rust-lang/rust#102157. Due to all this, most tests and examples in this crate are meticulously hand-crafted WAT files — once toolchain support improves these difficult-to-maintain files should be replaced.wasi-parallel
works by creating new instances of the kernel in each thread of a thread pool — the host module exports a shared memory and the kernel imports it. Getting access to the shared memory from within awasi-parallel
invocation is difficult with Wasmtime/Wiggle: the best way I could think to resolve this is to teach Wiggle toskip
a WITX function and then manually implementadd_to_linker
for that function. This is tedious and error-prone, so I would be excited to discuss better solutions. Note that an almost identical version of this problem will arise when I try to implementwasi-threads
.Why now? The rationale behind merging this code behind the
wasi-parallel
feature flag is to enable users to try this out locally and to iterate on this in-tree (e.g., toolchain, GPU parts).