
Eliminate GraphExecutionContext #43

Closed
abrown opened this issue Aug 9, 2023 · 1 comment · Fixed by #77

Comments


abrown commented Aug 9, 2023

This issue proposes a simplification to the wasi-nn API: eliminate the GraphExecutionContext state object altogether and instead simply pass all tensors to and from an inference call — compute(list<tensor>) -> result<list<tensor>, ...>. This change would make set_input and get_output unnecessary, so they would also be removed.
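In WIT syntax, the proposed shape might look roughly like the following sketch. This is illustrative only: the error type, the tensor definition, and whether compute is a free function taking a graph or a method on it are assumptions here, not settled spec text.

```wit
// Sketch of the proposed single-call interface (names illustrative).
interface inference {
  use graph.{graph};
  use tensors.{tensor};
  use errors.{error};

  // All input tensors go in; all output tensors come back. No session
  // state, no set-input, no get-output.
  compute: func(g: graph, inputs: list<tensor>) -> result<list<tensor>, error>;
}
```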

As background, the WITX IDL is the cause of GraphExecutionContext's existence. As I understood it when wasi-nn was originally designed, WITX forced us to pass an empty "pointer + length" buffer to the host so that the host could fill it. This led to get_output(...), which included an index parameter for retrieving tensors from multi-output models (multiple outputs are a possibility that must be handled, though not especially common). Because get_output was now separate from compute, we needed some state to track the inference request — GraphExecutionContext.
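For contrast, that stateful flow corresponds to something like this WIT sketch (an approximation of the existing design; actual wasi-nn names and signatures may differ):

```wit
// Approximation of the current stateful design (names illustrative).
interface inference {
  use tensors.{tensor};
  use errors.{error};

  resource graph-execution-context {
    // Bind one tensor to an input slot before running inference.
    set-input: func(index: u32, input: tensor) -> result<_, error>;
    // Run inference over all previously set inputs.
    compute: func() -> result<_, error>;
    // Retrieve one output by index (multi-output models); under WITX the
    // guest also had to supply a pre-sized buffer for this call.
    get-output: func(index: u32) -> result<tensor, error>;
  }
}
```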

Now, with WIT, we can expect the ABI to host-allocate into our WebAssembly linear memory for us. This is better in two ways:

  • with the original "pass a buffer to get_output(...)" approach, the user had to statically know how large that buffer should be; this led to thoughts about expanding the API with some introspection (feature: describe graph inputs and outputs #37); if we replace GraphExecutionContext with a single compute call, this is no longer a user-facing paper cut
  • the API would now be simpler: if compute accepts and returns all the necessary tensors, then GraphExecutionContext, set_input, and get_output can all be removed, making the API easier to explain and use

One consideration here is ML framework compatibility: some frameworks (e.g., OpenVINO) expose an equivalent to GraphExecutionContext in their external API that must be called by implementations of wasi-nn. But because this context object can be created inside the implementation, there is no compatibility issue: implementations of compute would simply do a bit more per call, with no more work overall.

Another consideration is memory copying overhead: will WIT force us to copy the tensor bytes across the guest-host boundary in both directions? Tensors can be large and additional copies could be expensive. For output tensors, this may be unavoidable: when the tensor is generated on the host side during inference it must be made accessible to the Wasm guest somehow — copying is a simple solution. For input tensors, though, this discussion might suggest that there is no WIT-inherent limitation to avoid the copy. If tensor copying becomes a bottleneck, perhaps WIT resources could be the solution.
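A speculative sketch of that resource-based direction follows. This is only an illustration of the idea, not the design eventually adopted; the type and function names are assumptions.

```wit
// Speculative sketch: a tensor as a host-owned resource. Passing one to
// compute moves a handle across the boundary, not the tensor bytes.
interface tensors {
  enum tensor-type { FP16, FP32, U8, I32 }  // minimal, illustrative set

  resource tensor {
    constructor(dimensions: list<u32>, ty: tensor-type, data: list<u8>);
    // Bytes are copied out only when the guest explicitly asks for them.
    data: func() -> list<u8>;
  }
}
```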

@shschaefer
Contributor

@abrown, the session/execution-context interface caches the parameterization of configuration and device selection. Are you going to leave this state on the graph, to be initialized during model load? And would the call to compute then accept the graph as an input instead of the GraphExecutionContext?

The additional scenario for an execution context is passing this parameterization across multiple models (chaining) or multiple modes: natively and in the browser, we pass the GPU context between DirectX and ONNX, or between WebGPU and WebNN.

abrown added a commit to abrown/wasi-nn-spec that referenced this issue Aug 13, 2024
As discussed in WebAssembly#43, there is no requirement to set up tensors prior to
calling `compute` or to retrieve them separately afterwards. As
of WebAssembly#59, passing around tensors is cheap (they're resources now), so
there is no data copy necessary if we adopt this PR. This change
proposes removing the `set-input` and `get-output` functions, moving all
of the tensor-passing to `compute`. Closes WebAssembly#43.
@abrown abrown closed this as completed in f7f1ffb Oct 28, 2024