# Eliminate GraphExecutionContext (#43)
@abrown, the session/execution context interface caches the parameterization of configuration and device selection. You are going to leave this state on the […]. The additional scenario for an execution context is passing this parameterization in across multiple models (chaining) or multiple modes: natively and in the browser, we pass the GPU context between DirectX and ONNX, or WebGPU and WebNN.
As discussed in WebAssembly#43, there is no requirement to set up tensors prior to calling `compute`, nor to retrieve them separately afterwards. As of WebAssembly#59, passing around tensors is cheap (they're resources now), so no data copy is necessary if we adopt this PR. This change proposes removing the `set-input` and `get-output` functions, moving all of the tensor-passing to `compute`. Closes WebAssembly#43.
This issue proposes a simplification to the wasi-nn API: eliminate the `GraphExecutionContext` state object altogether and instead simply pass all tensors to and from an inference call, `compute(list<tensor>) -> result<list<tensor>, ...>`. This change would make `set_input` and `get_output` unnecessary, and they would also be removed.
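As a concrete illustration, here is a minimal WIT sketch of that single-call shape; the interface name, the stand-in `tensor` record, the `string` error type, and placing `compute` on a `graph` resource are all assumptions for illustration, not the actual wasi-nn definitions:

```wit
interface inference {
    // Minimal stand-in for the real tensor type.
    record tensor {
        dimensions: list<u32>,
        data: list<u8>,
    }

    resource graph {
        // One call performs the entire inference: all inputs in, all
        // outputs out. No execution context, no set_input/get_output.
        compute: func(inputs: list<tensor>) -> result<list<tensor>, string>;
    }
}
```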
As background, the WITX IDL is the cause of `GraphExecutionContext`'s existence. As I understood it back when wasi-nn was originally designed, WITX forced us to pass an empty "pointer + length" buffer across to the host so that the host could fill it. This led to `get_output(...)`, which included an index parameter for retrieving tensors from multi-output models (multiple outputs are a possibility that must be handled, though not too common). Because `get_output` was now separate from `compute`, we needed some state to track the inference request: `GraphExecutionContext`.
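For contrast, the shape this issue proposes to remove looks roughly like the following paraphrase (not the literal spec text; `tensor` is as in the sketch above):

```wit
// Paraphrase of the current multi-call shape, not the actual definitions.
resource graph-execution-context {
    // Inputs are staged one at a time before running the inference...
    set-input: func(index: u32, t: tensor) -> result<_, string>;
    compute: func() -> result<_, string>;
    // ...and outputs are fetched afterwards, by index; the caller must
    // already know how large each output's buffer should be.
    get-output: func(index: u32) -> result<tensor, string>;
}
```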
Now, with WIT, we can expect the ABI to host-allocate into our WebAssembly linear memory for us. This is better in two ways:

- with `get_output(...)`, the user had to statically know how large that buffer should be; this led to thoughts about expanding the API with some introspection (feature: describe graph inputs and outputs, #37); if we replace `GraphExecutionContext` with a single `compute` call, this is no longer a user-facing paper cut
- if `compute` accepts and returns all the necessary tensors, then `GraphExecutionContext`, `set_input`, and `get_output` can all be removed, making the API easier to explain and use
One consideration here is ML framework compatibility: some frameworks (e.g., OpenVINO) expose an equivalent to `GraphExecutionContext` in their external API that must be called by implementations of wasi-nn. But, because this context object can be created inside the implementation, there is no compatibility issue. Implementations of `compute` will simply do a bit more than they currently do, but no more work overall.

Another consideration is memory copying overhead: will WIT force us to copy the tensor bytes across the guest-host boundary in both directions? Tensors can be large, and additional copies could be expensive. For output tensors, this may be unavoidable: when a tensor is generated on the host side during inference, it must somehow be made accessible to the Wasm guest, and copying is a simple solution. For input tensors, though, this discussion might suggest that there is no WIT-inherent limitation forcing the copy. If tensor copying becomes a bottleneck, perhaps WIT resources could be the solution.
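To make that last idea concrete, a resource-backed tensor might look something like the sketch below, in which only a handle crosses the guest-host boundary and the bytes are copied into guest memory only on request (the constructor and method names are hypothetical):

```wit
// Hypothetical resource-backed tensor; not the actual wasi-nn definitions.
resource tensor {
    // The host keeps the bytes; the guest holds only a handle.
    constructor(dimensions: list<u32>, data: list<u8>);
    dimensions: func() -> list<u32>;
    // Bytes are copied into guest linear memory only on explicit request.
    data: func() -> list<u8>;
}
```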