Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace load with load-by-name #74

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

abrown
Copy link
Collaborator

@abrown abrown commented Jul 20, 2024

This change removes the load function, which generated a graph using some opaque bytes, a graph-encoding enum, and an execution-target. This mechanism allowed WebAssembly guest code (i.e., running within the WebAssembly sandbox) to control when a model is loaded, but by doing so, exposed details that users will likely not need. In FaaS use cases, e.g., user code simply does not have the time to retrieve and load a model for every HTTP request.

This PR proposes instead that users always load models outside the sandbox and then load them by a host-specified name. This is a proposal intended for discussion, not a foregone conclusion, so please provide feedback! If you have a use case that relies directly on users being able to load models via buffers, that would undermine the assumptions of this PR (that no one will use wasi-nn in this way).

But consider the downsides of the current approach: wasi-nn must keep track of an ever growing list of graph encodings and users must somehow "see through" wasi-nn to set up the model buffers. Switching to load-by-name--now called load--would resolve these issues, moving any model and framework details into the host configuration, where they already exist anyways.

abrown added 2 commits July 19, 2024 21:27
This change removes the `load` function, which generated a `graph` using
some opaque bytes, a `graph-encoding` enum, and an `execution-target`.
This mechanism allowed WebAssembly guest code (i.e., running within the
WebAssembly sandbox) to control _when_ a model is loaded, but by doing
so, exposed details that users will likely not need. In FaaS use cases,
e.g., user code simply does not have the time to retrieve and load a
model for every HTTP request.

This PR proposes instead that users _always_ load models outside the
sandbox and then load them by a host-specified name. This is a proposal
intended for discussion, not a foregone conclusion, so please provide
feedback! If you have a use case that relies directly on users being
able to load models via buffers, that would undermine the assumptions of
this PR (that no one will use wasi-nn in this way).

But consider the downsides of the current approach: wasi-nn must keep
track of an ever growing list of graph encodings and users must somehow
"see through" wasi-nn to set up the model buffers. Switching to
`load-by-name`--now called `load`--would resolve these issues, moving
any model and framework details into the host configuration, where they
already exist anyways.
@hydai
Copy link

hydai commented Jul 23, 2024

Greetings from the WasmEdge team. We prefer to replace the current load with load-by-name. The rationale is that transferring the entire model into Wasm Memory before passing it to the load function is redundant and restricts the model size to a maximum of 4GB. Therefore, we actually use load-by-name as the default method for loading most LLM models under such constraints.

@devigned
Copy link
Contributor

I agree with the replacement of load w/ load-by-name. As we discussed on the most recent office hours, I believe it would be useful to demo a implementation of using load-by-name that included a little more opinion about the string being passed to the function. Using a name for a model that is known by the backend is useful, but I believe there would be more value in passing a URI string to the backend. For example:

By using a URI string, I believe the backend could be smart enough to fetch / cache models rather than relying on a shared moniker between the guest and the host.

@abrown
Copy link
Collaborator Author

abrown commented Jul 24, 2024

cc: @shschaefer, @radu-matei, @geekbeast

abrown added a commit to abrown/wasmtime that referenced this pull request Aug 8, 2024
This change alters the `wasi-nn` world to split out two different modes
of operation:
- `inference`: this continues the traditional mechanism for computing
  with wasi-nn, by passing named `tensor`s to a `context`. Now that
  `tensor`s are resources, we pass all inputs and return all outputs
  together, eliminating `get-input` and `set-output`
- `prompt`: this new mode expects a `string` prompt which is passed
  along to a backend LLM. The returned string is not streamed, but could
  be in the future

This change also adds metadata modification of the `graph` via
`list-properties`, `get-property` and `set-property`. It is unclear
whether these methods should hang off the `context` objects instead
(TODO). It is also unclear whether the model of `load`-ing a `graph` and
then initializing it into one of the two modes via `inference::init` or
`prompt::init` is the best approach; most graphs are one or the other so
it does not make sense to open the door to `init` failures.

[bytecodealliance#74] (replace `load` with `load-by-name`) is replicated in this commit.
[bytecodealliance#75] (return errors as records) and [bytecodealliance#76] (remove the error
constructor) is superseded by this commit, since every error is simply
returned as a `string` and the `error` resource is removed.

[bytecodealliance#74]: WebAssembly/wasi-nn#74
[bytecodealliance#75]: WebAssembly/wasi-nn#75
[bytecodealliance#76]: WebAssembly/wasi-nn#76
@geekbeast
Copy link
Contributor

I don't have any objections, but I agree with Stuart that we should resolve conformance rules before making a decision here.

I think URIs could be a potential future enhancement, but I think in the short term it is up to the host if it does something smart with the name. WASI-NN shouldn't care about the URIs, but any host is free to implement some URI-handling mechanism that correctly routes loading the model underneath the covers.

@shschaefer
Copy link
Contributor

I agree with the addition of a method for loading a graph by name. This is how my team's current implementation works. We use URNs as they provide the means to abstract the location, form of download, caching, etc... away from the WASI-NN definition. URIs are more general, but often imply specific semantics about location, protocol, etc...

There is benefit to having load as a lightweight alternative which can inline within the browser handle model download or graph construction. If the API is to be general purpose, we need more than the optimizations for FaaS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants