Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

components: Implement the ability to call component exports #4039

Merged

Conversation

alexcrichton
Copy link
Member

This commit is an implementation of the typed method of calling
component exports. This is intended to represent the most efficient way
of calling a component in Wasmtime, similar to what TypedFunc
represents today for core wasm.

Internally this contains all the traits and implementations necessary to
invoke component exports with any type signature (e.g. arbitrary
parameters and/or results). The expectation is that for results we'll
reuse all of this infrastructure except in reverse (arguments and
results will be swapped when defining imports).

Some features of this implementation are:

  • Arbitrary type hierarchies are supported
  • The Rust-standard Option, Result, String, Vec<T>, and tuple
    types all map down to the corresponding type in the component model.
  • Basic utf-16 string support is implemented as proof-of-concept to show
    what handling might look like. This will need further testing and
    benchmarking.
  • Arguments can be behind "smart pointers", so for example
    &Rc<Arc<[u8]>> corresponds to list<u8> in interface types.
  • Bulk copies from linear memory never happen unless explicitly
    instructed to do so.

The goal of this commit is to create the ability to actually invoke wasm
components. This represents what is expected to be the performance
threshold for these calls where it ideally should be optimal how
WebAssembly is invoked. One major missing piece of this is a #[derive]
of some sort to generate Rust types for arbitrary *.wit types such as
custom records, variants, flags, unions, etc. The current trait impls
for tuples and Result<T, E> are expected to have fleshed out most of
what such a derive would look like.

There are some downsides and missing pieces to this commit and method of
calling components, however, such as:

  • Passing &[u8] to WebAssembly is currently not optimal. Ideally this
    compiles down to a memcpy-equivalent somewhere but that currently
    doesn't happen due to all the bounds checks of copying data into
    memory. I have been unsuccessful so far at getting these bounds checks
    to be removed.
  • There is no finalization at this time (the "post return" functionality
    in the canonical ABI). Implementing this should be relatively
    straightforward but at this time requires wasmparser changes to
    catch up with the current canonical ABI.
  • There is no guarantee that results of a wasm function will be
    validated. As results are consumed they are validated but this means
    that if function returns an invalid string which the host doesn't look
    at then no trap will be generated. This is probably not the intended
    semantics of hosts in the component model.
  • At this time there's no support for memory64 memories, just a bunch of
    FIXMEs to get around to. It's expected that this won't be too
    onerous, however. Some extra care will need to ensure that the various
    methods related to size/alignment all optimize to the same thing they
    do today (e.g. constants).
  • The return value of a typed component function is either T or
    Value<T>, and it depends on the ABI details of T and whether it
    takes up more than one return value slot or not. This is an
    ABI-implementation detail which is being forced through to the API
    layer which is pretty unfortunate. For example if you say the return
    value of a function is (u8, u32) then it's a runtime type-checking
    error. I don't know of a great way to solve this at this time.

Overall I'm feeling optimistic about this trajectory of implementing
value lifting/lowering in Wasmtime. While there are a number of
downsides none seem completely insurmountable. There's naturally still a
good deal of work with the component model but this should be a
significant step up towards implementing and testing the component model.

@alexcrichton alexcrichton requested a review from fitzgen April 15, 2022 19:53
@alexcrichton
Copy link
Member Author

Note that procedurally this is based on #4005 so the first few commits aren't relevant to this PR itself. Additionally as with #4005 there are no tests, and this one definitely can't land without tests.

@github-actions github-actions bot added wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:config Issues related to the configuration of Wasmtime labels Apr 15, 2022
@github-actions
Copy link

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api", "wasmtime:config"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@github-actions
Copy link

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

  • If you added a new Config method, you wrote extensive documentation for
    it.

    Our documentation should be of the following form:

    Short, simple summary sentence.
    
    More details. These details can be multiple paragraphs. There should be
    information about not just the method, but its parameters and results as
    well.
    
    Is this method fallible? If so, when can it return an error?
    
    Can this method panic? If so, when does it panic?
    
    # Example
    
    Optional example here.
    
  • If you added a new Config method, or modified an existing one, you
    ensured that this configuration is exercised by the fuzz targets.

    For example, if you expose a new strategy for allocating the next instance
    slot inside the pooling allocator, you should ensure that at least one of our
    fuzz targets exercises that new strategy.

    Often, all that is required of you is to ensure that there is a knob for this
    configuration option in wasmtime_fuzzing::Config (or one
    of its nested structs).

    Rarely, this may require authoring a new fuzz target to specifically test this
    configuration. See our docs on fuzzing for more details.

  • If you are enabling a configuration option by default, make sure that it
    has been fuzzed for at least two weeks before turning it on by default.


To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the
.github/label-messager.json configuration file.

Learn more.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'll want to take a look at tests once they're written.

There is no guarantee that results of a wasm function will be validated. As results are consumed they are validated but this means that if function returns an invalid string which the host doesn't look at then no trap will be generated. This is probably not the intended semantics of hosts in the component model.

What if there were two ways to go from Value to Cursor? One that did the validation up front (and ideally set a flag or something somewhere, maybe in the Value that we already validated and don't need to re-validate when getting each string in a list<string>) and an "unchecked" or "unvalidated" version that just gives you the cursor and it is your responsibility to actually validate each thing as you walk the cursor?

///
/// The `Params` type parameter here is a tuple of the parameters to the
/// function. A function which takes no arguments should use `()`, a
/// function with one argument should use `(T,)`, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have impls of plain T for TypedFunc? Can we do the same here? I guess that would introduce ambiguity between a single tuple parameter and N parameters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have impls for plain T, yeah. But yes as you deduced the ambiguity prevents the impl from existing for interface types.

Comment on lines +108 to +114
/// For the `Return` type parameter many types need to be wrapped in a
/// [`Value<T>`]. For example functions which return a string should use the
/// `Return` type parameter as `Value<String>` instead of a bare `String`.
/// The usage of [`Value`] indicates that a type is stored in linear memory.
//
// FIXME: Having to remember when to use `Value<T>` vs `T` is going to trip
// people up using this API. It's not clear, though, how to fix that.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe easier for folks to understand if we talk about how scalars don't use Value vs compound types do use Value? I think they are the same, right? Unless the canonical ABI packs, e.g., an interface type (tuple u32 u32) into a Wasm i64 return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this isn't 100% accurate advice in the sense that (tuple u32) is returned by-value. No fancy bit-packing happens but if the number of return values is small enough (e.g. 1) then it's returned by-value.

crates/wasmtime/src/component/func.rs Outdated Show resolved Hide resolved
crates/wasmtime/src/component/func.rs Show resolved Hide resolved
crates/wasmtime/src/component/func/typed.rs Outdated Show resolved Hide resolved
crates/wasmtime/src/component/func/typed.rs Outdated Show resolved Hide resolved
crates/wasmtime/src/component/func/typed.rs Outdated Show resolved Hide resolved
crates/wasmtime/src/component/func/typed.rs Outdated Show resolved Hide resolved
crates/wasmtime/src/component/func/typed.rs Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Member Author

What if there were two ways to go from Value to Cursor?

The reason I haven't gone with something like this is that the idiomatic thing to do is then the slow thing, which I've been trying to avoid. Existence of a Value<T> doesn't lock memory to a particular state or prevent mutations, so validation of a Value<T> must always be re-done.

It's also less-so validation and more just reading all the memory. Even if we were to somehow do ahead-of-time validation it means that when you read something like list<my-enum> that does all the decoding a second time which is also something we should be shooting to avoid for performant cases. Basically I think it's required that we interleave validation and decoding for performance, but to be strictly standards compliant we probably need to figure out how to force those two to happen given any Value<T>

@alexcrichton alexcrichton force-pushed the component-call-export branch from 33c80d1 to f220145 Compare April 28, 2022 18:49
This commit is an implementation of the typed method of calling
component exports. This is intended to represent the most efficient way
of calling a component in Wasmtime, similar to what `TypedFunc`
represents today for core wasm.

Internally this contains all the traits and implementations necessary to
invoke component exports with any type signature (e.g. arbitrary
parameters and/or results). The expectation is that for results we'll
reuse all of this infrastructure except in reverse (arguments and
results will be swapped when defining imports).

Some features of this implementation are:

* Arbitrary type hierarchies are supported
* The Rust-standard `Option`, `Result`, `String`, `Vec<T>`, and tuple
  types all map down to the corresponding type in the component model.
* Basic utf-16 string support is implemented as proof-of-concept to show
  what handling might look like. This will need further testing and
  benchmarking.
* Arguments can be behind "smart pointers", so for example
  `&Rc<Arc<[u8]>>` corresponds to `list<u8>` in interface types.
* Bulk copies from linear memory never happen unless explicitly
  instructed to do so.

The goal of this commit is to create the ability to actually invoke wasm
components. This represents what is expected to be the performance
threshold for these calls where it ideally should be optimal how
WebAssembly is invoked. One major missing piece of this is a `#[derive]`
of some sort to generate Rust types for arbitrary `*.wit` types such as
custom records, variants, flags, unions, etc. The current trait impls
for tuples and `Result<T, E>` are expected to have fleshed out most of
what such a derive would look like.

There are some downsides and missing pieces to this commit and method of
calling components, however, such as:

* Passing `&[u8]` to WebAssembly is currently not optimal. Ideally this
  compiles down to a `memcpy`-equivalent somewhere but that currently
  doesn't happen due to all the bounds checks of copying data into
  memory. I have been unsuccessful so far at getting these bounds checks
  to be removed.
* There is no finalization at this time (the "post return" functionality
  in the canonical ABI). Implementing this should be relatively
  straightforward but at this time requires `wasmparser` changes to
  catch up with the current canonical ABI.
* There is no guarantee that results of a wasm function will be
  validated. As results are consumed they are validated but this means
  that if function returns an invalid string which the host doesn't look
  at then no trap will be generated. This is probably not the intended
  semantics of hosts in the component model.
* At this time there's no support for memory64 memories, just a bunch of
  `FIXME`s to get around to. It's expected that this won't be too
  onerous, however. Some extra care will need to ensure that the various
  methods related to size/alignment all optimize to the same thing they
  do today (e.g. constants).
* The return value of a typed component function is either `T` or
  `Value<T>`, and it depends on the ABI details of `T` and whether it
  takes up more than one return value slot or not. This is an
  ABI-implementation detail which is being forced through to the API
  layer which is pretty unfortunate. For example if you say the return
  value of a function is `(u8, u32)` then it's a runtime type-checking
  error. I don't know of a great way to solve this at this time.

Overall I'm feeling optimistic about this trajectory of implementing
value lifting/lowering in Wasmtime. While there are a number of
downsides none seem completely insurmountable. There's naturally still a
good deal of work with the component model but this should be a
significant step up towards implementing and testing the component model.
@alexcrichton alexcrichton force-pushed the component-call-export branch from f220145 to 3764ebd Compare May 20, 2022 20:55
@alexcrichton
Copy link
Member Author

Still needs tests, so nothing new here, but I would like to confirm with other the subtyping store because that may or may not invalidate everything in this PR.

This commit adds a new test file for actually executing functions and
testing their results. This is not written as a `*.wast` test yet since
it's not 100% clear if that's the best way to do that for now (given
that dynamic signatures aren't supported yet). The tests themselves
could all largely be translated to `*.wast` testing in the future,
though, if supported.

Along the way a number of minor issues were fixed with lowerings with
the bugs exposed here.
@alexcrichton alexcrichton marked this pull request as ready for review May 24, 2022 18:08
@alexcrichton
Copy link
Member Author

Ok I have written a mess of tests for this which indeed found some issues in the previous code. I've also opened #4185 to track various items to make sure I don't forget them. Otherwise I think this is probably good to land now.

Currently I have not added support to *.wast for actually invoking component functions. I could do that (inventing some syntax for this along the way) along with statically listing "these are the signatures which can be called" or something like that. I opted to instead use custom embedding tests, but eventually I think it will be more useful to get *.wast support as well (but it's always limited in the sense that the callable signatures must be statically listed internally). If @fitzgen you or others have thoughts on how to improve the tests here I'm happy to change things up as well.

@alexcrichton alexcrichton requested a review from fitzgen May 24, 2022 18:14
@alexcrichton
Copy link
Member Author

Ah let me write my thoughts on subtyping here as well.

I realized relatively late in the design process for this feature that we did not factor in subtyping when crossing between the host and wasm. The major problem here is that the API designed here is where the host statically asserts the signature of a wasm componet function and then attempts to call it with that signature. Given subtyping, though, one might suspect that subtyping relationships would be respected in this typecheck. Implementation-wise, this is not supported.

For example if a wasm function export takes zero parameters, then the host could declare that it in fact takes 2 parameters. According to the subtyping rules these type signatures are compatible and this function call should be executed. If the parameters were strings though the host would copy in results to the module when calling it when it shouldn't do that.

A naive fix for this issue would be to always lower values in the context of a type. For example the ComponentValue trait would have a type added to the lower method where a value is lowered into a particular type (or stored as one). This means though that all layers of lowering are constantly performing typechecks to see if subtyping conversions are necessary. It's roughly predicted that the cost of this will be prohibitive and as such we wouldn't want to do that.

A much more advanced fix would be to generate trampolines. Compilation of a component would now also involve specifying the interface that an embedder would be using (e.g. the signature it expects for exports and the signatures for the imports). Appropriate trampolines would be generated and the host would call the trampolines which would "do the right thing". The problem with this strategy is that the host loses all flexibility of the layout of host data. Instead now everything has to be in a format that the trampoline understands. Additionally throwing in complexity of things like host destructors makes this much more complex as well.

Overall after talking with Luke the current thinking is to do this:

  • Do not implement subtyping in host imports/exports. Everything has to have an exact signature match.
  • If subtyping is necessary, the component is wrapped in another component which uses the right signatures.

The (hopeful) idea is that hosts can detect mismatches in type relationships. Hosts ideally have *.wit files or similar describing imports and exports, and that can be used to preprocess a component from a consumer to either verify that the types all line up or whether a second component is needed to perform the subtyping relationship. This component-to-component communication would then handle subtyping via the Cranelift-generated trampolines because type representation is all fixed with the canonical ABI and it's much easier to generate a trampoline.

@fitzgen
Copy link
Member

fitzgen commented May 24, 2022

Overall after talking with Luke the current thinking is to do this:

* Do not implement subtyping in host imports/exports. Everything has to have an exact signature match.

* If subtyping is necessary, the component is wrapped in another component which uses the right signatures.

The (hopeful) idea is that hosts can detect mismatches in type relationships. Hosts ideally have *.wit files or similar describing imports and exports, and that can be used to preprocess a component from a consumer to either verify that the types all line up or whether a second component is needed to perform the subtyping relationship. This component-to-component communication would then handle subtyping via the Cranelift-generated trampolines because type representation is all fixed with the canonical ABI and it's much easier to generate a trampoline.

This seems like a very reasonable path for us to go down.

const CANON_32BIT_NAN: u32 = 0b01111111110000000000000000000000;
const CANON_64BIT_NAN: u64 = 0b0111111111111000000000000000000000000000000000000000000000000000;

// A simple bump allocator which can be used with modules belowt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// A simple bump allocator which can be used with modules belowt
// A simple bump allocator which can be used with modules below.

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request May 24, 2022
Force accessing to go through constructors and accessors to localize the
knowledge about little-endian-ness. This is spawned since I made a
mistake in bytecodealliance#4039 about endianness.
@alexcrichton alexcrichton merged commit 140b835 into bytecodealliance:main May 24, 2022
@alexcrichton alexcrichton deleted the component-call-export branch May 24, 2022 22:02
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request May 24, 2022
Force accessing to go through constructors and accessors to localize the
knowledge about little-endian-ness. This is spawned since I made a
mistake in bytecodealliance#4039 about endianness.
alexcrichton added a commit that referenced this pull request May 25, 2022
* Make `ValRaw` fields private

Force accessing to go through constructors and accessors to localize the
knowledge about little-endian-ness. This is spawned since I made a
mistake in #4039 about endianness.

* Fix some tests

* Component model changes
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request May 28, 2022
Prior to this PR a major feature of calling component exports (bytecodealliance#4039)
was the usage of the `Value<T>` type. This type represents a value
stored in wasm linear memory (the type `T` stored there). This
implementation had a number of drawbacks though:

* When returning a value it's ABI-specific whether you use `T` or
  `Value<T>` as a return value. If `T` is represented with one wasm
  primitive then you have to return `T`, otherwise the return value must
  be `Value<T>`. This is somewhat non-obvious and leaks ABI-details into
  the API which is unfortunate.

* The `T` in `Value<T>` was somewhat non-obvious. For example a
  wasm-owned string was `Value<String>`. Using `Value<&str>` didn't
  work.

* Working with `Value<T>` was unergonomic in the sense that you had to
  first "pair" it with a `&Store<U>` to get a `Cursor<T>` and then you
  could start reading the value.

* Custom structs and enums, while not implemented yet, were planned to
  be quite wonky where when you had `Cursor<MyStruct>` then you would
  have to import a `CursorMyStructExt` trait generated by a proc-macro
  (think a `#[derive]` on the definition of `MyStruct`) which would
  enable field accessors, returning cursors of all the fields.

* In general there was no "generic way" to load a `T` from memory. Other
  operations like lift/lower/store all had methods in the
  `ComponentValue` trait but load had no equivalent.

None of these drawbacks were deal-breakers per-se. When I started
to implement imported functions, though, the `Value<T>` type no longer
worked. The major difference between imports and exports is that when
receiving values from wasm an export returns at most one wasm primitive
where an import can yield (through arguments) up to 16 wasm primitives.
This means that if an export returned a string it would always be
`Value<String>` but if an import took a string as an argument there was
actually no way to represent this with `Value<String>` since the value
wasn't actually stored in memory but rather the pointer/length pair is
received as arguments. Overall this meant that `Value<T>` couldn't be
used for arguments-to-imports, which means that altogether something new
would be required.

This PR completely removes the `Value<T>` and `Cursor<T>` type in favor
of a different implementation. The inspiration from this comes from the
fact that all primitives can be both lifted and lowered into wasm while
it's just some times which can only go one direction. For example
`String` can be lowered into wasm but can't be lifted from wasm. Instead
some sort of "view" into wasm needs to be created during lifting.

One of the realizations from bytecodealliance#4039 was that we could leverage
run-time-type-checking to reject static constructions that don't make
sense. For example if an embedder asserts that a wasm function returns a
Rust `String` we can reject that at typechecking time because it's
impossible for a wasm module to ever do that.

The new system of imports/exports in this PR now looks like:

* Type-checking takes into accont an `Op` operation which indicates
  whether we'll be lifting or lowering the type. This means that we can
  allow the lowering operation for `String` but disallow the lifting
  operation. While we can't statically rule out an embedder saying that
  a component returns a `String` we can now reject it at runtime and
  disallow it from being called.

* The `ComponentValue` trait now sports a new `load` function. This
  function will load and instance of `Self` from the byte-array
  provided. This is implemented for all types but only ever actually
  executed when the `lift` operation is allowed during type-checking.

* The `Lift` associated type is removed since it's now expected that the
  lift operation returns `Self`.

* The `ComponentReturn` trait is now no longer necessary and is removed.
  Instead returns are bounded by `ComponentValue`. During type-checking
  it's required that the return value can be lifted, disallowing, for
  example, returning a `String` or `&str`.

* With `Value` gone there's no need to specify the ABI details of the
  return value, or whether it's communicated through memory or not. This
  means that handling return values through memory is transparently
  handled by Wasmtime.

* Validation is in a sense more eagerly performed now. Whenever a value
  `T` is loaded the entire immediate structure of `T` is loaded and
  validated. Note that recursive through memory validation still does
  not happen, so the contents of lists or strings aren't validated, it's
  just validated that the pointers are in-bounds.

Overall this felt like a much clearer system to work with and should be
much easier to integrate with imported functions as well. The new
`WasmStr` and `WasmList<T>` types can be used in import arguments and
lifted from the immediate arguments provided rather than forcing them to
always be stored in memory.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request May 29, 2022
Prior to this PR a major feature of calling component exports (bytecodealliance#4039)
was the usage of the `Value<T>` type. This type represents a value
stored in wasm linear memory (the type `T` stored there). This
implementation had a number of drawbacks though:

* When returning a value it's ABI-specific whether you use `T` or
  `Value<T>` as a return value. If `T` is represented with one wasm
  primitive then you have to return `T`, otherwise the return value must
  be `Value<T>`. This is somewhat non-obvious and leaks ABI-details into
  the API which is unfortunate.

* The `T` in `Value<T>` was somewhat non-obvious. For example a
  wasm-owned string was `Value<String>`. Using `Value<&str>` didn't
  work.

* Working with `Value<T>` was unergonomic in the sense that you had to
  first "pair" it with a `&Store<U>` to get a `Cursor<T>` and then you
  could start reading the value.

* Custom structs and enums, while not implemented yet, were planned to
  be quite wonky where when you had `Cursor<MyStruct>` then you would
  have to import a `CursorMyStructExt` trait generated by a proc-macro
  (think a `#[derive]` on the definition of `MyStruct`) which would
  enable field accessors, returning cursors of all the fields.

* In general there was no "generic way" to load a `T` from memory. Other
  operations like lift/lower/store all had methods in the
  `ComponentValue` trait but load had no equivalent.

None of these drawbacks were deal-breakers per-se. When I started
to implement imported functions, though, the `Value<T>` type no longer
worked. The major difference between imports and exports is that when
receiving values from wasm an export returns at most one wasm primitive
where an import can yield (through arguments) up to 16 wasm primitives.
This means that if an export returned a string it would always be
`Value<String>` but if an import took a string as an argument there was
actually no way to represent this with `Value<String>` since the value
wasn't actually stored in memory but rather the pointer/length pair is
received as arguments. Overall this meant that `Value<T>` couldn't be
used for arguments-to-imports, which means that altogether something new
would be required.

This PR completely removes the `Value<T>` and `Cursor<T>` type in favor
of a different implementation. The inspiration from this comes from the
fact that all primitives can be both lifted and lowered into wasm while
it's just some times which can only go one direction. For example
`String` can be lowered into wasm but can't be lifted from wasm. Instead
some sort of "view" into wasm needs to be created during lifting.

One of the realizations from bytecodealliance#4039 was that we could leverage
run-time-type-checking to reject static constructions that don't make
sense. For example if an embedder asserts that a wasm function returns a
Rust `String` we can reject that at typechecking time because it's
impossible for a wasm module to ever do that.

The new system of imports/exports in this PR now looks like:

* Type-checking takes into accont an `Op` operation which indicates
  whether we'll be lifting or lowering the type. This means that we can
  allow the lowering operation for `String` but disallow the lifting
  operation. While we can't statically rule out an embedder saying that
  a component returns a `String` we can now reject it at runtime and
  disallow it from being called.

* The `ComponentValue` trait now sports a new `load` function. This
  function will load and instance of `Self` from the byte-array
  provided. This is implemented for all types but only ever actually
  executed when the `lift` operation is allowed during type-checking.

* The `Lift` associated type is removed since it's now expected that the
  lift operation returns `Self`.

* The `ComponentReturn` trait is now no longer necessary and is removed.
  Instead returns are bounded by `ComponentValue`. During type-checking
  it's required that the return value can be lifted, disallowing, for
  example, returning a `String` or `&str`.

* With `Value` gone there's no need to specify the ABI details of the
  return value, or whether it's communicated through memory or not. This
  means that handling return values through memory is transparently
  handled by Wasmtime.

* Validation is in a sense more eagerly performed now. Whenever a value
  `T` is loaded the entire immediate structure of `T` is loaded and
  validated. Note that recursive through memory validation still does
  not happen, so the contents of lists or strings aren't validated, it's
  just validated that the pointers are in-bounds.

Overall this felt like a much clearer system to work with and should be
much easier to integrate with imported functions as well. The new
`WasmStr` and `WasmList<T>` types can be used in import arguments and
lifted from the immediate arguments provided rather than forcing them to
always be stored in memory.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Jun 1, 2022
Prior to this PR a major feature of calling component exports (bytecodealliance#4039)
was the usage of the `Value<T>` type. This type represents a value
stored in wasm linear memory (the type `T` stored there). This
implementation had a number of drawbacks though:

* When returning a value it's ABI-specific whether you use `T` or
  `Value<T>` as a return value. If `T` is represented with one wasm
  primitive then you have to return `T`, otherwise the return value must
  be `Value<T>`. This is somewhat non-obvious and leaks ABI-details into
  the API which is unfortunate.

* The `T` in `Value<T>` was somewhat non-obvious. For example a
  wasm-owned string was `Value<String>`. Using `Value<&str>` didn't
  work.

* Working with `Value<T>` was unergonomic in the sense that you had to
  first "pair" it with a `&Store<U>` to get a `Cursor<T>` and then you
  could start reading the value.

* Custom structs and enums, while not implemented yet, were planned to
  be quite wonky where when you had `Cursor<MyStruct>` then you would
  have to import a `CursorMyStructExt` trait generated by a proc-macro
  (think a `#[derive]` on the definition of `MyStruct`) which would
  enable field accessors, returning cursors of all the fields.

* In general there was no "generic way" to load a `T` from memory. Other
  operations like lift/lower/store all had methods in the
  `ComponentValue` trait but load had no equivalent.

None of these drawbacks were deal-breakers per-se. When I started
to implement imported functions, though, the `Value<T>` type no longer
worked. The major difference between imports and exports is that when
receiving values from wasm an export returns at most one wasm primitive
where an import can yield (through arguments) up to 16 wasm primitives.
This means that if an export returned a string it would always be
`Value<String>` but if an import took a string as an argument there was
actually no way to represent this with `Value<String>` since the value
wasn't actually stored in memory but rather the pointer/length pair is
received as arguments. Overall this meant that `Value<T>` couldn't be
used for arguments-to-imports, which means that altogether something new
would be required.

This PR completely removes the `Value<T>` and `Cursor<T>` type in favor
of a different implementation. The inspiration from this comes from the
fact that all primitives can be both lifted and lowered into wasm while
it's just some times which can only go one direction. For example
`String` can be lowered into wasm but can't be lifted from wasm. Instead
some sort of "view" into wasm needs to be created during lifting.

One of the realizations from bytecodealliance#4039 was that we could leverage
run-time-type-checking to reject static constructions that don't make
sense. For example if an embedder asserts that a wasm function returns a
Rust `String` we can reject that at typechecking time because it's
impossible for a wasm module to ever do that.

The new system of imports/exports in this PR now looks like:

* Type-checking takes into accont an `Op` operation which indicates
  whether we'll be lifting or lowering the type. This means that we can
  allow the lowering operation for `String` but disallow the lifting
  operation. While we can't statically rule out an embedder saying that
  a component returns a `String` we can now reject it at runtime and
  disallow it from being called.

* The `ComponentValue` trait now sports a new `load` function. This
  function will load and instance of `Self` from the byte-array
  provided. This is implemented for all types but only ever actually
  executed when the `lift` operation is allowed during type-checking.

* The `Lift` associated type is removed since it's now expected that the
  lift operation returns `Self`.

* The `ComponentReturn` trait is now no longer necessary and is removed.
  Instead returns are bounded by `ComponentValue`. During type-checking
  it's required that the return value can be lifted, disallowing, for
  example, returning a `String` or `&str`.

* With `Value` gone there's no need to specify the ABI details of the
  return value, or whether it's communicated through memory or not. This
  means that handling return values through memory is transparently
  handled by Wasmtime.

* Validation is in a sense more eagerly performed now. Whenever a value
  `T` is loaded the entire immediate structure of `T` is loaded and
  validated. Note that recursive through memory validation still does
  not happen, so the contents of lists or strings aren't validated, it's
  just validated that the pointers are in-bounds.

Overall this felt like a much clearer system to work with and should be
much easier to integrate with imported functions as well. The new
`WasmStr` and `WasmList<T>` types can be used in import arguments and
lifted from the immediate arguments provided rather than forcing them to
always be stored in memory.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Jun 1, 2022
Prior to this PR a major feature of calling component exports (bytecodealliance#4039)
was the usage of the `Value<T>` type. This type represents a value
stored in wasm linear memory (the type `T` stored there). This
implementation had a number of drawbacks though:

* When returning a value it's ABI-specific whether you use `T` or
  `Value<T>` as a return value. If `T` is represented with one wasm
  primitive then you have to return `T`, otherwise the return value must
  be `Value<T>`. This is somewhat non-obvious and leaks ABI-details into
  the API which is unfortunate.

* The `T` in `Value<T>` was somewhat non-obvious. For example a
  wasm-owned string was `Value<String>`. Using `Value<&str>` didn't
  work.

* Working with `Value<T>` was unergonomic in the sense that you had to
  first "pair" it with a `&Store<U>` to get a `Cursor<T>` and then you
  could start reading the value.

* Custom structs and enums, while not implemented yet, were planned to
  be quite wonky where when you had `Cursor<MyStruct>` then you would
  have to import a `CursorMyStructExt` trait generated by a proc-macro
  (think a `#[derive]` on the definition of `MyStruct`) which would
  enable field accessors, returning cursors of all the fields.

* In general there was no "generic way" to load a `T` from memory. Other
  operations like lift/lower/store all had methods in the
  `ComponentValue` trait but load had no equivalent.

None of these drawbacks were deal-breakers per-se. When I started
to implement imported functions, though, the `Value<T>` type no longer
worked. The major difference between imports and exports is that when
receiving values from wasm an export returns at most one wasm primitive
where an import can yield (through arguments) up to 16 wasm primitives.
This means that if an export returned a string it would always be
`Value<String>` but if an import took a string as an argument there was
actually no way to represent this with `Value<String>` since the value
wasn't actually stored in memory but rather the pointer/length pair is
received as arguments. Overall this meant that `Value<T>` couldn't be
used for arguments-to-imports, which means that altogether something new
would be required.

This PR completely removes the `Value<T>` and `Cursor<T>` type in favor
of a different implementation. The inspiration from this comes from the
fact that all primitives can be both lifted and lowered into wasm while
it's just some times which can only go one direction. For example
`String` can be lowered into wasm but can't be lifted from wasm. Instead
some sort of "view" into wasm needs to be created during lifting.

One of the realizations from bytecodealliance#4039 was that we could leverage
run-time-type-checking to reject static constructions that don't make
sense. For example if an embedder asserts that a wasm function returns a
Rust `String` we can reject that at typechecking time because it's
impossible for a wasm module to ever do that.

The new system of imports/exports in this PR now looks like:

* Type-checking takes into accont an `Op` operation which indicates
  whether we'll be lifting or lowering the type. This means that we can
  allow the lowering operation for `String` but disallow the lifting
  operation. While we can't statically rule out an embedder saying that
  a component returns a `String` we can now reject it at runtime and
  disallow it from being called.

* The `ComponentValue` trait now sports a new `load` function. This
  function will load and instance of `Self` from the byte-array
  provided. This is implemented for all types but only ever actually
  executed when the `lift` operation is allowed during type-checking.

* The `Lift` associated type is removed since it's now expected that the
  lift operation returns `Self`.

* The `ComponentReturn` trait is now no longer necessary and is removed.
  Instead returns are bounded by `ComponentValue`. During type-checking
  it's required that the return value can be lifted, disallowing, for
  example, returning a `String` or `&str`.

* With `Value` gone there's no need to specify the ABI details of the
  return value, or whether it's communicated through memory or not. This
  means that handling return values through memory is transparently
  handled by Wasmtime.

* Validation is in a sense more eagerly performed now. Whenever a value
  `T` is loaded the entire immediate structure of `T` is loaded and
  validated. Note that recursive through memory validation still does
  not happen, so the contents of lists or strings aren't validated, it's
  just validated that the pointers are in-bounds.

Overall this felt like a much clearer system to work with and should be
much easier to integrate with imported functions as well. The new
`WasmStr` and `WasmList<T>` types can be used in import arguments and
lifted from the immediate arguments provided rather than forcing them to
always be stored in memory.
alexcrichton added a commit that referenced this pull request Jun 1, 2022
Prior to this PR a major feature of calling component exports (#4039)
was the usage of the `Value<T>` type. This type represents a value
stored in wasm linear memory (the type `T` stored there). This
implementation had a number of drawbacks though:

* When returning a value it's ABI-specific whether you use `T` or
  `Value<T>` as a return value. If `T` is represented with one wasm
  primitive then you have to return `T`, otherwise the return value must
  be `Value<T>`. This is somewhat non-obvious and leaks ABI-details into
  the API which is unfortunate.

* The `T` in `Value<T>` was somewhat non-obvious. For example a
  wasm-owned string was `Value<String>`. Using `Value<&str>` didn't
  work.

* Working with `Value<T>` was unergonomic in the sense that you had to
  first "pair" it with a `&Store<U>` to get a `Cursor<T>` and then you
  could start reading the value.

* Custom structs and enums, while not implemented yet, were planned to
  be quite wonky where when you had `Cursor<MyStruct>` then you would
  have to import a `CursorMyStructExt` trait generated by a proc-macro
  (think a `#[derive]` on the definition of `MyStruct`) which would
  enable field accessors, returning cursors of all the fields.

* In general there was no "generic way" to load a `T` from memory. Other
  operations like lift/lower/store all had methods in the
  `ComponentValue` trait but load had no equivalent.

None of these drawbacks were deal-breakers per-se. When I started
to implement imported functions, though, the `Value<T>` type no longer
worked. The major difference between imports and exports is that when
receiving values from wasm an export returns at most one wasm primitive
where an import can yield (through arguments) up to 16 wasm primitives.
This means that if an export returned a string it would always be
`Value<String>` but if an import took a string as an argument there was
actually no way to represent this with `Value<String>` since the value
wasn't actually stored in memory but rather the pointer/length pair is
received as arguments. Overall this meant that `Value<T>` couldn't be
used for arguments-to-imports, which means that altogether something new
would be required.

This PR completely removes the `Value<T>` and `Cursor<T>` type in favor
of a different implementation. The inspiration from this comes from the
fact that all primitives can be both lifted and lowered into wasm while
it's just some times which can only go one direction. For example
`String` can be lowered into wasm but can't be lifted from wasm. Instead
some sort of "view" into wasm needs to be created during lifting.

One of the realizations from #4039 was that we could leverage
run-time-type-checking to reject static constructions that don't make
sense. For example if an embedder asserts that a wasm function returns a
Rust `String` we can reject that at typechecking time because it's
impossible for a wasm module to ever do that.

The new system of imports/exports in this PR now looks like:

* Type-checking takes into accont an `Op` operation which indicates
  whether we'll be lifting or lowering the type. This means that we can
  allow the lowering operation for `String` but disallow the lifting
  operation. While we can't statically rule out an embedder saying that
  a component returns a `String` we can now reject it at runtime and
  disallow it from being called.

* The `ComponentValue` trait now sports a new `load` function. This
  function will load and instance of `Self` from the byte-array
  provided. This is implemented for all types but only ever actually
  executed when the `lift` operation is allowed during type-checking.

* The `Lift` associated type is removed since it's now expected that the
  lift operation returns `Self`.

* The `ComponentReturn` trait is now no longer necessary and is removed.
  Instead returns are bounded by `ComponentValue`. During type-checking
  it's required that the return value can be lifted, disallowing, for
  example, returning a `String` or `&str`.

* With `Value` gone there's no need to specify the ABI details of the
  return value, or whether it's communicated through memory or not. This
  means that handling return values through memory is transparently
  handled by Wasmtime.

* Validation is in a sense more eagerly performed now. Whenever a value
  `T` is loaded the entire immediate structure of `T` is loaded and
  validated. Note that recursive through memory validation still does
  not happen, so the contents of lists or strings aren't validated, it's
  just validated that the pointers are in-bounds.

Overall this felt like a much clearer system to work with and should be
much easier to integrate with imported functions as well. The new
`WasmStr` and `WasmList<T>` types can be used in import arguments and
lifted from the immediate arguments provided rather than forcing them to
always be stored in memory.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:config Issues related to the configuration of Wasmtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants