Store the `ValRaw` type in little-endian format #4035

alexcrichton · 2022-04-14T15:35:09Z

This commit changes the internal representation of the `ValRaw` type to
an unconditionally little-endian format instead of its current
native-endian format. The documentation, the various accessors, and the
associated trampolines that read `ValRaw` have all been updated to always
work with little-endian values, converting to the host endianness as
necessary.
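
As a rough illustration of what "always little-endian, converting to the host endianness as necessary" means for an accessor, here is a minimal sketch. The `RawSlot` union and its method names are hypothetical stand-ins invented for this example, not wasmtime's actual `ValRaw` definition; only the standard-library `to_le`/`from_le` integer methods are assumed.

```rust
/// Hypothetical stand-in for a raw value slot (NOT wasmtime's real `ValRaw`).
#[allow(dead_code)]
#[repr(C)]
union RawSlot {
    i32: i32,
    i64: i64,
}

impl RawSlot {
    /// Store an `i32` so that its bytes in memory are little-endian on any host.
    fn from_i32(v: i32) -> RawSlot {
        RawSlot { i32: v.to_le() }
    }

    /// Read the `i32` back, converting from little-endian to host byte order.
    fn get_i32(&self) -> i32 {
        // Sound here because `from_i32` initialized this field.
        let stored = unsafe { self.i32 };
        i32::from_le(stored)
    }

    /// Expose the bytes exactly as they sit in memory.
    fn stored_bytes(&self) -> [u8; 4] {
        let stored = unsafe { self.i32 };
        stored.to_ne_bytes()
    }
}
```

On a little-endian host `to_le` and `from_le` are no-ops, so the conversion costs nothing there; on a big-endian host they byte-swap.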

The motivation for this change originally comes from the implementation
of the component model that I'm working on. One aspect of the component
model's canonical ABI is how variants are passed to functions as
immediate arguments. For example for a component model function:

```
foo: function(x: expected<i32, f64>)
```

This translates to a core wasm function:

```wasm
(module
  (func (export "foo") (param i32 i64)
    ;; ...
  )
)
```

The first `i32` parameter to the core wasm function is the discriminant
of whether the result is an "ok" or an "err". The second `i64`, however,
is the result of the "join" operation on the `i32` and `f64` payload types.
Essentially these two types are unioned into one type to get passed into
the function.
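
To make the "join" of payload types concrete, the following is a small sketch of the rule as I understand it from the canonical ABI. The `CoreType` enum and `join` function are names invented for this example, and the `i32`/`f32` case is an assumption taken from my reading of the canonical ABI rather than from this PR. The only case this PR actually relies on is `i32` joined with `f64` giving `i64`.

```rust
/// Core wasm value types (hypothetical enum, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CoreType {
    I32,
    I64,
    F32,
    F64,
}

/// Pick one core type wide enough to carry either payload.
fn join(a: CoreType, b: CoreType) -> CoreType {
    use CoreType::*;
    match (a, b) {
        // Identical types need no widening.
        _ if a == b => a,
        // Assumption: two different 32-bit types are carried as an `i32`.
        (I32, F32) | (F32, I32) => I32,
        // Anything else involving a 64-bit type is carried as an `i64`.
        _ => I64,
    }
}
```

With this rule, `join(CoreType::I32, CoreType::F64)` evaluates to `CoreType::I64`, matching the `(param i32 i64)` signature above.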

Currently in the implementation of the component model my plan is to
construct a `*mut [ValRaw]` to pass through to WebAssembly, always
invoking component exports through host trampolines. This means that the
implementation for `Result<T, E>` needs to do the correct "join"
operation here when encoding a particular case into the corresponding
`ValRaw`.

I personally found this particularly tricky to do structurally. The
solution that I settled on with fitzgen was that if `ValRaw` was always
stored in a little-endian format then we could employ a trick: when
encoding a variant we first set all the `ValRaw` slots to zero, then
encode the associated case we have. Afterwards the `ValRaw` values are
already encoded into the correct format as if they'd been "join"ed.

For example if we were to encode `Ok(1i32)` then this would produce
`ValRaw { i32: 1 }`, which memory-wise is equivalent to `ValRaw { i64: 1 }`
if the other bytes in the `ValRaw` are guaranteed to be zero. Similarly
storing `ValRaw { f64 }` is equivalent to the storage required for
`ValRaw { i64 }` here in the join operation.

Note, though, that this equivalence relies on everything being
little-endian. Otherwise the in-memory representations of `ValRaw { i32: 1 }`
and `ValRaw { i64: 1 }` are different.
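
The zero-then-write trick and its dependence on little-endian storage can be sketched as follows. The 8-byte `Slot` union and the function names are hypothetical and only model the idea; the real `ValRaw` has more fields. The last function simulates a native big-endian layout with explicit byte arrays so the contrast is visible on any host.

```rust
/// Hypothetical 8-byte slot modelling the idea (NOT the real `ValRaw`).
#[repr(C)]
union Slot {
    i32: i32,
    i64: i64,
    bytes: [u8; 8],
}

/// Encode an `i32` case: zero the whole slot, then write only the payload.
fn encode_i32_case(v: i32) -> Slot {
    let mut slot = Slot { bytes: [0; 8] };
    // Writing a `Copy` union field is safe; the upper four bytes stay zero.
    slot.i32 = v.to_le();
    slot
}

/// Read the same slot as the joined `i64`, converting to host byte order.
fn read_as_i64(slot: &Slot) -> i64 {
    // Sound because every byte was initialized by `encode_i32_case`.
    let stored = unsafe { slot.i64 };
    i64::from_le(stored)
}

/// Simulate native big-endian storage: the `i32` bytes land in the HIGH half
/// of the `i64`, so the two views no longer agree.
fn big_endian_native_view(v: i32) -> i64 {
    let mut bytes = [0u8; 8];
    bytes[..4].copy_from_slice(&v.to_be_bytes());
    i64::from_be_bytes(bytes)
}
```

Here `read_as_i64(&encode_i32_case(1))` is `1`, whereas `big_endian_native_view(1)` is `1 << 32`, which is the mismatch the paragraph above describes.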

That motivation is what leads to this change. It's expected that this is
a low-to-zero cost change: little-endian platforms will see no change,
and big-endian platforms are already required to efficiently byte-swap
loads/stores since WebAssembly's little-endian memory model requires that.
Additionally the `ValRaw` type is a niche use case primarily
used for accelerating the C API right now, so it's expected that not
many users will have to update for this change.

github-actions · 2022-04-14T15:47:41Z

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api", "wasmtime:c-api"

Thus the following users have been cc'd because of the following labels:

peterhuene: wasmtime:api, wasmtime:c-api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

bjorn3 · 2022-04-14T17:15:16Z

crates/c-api/include/wasmtime/val.h

@@ -119,24 +119,38 @@ typedef uint8_t wasmtime_v128[16];
 */
 typedef union wasmtime_valunion {
  /// Field used if #wasmtime_val_t::kind is #WASMTIME_I32
+  ///
+  /// Note that this field is always stored in a little-endian format.


bjorn3: Maybe store `char[4]`, `char[8]`, ... instead? Otherwise people will likely forget to do the endianness transform, especially as all common architectures are little-endian too.

alexcrichton: I would prefer to leave this in to avoid the need for funky casts and such in C. This is not intended to be super heavily used either, and other language bindings will have to deal with this anyway.

bjorn3: You need funky casts either way to support big-endian systems.

bjorn3 · 2022-04-14T17:15:55Z

crates/runtime/src/vmcontext.rs

@@ -778,16 +778,81 @@ impl VMContext {
 /// This is provided for use with the `Func::new_unchecked` and
 /// `Func::call_unchecked` APIs. In general it's unlikely you should be using
 /// this from Rust, rather using APIs like `Func::wrap` and `TypedFunc::call`.
+///
+/// This is notably an "unsafe" way to work with `Val` and it's recommended to


alexcrichton: The comment here is actually intended to point to `Val`, in that `ValRaw` is the unsafe way of working with what is otherwise a wasm `Val` value.

…asmtime_val_raw`. It seems they were mistakenly added to the `wasmtime_valunion` union whereas it is actually the `ValRaw` Rust type (represented by `wasmtime_val_raw`) that is affected by the change.

github-actions bot added the wasmtime:api and wasmtime:c-api labels Apr 14, 2022

fitzgen approved these changes Apr 14, 2022

Track down some more endianness conversions

f266734

bjorn3 reviewed Apr 14, 2022

alexcrichton merged commit 51d82ae into bytecodealliance:main Apr 14, 2022

alexcrichton deleted the val-raw-little-endian branch April 14, 2022 18:09

This was referenced Nov 19, 2022

Move the endianness notes introduced with #4035 to wasmtime_val_raw #5303

Merged

Use unchecked function callbacks for better performance bytecodealliance/wasmtime-dotnet#186

Merged
