Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust Wasm ABI treats two-field structs like scalars and doesn't require padding #657

Closed
Manishearth opened this issue Aug 23, 2024 · 13 comments · Fixed by #660
Closed

Rust Wasm ABI treats two-field structs like scalars and doesn't require padding #657

Manishearth opened this issue Aug 23, 2024 · 13 comments · Fixed by #660

Comments

@Manishearth
Copy link
Contributor

Manishearth commented Aug 23, 2024

When single-field structs are passed down over Wasm, they're treated as a single parameter. Multi-field structs are passed down as multiple parameters.

Except not quite: multi-field structs can have padding. A struct (u8, u32) has 3 bytes of padding after the u8, represented in llvm as [3 x i8]. (u16, u32) has two bytes of padding, represented in llvm as [1 x i16].

In the JS/Wasm backend, of course we need to deal with padding when deciphering returned structs (since we read them from memory with their layout). However, we also need to deal with padding when passing structs over Wasm, because each entry of padding is one argument.

This means that in the following setup:

#[repr(C)]
struct Foo {
  a: u8,
  // three bytes of padding
  b: u32,
  c: u32,
}

extern "C" fn foo(x: Foo) {}

foo() gets called as foo(a, 0, 0, 0, b, c).

We already have this behavior implemented in #643, except currently it assumes all padding bytes are i8, whereas it appears that they match the lowest alignment of the previous unaligned section. Or something like that ((u8, u8, u32) and (u16, u32) have differently segmented padding sections). This is a separate bug to be fixed (#656).

However, there appears to be strangeness even when dealing with padding. The Wasm tool conventions claim that structs with single scalar fields are passed as scalars, and structs with multiple are passed as aggregates, however in the following code:

#[repr(C)]
pub struct Scalar {
    first: u32,
}

#[repr(C)]
pub struct Small {
  a: u8,
  // 3 bytes padding
  b: u32
}

#[derive(Debug, Clone, Copy)]
#[repr(C)]
pub struct Big {
    a: u8,
    b: bool,
    c: u8,
    // 1 byte padding
    d: u32,
    e: i32,
}

#[no_mangle]
pub extern "C" fn scalar(x: Scalar) {}
#[no_mangle]
pub extern "C" fn small(x: Small) {}
#[no_mangle]
pub extern "C" fn big(x: Big) {}

the resultant LLVM IR looks something like this:

%Scalar = type { i32 }
%Big = type { i8, i8, i8, [1 x i8], i32, i32 }

define dso_local void @scalar(%Scalar %0) unnamed_addr #0 {...}

define dso_local void @small(i8 %x.0, i32 %x.1) unnamed_addr #0 {...}

define dso_local void @big(%Big %0) unnamed_addr #0 {...}

where Small is split into two scalar parameters in the function. The function small() in WASM is correctly invoked as small(a, b), with no padding.

With a bunch of experimentation it seems like this consistently happens for structs that (transitively) contain two scalar fields. The sizes of the fields are irrelevant, but adding a third field always turns it into a separate type {...} thing that may contain padding.

We should figure out where this heuristic is coming from, and either get it fixed (it doesn't match the documented conventions) or write code to handle it. This may unfortunately require making _intoFFI() conditional on cases with and without padding, but that won't be too hard.

@Manishearth
Copy link
Contributor Author

This same problem does not occur for the equivalent C++ code: https://godbolt.org/z/eda34jq15, though for whatever reason the C++ code also doesn't seem to handle padding at all.

@Manishearth
Copy link
Contributor Author

wasm_bindgen does not appear to support passing structs over FFI directly, it always handles them similar to opaques

@Manishearth
Copy link
Contributor Author

wasm_bindgen doesn't even do layout calculations, it generates getter/setter methods for each field. huh

@Manishearth
Copy link
Contributor Author

@Manishearth
Copy link
Contributor Author

Manishearth commented Aug 23, 2024

Some potentially related issues: rust-lang/rust#115666, rust-lang/rust#81386, rust-lang/rust#81388

Similar issue: rust-lang/rust#121408

@ambiguousname
Copy link
Member

Found the culprit, I think

https://github.com/rust-lang/rust/blob/a60a9e567a7319b33619f6551dc29522c6f58687/compiler/rustc_target/src/abi/call/mod.rs#L48-L51

So the reason this happens would be for say, tuples of exactly two elements?

@Manishearth
Copy link
Contributor Author

yeah, I think the Rust ABI has special handling for scalar pairs and for whatever reason that gets enabled for Wasm as well

@Manishearth
Copy link
Contributor Author

I think this is enough for us to be able to assume that Option, at least, is not affected by this, which makes my life easier though it doesn't fix the problem.

@Manishearth
Copy link
Contributor Author

Note: the Rust behavior doesn't match the tool conventions.

@Manishearth
Copy link
Contributor Author

Upstream bug filed at rust-lang/rust#129486 . It is most likely to lead to better documentation.

@Manishearth
Copy link
Contributor Author

Manishearth commented Aug 23, 2024

Looks like they're working on matching the tool conventions in rust-lang/rust#129486

The flag is -Zwasm-c-abi=spec. This makes all non-scalar structs get passed as pointers, which complicates our codegen further (but also simplifies it in other ways), I don't want to switch to an unstable option right now but we can eventually switch over once that stabilizes. In the meantime, let's figure out and implement the legacy ABI

@Manishearth
Copy link
Contributor Author

In the long run rustc plans to move to -Zwasm-c-abi=spec . The codegen now has to unconditionally allocate and write to the struct on the WASM side (ugh) however it has the benefit of no longer having to muck with "argument counts", greatly simplifying codegen for things like Option/etc.

For now I'm going to stick to the legacy mode but sometime soon we should move codegen over.

Manishearth added a commit to Manishearth/diplomat that referenced this issue Aug 27, 2024
This fixes rust-diplomat#656 but not rust-diplomat#657

The scalarpair struct part of the second test happens to pass since rust-diplomat#657
is only relevant for argument passing
Manishearth added a commit to Manishearth/diplomat that referenced this issue Aug 27, 2024
@Manishearth
Copy link
Contributor Author

Filed #661 for the long term plan. #661 fixes this in the short term

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants