Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use vectors for calls and imports #145

Merged
merged 3 commits into from
Aug 6, 2020
Merged

Use vectors for calls and imports #145

merged 3 commits into from
Aug 6, 2020

Conversation

rossberg
Copy link
Member

Return to using vectors for function call arguments and results, as well as module imports. This was simplified to a straight array interface in #48, but additional sanity checks have been requested.

Unlike #134, this avoids the use of struct-by-value in the C API and implements the API change consistently in both C and C++ API.

Copy link
Collaborator

@jakobkummerow jakobkummerow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

(Sorry for the latency.)

FWIW, I'm still unconvinced that these changes provide any tangible benefit; but if one accepts the premise that this is what people want, then the changes look fine.

@rossberg
Copy link
Member Author

@jakobkummerow, PTAL. I introduced a few convenience macros to make manual handling of vectors (and values) less verbose. Also, fixed latent crash bugs in examples where out val_vecs weren't properly initialised (which might cause attempts to release an own reference if a val happens to materialise with ref kind).

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me that passing struct containing a pointer and length, instead of just passing the pointer and length, is valuable. A plain pointer + length pair is simpler, more idiomatic in C APIs, and easier to bind to.

For example, witx tooling already knows how to map a high-level Interface-Types-like array type into separate pointer and length arguments when generating a C binding, and it's a simple technique that generates idiomatic APIs in a principled way. It does mean that you can't infer the full high-level semantics from the C API signatures alone, however that's already true for other reasons, and it doesn't have any obvious practical downsides.

include/wasm.h Outdated
@@ -510,7 +510,7 @@ WASM_API_EXTERN const wasm_memory_t* wasm_extern_as_memory_const(const wasm_exte
WASM_DECLARE_REF(instance)

WASM_API_EXTERN own wasm_instance_t* wasm_instance_new(
wasm_store_t*, const wasm_module_t*, const wasm_extern_t* const imports[],
wasm_store_t*, const wasm_module_t*, own wasm_extern_vec_t* imports,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change from a const T* to an own T*? If I understand the code here, this function isn't expected to take ownership of the array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! This was because no other places needed non-own vectors before, so I hadn't noticed it and supporting it required extra infrastructure. Fixed now.

@rossberg
Copy link
Member Author

rossberg commented Jul 1, 2020

It's not clear to me that passing struct containing a pointer and length, instead of just passing the pointer and length, is valuable. A plain pointer + length pair is simpler, more idiomatic in C APIs, and easier to bind to.

It's a question of consistency. The API uses the vec abstraction in all other places, so it would be rather incoherent to not use it in this one. It also is more self-documenting. I haven't tried, but I would expect that bindings benefit from consistency as well.

(You could argue that we shouldn't use vec anywhere except for return types, but I'm not sure that's preferable, especially on the C++ end. In any case, that would be a different and much larger change.)

@rossberg
Copy link
Member Author

rossberg commented Jul 1, 2020

@jakobkummerow, PTAL.

@alexcrichton
Copy link

I think in terms of ownership the API right now is pretty inconsistent, and I think that @sunfishcode's suggestion could greatly improve the consistency and understandability of the API at-a-glance. For example I think it would be more understandable if vectors were only used as either return values or "here's ownership of an allocated chunk of memory". Currently vectors are sometimes used as ownership containers and sometimes used as "here's a temporary view into my data". Additionally the ownership itself is pretty:

  • wasm_functype_new - takes ownership of its parameter/results lists
  • wasm_functype_{params,results} - returns a vector, but not an owned vector
  • wasm_trap_trace - returns an owned vector
  • wasm_module_{new,validate} - takes a vector, but does not take ownership
  • wasm_module_{imports,exports} - returns an owned vector
  • wasm_instance_exports - returns an owned vector
  • wasm_trap_new - vector argument, does not take ownership
  • wasm_trap_message - returns a vector
  • wasm_{import,export}type_new - takes ownership of its module/name arguments
  • wasm_importtype_module - returns a vector, does not yield ownership
  • wasm_{import,export}type_name - returns a vector, does not yield ownership

Given this state the only real way to figure out the ownership of an API seems to be to look at the documentation, from the signature alone it's pretty unclear where ownership is being transferred. For example it seems like a weird gotcha that wasm_trap_new doesn't take ownership of its string argument while wasm_importtype_new does.

I think it could be more consistent to say "using a vector means transfer of ownership". That way if you ever see a *_vec_t (or alias with wasm_name_t or wasm_message_t) in the API you know that an ownership transfer is happening. Otherwise if you see a pointer/length then the standard C idiom of "don't manage the memory you don't own" kicks into play and it's clear you don't need to deallocate it, nor is it deallocated for you if you pass it to an API.

@alexcrichton
Copy link

I would expect that bindings benefit from consistency as well

FWIW I can say that at least from my experience consistency matters a lot with respect "what does everyone else in C do" just as much as within the API itself. For example to bind wasm_trap_new in Go you in theory just need to pass in a pointer/length, but due to pointer rules in Go I'm required to take a string argument, duplicate it as a CString (copy the memory), then pass that pointer to wasm_trap_new which likely copies the data again. All that really needs to happen here is passing a pointer/length, but requiring the API to go through a wasm_byte_vec_t adds an extra hoop to jump through which requires other allocations.

@rossberg
Copy link
Member Author

rossberg commented Jul 1, 2020

I see what you mean. In most cases, where ownership transfer occurs was driven by what information an object needs to keep around, and what it is more likely to reconstruct from scratch. In some cases this may be biased towards what was most natural for the V8 backend.

However, "using a vector means transfer of ownership" may not be a sufficiently nuanced guiding principle, since you sometimes just want to return a pair of vector and length without passing ownership. That pairing was the primary purpose of the abstraction after all. Always passing ownership would either force premature copying, or extra levels of reference counting and indirection. The type representations are an example of this: e.g., just traversing the arguments of a function type should not require copying the vector and its types (which may get more complex in the future).

Ownership ought to be more apparent in the C++ version, where we at least have types to represent it. Though I believe it is still explicit from the own macro in the C version. That was the intent anyway. Can you elaborate where it is not?

Happy to discuss how to make this more consistent. Heads-up though that I'm off to vacation tomorrow, so probably won't reply during the next week.

@alexcrichton
Copy link

alexcrichton commented Jul 1, 2020

Ah yes to be clear the C API, as is today, is clear about where ownership is transferred. The type signatures (and the own token) are sufficient for this. Additionally I agree that higher level languages like C++ (and, for example, a Rust API) clearly define ownership through their types. My point, though, is that the ownership of arguments/results are pretty inconsistent throughout the C API. Not how they're textually defined but how they're semantically defined.

For example why does wasm_module_imports return an owned vector but wasm_functype_params returns a vector owned by the original type? Additionally why does wasm_importtype_new take ownership of its string arguments while wasm_trap_new not take ownership? (there's also a question of why wasm_trap_* works with nul-terminated strings but that's orthogonal to ownership)

I agree that APIs sometimes need to return two values, such as wasm_importtype_module. I think this can be done like in wasm_instance_new, however, where one return value is "blessed" to be returned from the function and another is returned as an out-parameter. My point is that idiomatically and conventionally I think it would make more sense for usage of *_vec_t to imply "ownership is being transferred", while if ownership isn't being transferred a plain old pointer/length is used (either as two parameters or one return and one out-parameter).

An example of this is the libgit2 API where git_buf is used similarly to wasm_byte_vec_t but only when a whole vector is returned. All other APIs that take a list of values take a plain pointer/length pair. This also shows an alternative to returning lists of values as returning iterators of values. For example in Wasmtime to implement wasm_module_exports we don't natively store exports in as a list of wasm_extern_t values, so an iterator would be much more natural there anyway.

Copy link
Collaborator

@jakobkummerow jakobkummerow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks fine. (By which I don't mean to take a position in the ongoing discussion about what the best API design would be.)

@@ -648,8 +648,8 @@ public:
Func() = delete;
~Func();

using callback = auto (*)(const Val[], Val[]) -> own<Trap>;
using callback_with_env = auto (*)(void*, const Val[], Val[]) -> own<Trap>;
using callback = auto (*)(const vec<Val>&, vec<Val>&) -> own<Trap>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, allocating the param/result data structures is highly performance sensitive for fast calls. For maximum optimization, it's important that an engine can materialize the respective objects "by hand", i.e. precisely predict the C++ object layout. That's trivially doable with a Val[] and should still be fine with a const vec<Val>&, so I'm just pointing this out in case further changes will be discussed/considered here.
(FWIW, we haven't gone that far yet in V8, we're still using malloc here, which is fairly slow. C++-side stack allocation of dynamically sized arrays is not an option for us.)

@rossberg
Copy link
Member Author

rossberg commented Aug 6, 2020

I created a issue #150 for the ownership discussion. Landing this patch for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants