Index spaces feel inconsistent about what they might or might not contain #11

alexcrichton · 2022-03-29T19:43:42Z

Personally I get confused about the various index spaces that are introduced in the component model:

Type index space - this includes all interface types, component types, module types, etc. This, however, cannot name module instance types.
Module index space - this includes only defined core wasm modules
Component index space - this includes only defined components
Function index space - this includes all of core wasm functions, lifted functions, and lowered functions
Instance index space - this includes both module instances and component instances

Personally I keep getting lost about which index space includes what. For example just now I wasn't sure if modules and components were in the same index space because their instances are in the same index space.

Would it be possible to separate these index spaces? For example could the function and index spaces be split between module/component types?

fitzgen · 2022-03-29T20:34:20Z

+1 this is something I've repeatedly run up against as well.

It seems like we might as well "use the type system" a little bit to ensure that e.g. a function we are lifting is actually a core function and not a component function.

lukewagner · 2022-03-30T01:08:09Z

I think you're right that the index space division is rather ad hoc; it's attempting to balance a few competing goals and compromising as a result, but maybe it's not the right compromise.

So there's a pretty clear core wasm convention (which, if we're embedding core wasm into components, I think we should maintain) that the sort in (sort <idx>) or (sort <dfn>) selects the index space (where sort is one of type, func, etc). Thus, if we had, e.g., four different index spaces for {core,component}x{function,instance}, we'd need four distinct sorts, e.g.: (core func <idx>|<dfn>)/(core instance <idx>|<dfn>)/(func <idx>/<dfn>)/(instance <idx>/<dfn>). Originally, this seemed like unnecessary extra verbosity to me (given that we have to validate the result anyway), but I'm interested to discuss this option more if we think this partitioning of index spaces is worth it.

The big irregularity with the current scheme that you mentioned that bugs me too is that components and modules are in different index spaces; if they followed the function/instance convention, they'd share a single index space. My main hangup here is that if we put both components/modules into the same index space, we'd have to pick a single shared X for which we'd write (X $id) to refer to both modules and components (and then maybe (X <def>) to define either one?!), and I can't think of a good new X and reusing module or component seems like it would create more confusion.

Finally: the type index space doesn't contain module instance types only because there's no need for them at the moment. If the need arose, there's no reason we couldn't add them (and/or core function types as well).

alexcrichton · 2022-03-30T14:11:33Z

I agree with your thinking, although I don't know how best to solve this. In implementing this it feels inconsistent that modules/components are separate but their instances and functions are not, so I just wanted to raise the issue.

lukewagner · 2022-04-05T18:18:49Z

Ok, having had a bit more time to think about this and run the idea by more folks, I think it's a good idea to be more regular here and split the function, instance and type index spaces into component-vs-core.

I think there could be also be a nice organizing principle that we could adopt: if hopefully-maybe one day module linking is added to core wasm (as originally proposed), it would be nice if the binary encoding and validation logic for core {module, instance, type} sections could be shared between the component and core layers. Thus, a design criteria for these sections would be that they would also make sense when embedded in a core module (and thus probably not look too different than the original proposed binary format).

Core functions are the outlier here: (func (canon.lower ...)) defines a core function, which goes into a core function index space, but canon.lower is a fundamentally component-layer definition (it refers to Interface Types). Post-component-MVP, adapter function definitions would similarly only make sense at the component layer. But I think this is fine: we can have core index spaces populated by component-layer definitions (along with core-layer definitions, projected from core instances via alias).

The big remaining question is how to express this split syntactically. I need to play with this a bit more before posting a proposed new syntax, but I'm happy to hear any ideas on this. I just wanted to give an update in the meantime (I need to finish off the canonical ABI PR before digging into this).

lukewagner · 2022-04-14T23:19:20Z

Ok, I kinda like this idea of pretending that module-linking has already been added to core wasm; it gives us a nice regular scheme for syntactically and binarily separating out component-level and core-level index spaces:

Any core definition or use (so: modules, module instances, core functions, core types, and aliases to any of these) gets prefixed with core as the first token inside the parens (e.g., (core module ...))
- The core token says: interpret the rest of this s-expression as core wasm text format (where we're pretending that core wasm has module-linking and thus can express module definitions etc).
- All the definitions/uses inside a core-prefixed s-expression don't need (and can't have) a core prefix, because they are being parsed as core syntax.
Everything else inside a (component ...) (that is not inside a (core ...)) is parsed as a component-level definition which includes components, component instances, component functions, component types, and aliases to any of these.
As always, the index space appended to by imports is determined by the type of the import so there is no (core import ...) (only (import "x" (core ...))). Similarly, there is no (core export ...) (only (export "x" (core ...))).

So, as an example, I would write:

(component
  (import "foo" (instance $foo  ;; creates a component instance type definition
    (export "bar" (func (param string)))
  ))
  (alias export $foo "bar" (func $foo_bar))  ;; component-level alias
  (import "libc" (core module $Libc  ;; creates a core module type definition
    (export "memory" (memory 1))
    (export "realloc" (func (param i32 i32) (result i32)))
  ))
  (core instance $libc (instantiate $Libc))  ;; note: no (module $Libc) needed
  (core alias export $libc "mem" (memory $mem))  ;; core-level alias adds core memory index
  (core alias export $libc "realloc" (func $realloc))  ;; core-level alias adds core function index
  (core $foo_bar_ (canon.lower  ;; see comment below
      (memory $mem) (realloc $realloc)  ;; indexes core index spaces b/c canon.lower says so
      $foo_bar  ;; indexes the component function index space b/c canon.lower says so
  ))
  (core module $Main  ;; all core definitions inside here
    (import "libc" "memory" (memory 1))
    (import "libc" "realloc" (func (param i32 i32) (result i32)))
    (import "foo" "bar" (func (param i32 i32)))
    (func (export "baz") ...)
  )
  (core instance $main (instantiate $Main
    (with "libc" (instance $libc))
    (with "foo" (instance (export "bar" (func $foo_bar_))))
  ))
  (core alias export $main "baz" (func $main_baz_))
  (func $baz_run (canon.lift
    (func (param string))
    (memory $mem) (realloc $realloc)  ;; indexes core index spaces b/c canon.lower says so
    $main_baz_  ;; indexes the core function index space b/c canon.lower says so
  ))
  (export "baz" (func $baz_run))  ;; indexes component function index space
)

So the only tricky thing here is canon.lower where we want to create a core function (and inject it into the core function index space), but canon.lower is not in the core text format (and won't ever be; it depends on interface types). So I think what we need to do here is: just like how imports and aliases are definitions that don't have their own index spaces but, rather, inject into other index spaces, we say that canon.lift/canon.lower are new top-level definitions that inject functions into the component or core function index spaces, resp:

definition ::= ...
             | <canonical>
canonical ::=
            | (canon.lift <functype> <canonopt>* <core:funcidx> (func $id))
            | (canon.lower <canonopt>* <funcidx> (core func $id))

But then, just like we have the following syntactic abbreviations in Explainer.md:

(func $f (import "i" "f")) ≡ (import "i" "f" (func $f))
(func $g (alias $i "g1")) ≡ (alias $i "g1" (func $g))

we could also have:

(func $h (canon.lift <canonopt>* <functype>* <core:funcidx>))
    ≡ (canon.lift <canonopt>* <functype>* <core:funcidx> (func $h))
(core func $i (canon.lower <canonopt>* <funcidx>))
    ≡ (canon.lower <canonopt>* <funcidx> (core func $i))

and this is what I used in the code sample above. However, I don't know if this abbreviation is too magical. You could make the case that the rewrite rule happens in the component layer before the rewritten text (the right-hand side) is handed down to the core text parser. 🤷

WDYT?

alexcrichton · 2022-04-15T14:19:53Z

Prefixing things with core seems like a good idea to me, but I'm not sure I understand the reasoning for a top-level (canon.lift ...) item. At the binary layer it's still the function section so why not have the (func (canon.lift ...)) be the only way to define it?

lukewagner · 2022-04-15T17:14:01Z

So my thinking here is that, when you have a core X, we're literally just "#including" the Core WebAssembly spec definition of X (in a hypothetical future where Core WebAssembly has module linking as originally proposed). Thus, for a (core func ...), you'd expect the ... to only contain a valid core wasm function body and for this to be encoded in the binary format as an existing core function/code section. So the idea with having canon.lift/canon.lower be the top-level definition is that they would not be claiming to be core and thus they'd be justified in having their own non-core text/binary format. Then, in the binary format, I was thinking we'd rename what's currently called the func section to the called the canon section (but otherwise keep the contents the same). (To be super regular, then, it might be a good idea to break the single token canon.lift into two tokens canon and lift; so the text/binary correspondence was 1:1. This would be symmetric to how we have (alias export ...) not (alias.export ...).)

alexcrichton · 2022-04-15T17:37:14Z

Seems reasonable to me!

lukewagner · 2022-06-02T15:49:20Z

This should be fixed by #29.

lukewagner mentioned this issue Apr 19, 2022

Fill in CanonicalABI.md #23

Merged

lukewagner mentioned this issue May 2, 2022

Split core definitions out into separate index spaces #29

Merged

lukewagner closed this as completed Jun 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Index spaces feel inconsistent about what they might or might not contain #11

Index spaces feel inconsistent about what they might or might not contain #11

alexcrichton commented Mar 29, 2022

fitzgen commented Mar 29, 2022

lukewagner commented Mar 30, 2022

alexcrichton commented Mar 30, 2022

lukewagner commented Apr 5, 2022

lukewagner commented Apr 14, 2022

alexcrichton commented Apr 15, 2022

lukewagner commented Apr 15, 2022

alexcrichton commented Apr 15, 2022

lukewagner commented Jun 2, 2022

Index spaces feel inconsistent about what they might or might not contain #11

Index spaces feel inconsistent about what they might or might not contain #11

Comments

alexcrichton commented Mar 29, 2022

fitzgen commented Mar 29, 2022

lukewagner commented Mar 30, 2022

alexcrichton commented Mar 30, 2022

lukewagner commented Apr 5, 2022

lukewagner commented Apr 14, 2022

alexcrichton commented Apr 15, 2022

lukewagner commented Apr 15, 2022

alexcrichton commented Apr 15, 2022

lukewagner commented Jun 2, 2022