diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index d7f09dda..3ceb0f98 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -1,28 +1,518 @@ # The `wit` format -This is intended to document the `wit` format as it exists today. The goal is -to provide an overview to understand what features `wit` files give you and how -they're structured. This isn't intended to be a formal grammar, although it's -expected that one day we'll have a formal grammar for `wit` files. +The Wasm Interface Type (WIT) text format is a developer-friendly format to +describe the imports and exports of a component. The WIT format can be thought +of as an IDL of sorts to describe APIs that are grouped together as +[`interface`s][interfaces] inside of a [`world`][worlds]. WIT text files are +shared between producers of components and consumers of components as a format +for bindings generation in host and guest languages. The WIT text format +additionally provides a developer-friendly way to inspect an existing +component and learn about its imports and exports. + +The WIT text format uses the file extension `wit`, for example `foo.wit` is +a WIT document. The contents of a `*.wit` file must be valid utf-8 bytes. WIT +documents can contain two items at the top-level: [`interface`][interfaces] and +[`world`][worlds]. + +## WIT Interfaces +[interfaces]: #wit-interfaces + +An `interface` in WIT is a collection of [functions] and [types] which +corresponds to an instance in the component model. Interfaces can either be +imported or exported from [worlds] and represent imported and exported +instances. For example this `interface`: -If you're curious to give things a spin try out the [online -demo](https://bytecodealliance.github.io/wit-bindgen/) of `wit-bindgen` where -you can input `wit` on the left and see output of generated bindings for -languages on the right. If you're looking to start you can try out the -"markdown" output mode which generates documentation for the input document on -the left. +```wit +interface host { + log: func(msg: string) +} +``` + +represents an interface called `host` which provides one function, `log`, which +takes a single `string` argument. If this were imported into a component then it +would correspond to: + +```wasm +(component + (import "host" (insance $host + (export "log" (func (param "msg" string))) + )) + ;; ... +) +``` + +An `interface` can contain [`use`][use] statements, [type][types] definitions, +and [function][functions] definitions. For example: + +```wit +interface wasi-fs { + use { errno } from "wasi-types" + + record stat { + ino: u64, + size: u64, + // ... + } + + stat-file: func(path: string) -> result +} +``` + +More information about [`use`][use] and [types] are described below, but this +is an example of a collection of items within an `interface`. All items defined +in an `interface`, including [`use`][use] items, are considered as exports from +the interface. This means that types can further be used from the interface by +other interfaces. An interface has a single namespace which means that none of +the defined names can collide. + +A WIT document can contain any number of interfaces listed at the top-level and +in any order. The WIT validator will ensure that all references between +documents are well-formed and acyclic. + +## WIT Worlds +[worlds]: #wit-worlds + +WIT documents can contain a `world` annotation at the top-level in addition to +[`interface`][interfaces]. A world is a complete description of both imports and +exports of a component. A world can be thought of as an equivalent of a +`component` type in the component model. For example this world: + +```wit +world my-world { + import host: interface { + log: func(param: string) + } + + export run: func() +} +``` + +can be thought of as this component type: + +```wasm +(type $my-world (component + (import "host" (instance + (export "log" (func (param "param" string))) + )) + (export "run" (func)) +)) +``` + +Worlds describe a concrete component and are the basis of bindings generation. A +guest language will use a `world` to determine what functions are imported, what +they're named, and what functions are exported, in addition to their names. + +Worlds can contain any number of imports and exports, and can be either a +function or an interface. + +```wit +world wasi { + import wasi-fs: "wasi-fs" + import wasi-random: "wasi-random" + import wasi-clock: "wasi-clock" + // ... + + export command: func(args: list) +} +``` + +Additionally interfaces can be defined "inline" as a form of sugar for defining +it at the top-level + +```wit +interface out-of-line { + the-function: func() +} + +world your-world { + import out-of-line: out-of-line + // ... is equivalent to ... + import out-of-line2: interface { + the-function: func() + } +} +``` + +The name of the `import` or `export` is the name of the corresponding item in +the final component. This can be different from the [`interface`][interfaces] +name but must be a [valid identifier][identifiers]. + +There can be multiple `world` descriptions in a single WIT document. When +generating bindings from a WIT document one world must be marked as `default` or +an explicitly named world must be chosen: + +```wit +default world my-world { +} +``` + +If no `default` world is specified in the WIT document and no named world is +explicitly chosen then bindings cannot be generated. + +## WIT File Organization +[use]: #wit-file-organization + +WIT files can be organized into separate files for maintainability and +reusability. This enables a sort of module system for WIT syntax where files may +import from one another. + +> **Note**: The precise semantics of imports and how everything maps out is +> still being design. Basic filesystem-based organization works but it's +> intended to extend to URL-based organization in the near future. For example +> the strings below are intended to integrate into a registry-based workflow as +> well in addition to looking up files on the filesystem. + +Within a single WIT file the `use` statement can be used to import between +interfaces: + +```wit +interface types { + enum errno { /* ... */ } + + type size = u32 +} + +interface my-host-functions { + use { errno, size } from types +} +``` + +Here the `from` directive of the `use` is not quoted which means that it's +resolved relative to the WIT file itself. The `interface types` may come either +after or before the `use` directive's `interface`. + +The `use` directive is not allowed to create cycles and must always form an +acyclic graph of dependencies between interfaces. + +Names imported via `use` can be renamed as they're imported as well: + +```wit +interface my-host-functions { + use { errno as my-errno } from types +} +``` + +When importing or exporting an [interface][interfaces] in a [world][worlds] +unquoted names refer to [interfaces] defined elsewhere in the file: + +```wit +world my-world { + import host: host +} + +interface host { + // ... +} +``` + +The target of a `use` or the type of an import/export in a `world` may also be a +quoted string: + +```wit +interface foo { + use { /* ... */ } from "./other-file" +} + +world my-world { + import some-import: "./other-file" + import another-import: "./other-directory/other-file" +} +``` + +This quoted string form indicates that the import is located in a different WIT +file. At this time all quoted strings must start with `./` or `../` and refer to +a file relative to the location of the current WIT file. This will eventually be +expanded to allow something along the lines of `"wasi:fs"` or similar but the +precise semantics here have not been defined. + +The above directives, for example, will import from `./other-file.wit` (or +`./other-file.wit.md` as described [below][markdown]) and +`./other-directory/other-file.wit` (or `./other-directory/other-file.wit.md`). +The `.wit` extension is automatically appended to the lookup path and `.wit.md` +is tested if the `.wit` file doesn't exist. + +Additionally the above directives require that `other-file.wit` contains an +interface marked `default`. Similar to [`default` worlds][worlds] a WIT file may +have multiple [`interface`s][interfaces] inside it and the import must happen +from one of them, so `default` is used to disambiguate: + +```wit +// other-file.wit +default interface foo { + // ... +} + + +// my-file.wit +default interface foo { + use { /* ... */ } from "./other-file" +} +``` + +A `use` directive can also explicitly list the requested interface to select a +non-`default` one: + +```wit +// other-file.wit +interface foo { + // ... +} + + +// my-file.wit +default interface foo { + use { /* ... */ } from foo in "./other-file" +} +``` + +Like before within a WIT file `use` statements must be acyclic between files as +well. + +Splitting a WIT document into multiple files does not have an impact on its +final structure and it's purely a convenience to developers. Resolution of a WIT +document happens as-if everything were contained in one document (modulo +"hygienic" renaming where `use`-with-identifier still only works within one +file). This means that a WIT document can always be represented as a single +large WIT document with everything contained and separate-file organization is +not necessary. + +### Transitive imports and worlds + +A `use` statement is not implemented by copying type information around but +instead retains that it's a reference to a type defined elsewhere. This +representation is plumbed all the way through to the final component, meaning +that `use`d types have an impact on the structure of the final generated +component. + +For example this document: + +```wit +interface shared { + record metadata { + // ... + } +} + +world my-world { + import host: interface { + use { metadata } from shared + + get: func() -> metadata + } +} +``` + +would generate this component : + +```wasm +(component + (import "shared" (instance $shared + (type $metadata (record (; ... ;))) + (export "metadata" (type (eq $metadata))) + )) + (alias export $shared "metadata" (type $metadata)) + (import "host" (instance $host + (export "get" (func (result $metadata))) + )) +) +``` + +Here it can be seen that despite the `world` only listing `host` as an import +the component additionally import a `shared` instance. This is due to the fact +that the `use { ... } from shared` implicitly requires that `shared` is imported +to the component as well. + +Note that the name `"shared"` here is derived from the name of the `interface` +which can also lead to conflicts: + +```wit +// shared1.wit +default interface shared { /* ... */ } + +// shared2.wit +default interface shared { /* ... */ } + +// world.wit +world my-world { + import foo: interface { + use { a-type } from "shared1" + } + import bar: interface { + use { other-type } from "shared2" + } +} +``` + +This is an invalid WIT document due because `my-world` needs to import two +unique interfaces called `shared`. To disambiguate a manual import is required: + +``` +world my-world { + import shared1: "shared1" + import shared2: "shared1" + + import foo: interface { + use { a-type } from "shared1" + } + import bar: interface { + use { other-type } from "shared2" + } +} +``` + +For `export`ed interfaces any transitively `use`d interface is assumed to be an +import unless it's explicitly listed as an export. + +> **Note**: It's planned in the future to have "power user syntax" to configure +> this on a more fine-grained basis for exports, for example being able to +> configure that a `use`'d interface is a particular import or a particular +> export. + +## WIT Functions +[functions]: #wit-functions + +Functions are defined in an [`interface`][interfaces] or are listed as an +`import` or `export` from a [`world`][worlds]. Parameters to a function must all +be named and have unique names: + +```wit +interface foo { + a1: func() + a2: func(x: u32) + a3: func(y: u64, z: float32) +} +``` + +Functions can return at most one unnamed type: + +```wit +interface foo { + a1: func() -> u32 + a2: func() -> string +} +``` + +And functions can also return multiple types by naming them: + +```wit +interface foo { + a: func() -> (a: u32, b: float32) +} +``` + +Note that returning multiple values from a function is not equivalent to +returning a tuple of values from a function. These options are represented +distinctly in the component binary format. + +## WIT Types +[types]: #wit-types + +Types in WIT files can only be defined in [`interface`s][interfaces] at this +time. The types supported in WIT is the same set of types supported in the +component model itself: + +```wit +interface foo { + // "package of named fields" + record r { + a: u32, + b: string, + } + + // values of this type will be one of the specified cases + variant human { + baby, + child(u32), // optional type payload + adult, + } + + // similar to `variant`, but no type payloads + enum errno { + too-big, + too-small, + too-fast, + too-slow, + } + + // similar to `variant`, but doesn't require naming cases and all variants + // have a type payload -- note that this is not a C union, it still has a + // discriminant + union input { + u64, + string, + } + + // a bitflags type + flags permissions { + read, + write, + exec, + } + + // type aliases are allowed to primitive types and additionally here are some + // examples of other types + type t1 = u32 + type t2 = tuple + type t3 = string + type t4 = option + type t5 = result<_, errno> // no "ok" type + type t6 = result // no "err" type + type t7 = result // both types specified + type t8 = result // no "ok" or "err" type + type t9 = list + type t10 = t9 +} +``` + +The `record`, `variant`, `enum`, `union`, and `flags` types must all have names +associated with them. The `list`, `option`, `result`, `tuple`, and primitive +types do not need a name and can be mentioned in any context. This restriction +is in place to assist with code generation in all languages to leverage +language-builtin types where possible while accommodating types that need to be +defined within each language as well. + +## WIT Identifiers +[identifiers]: #wit-identifiers + +Identifiers in WIT documents are required to be valid component identifiers, +meaning that they're "kebab cased". This currently is restricted to ascii +characters and numbers that are `-` separated. + +For more information on this see the [binary format](./Binary.md). + +## WIT in Markdown +[markdown]: #wit-in-markdown + +The WIT text format can also additionally be parsed from markdown files with the +extension `wit.md`, for example `foo.wit.md`: + + # This would be -## Lexical structure + Some markdown text + + ```wit + // interspersed with actual `*.wit` + interface my-interface { + ``` + + ```wit + // which can be broken up between multiple blocks + } + ``` + +Triple-fence code blocks with the `wit` marker will be extracted from a markdown +file and concatenated into a single string which is then parsed as a normal +`*.wit` file. + +# Lexical structure The `wit` format is a curly-braced-based format where whitespace is optional (but -recommended). It is intended to be easily human readable and supports features -like comments, multi-line comments, and custom identifiers. A `wit` document -is parsed as a unicode string, and when stored in a file is expected to be -encoded as UTF-8. +recommended). A `wit` document is parsed as a unicode string, and when stored in +a file is expected to be encoded as utf-8. -Additionally, wit files must not contain any bidirectional override scalar values, -control codes other than newline, carriage return, and horizontal tab, or -codepoints that Unicode officially deprecates or strongly discourages. +Additionally, wit files must not contain any bidirectional override scalar +values, control codes other than newline, carriage return, and horizontal tab, +or codepoints that Unicode officially deprecates or strongly discourages. The current structure of tokens are: @@ -112,6 +602,7 @@ keyword ::= 'use' | 'tuple' | 'future' | 'stream' + | 'default' ``` ## Top-level items @@ -139,7 +630,7 @@ interface example { Specifically interfaces have the structure: ```wit -interface-item ::= 'interface' id strlit? '{' interface-items* '}' +interface-item ::= 'default'? 'interface' id '{' interface-items* '}' interface-items ::= resource-item | variant-items @@ -151,12 +642,13 @@ interface-items ::= resource-item | use-item | func-item -func-item ::= id ':' 'func' param-list '->' result-list +func-item ::= id ':' 'func' param-list result-list param-list ::= '(' named-type-list ')' -result-list ::= ty - | '(' named-type-list ')' +result-list ::= nil + | '->' ty + | '->' '(' named-type-list ')' named-type-list ::= nil | named-type ( ',' named-type )* @@ -172,13 +664,13 @@ wit documents. The structure of a use statement is: ```wit use * from other-file use { a, list, of, names } from another-file -use { name as other-name } from yet-another-file +use { name as other-name } from interface in "yet-another-file" ``` Specifically the structure of this is: ```wit -use-item ::= 'use' use-names 'from' id +use-item ::= 'use' use-names 'from' use-from use-names ::= '*' | '{' use-names-list '}' @@ -188,6 +680,10 @@ use-names-list ::= use-names-item use-names-item ::= id | id 'as' id + +use-from ::= id + | strlit + | id 'in' strlit ``` Note: Here `use-names-list?` means at least one `use-name-list` term. @@ -247,10 +743,9 @@ record-field ::= id ':' ty ### Item: `flags` (bag-of-bools) -A `flags` statement defines a new `record`-like structure where all the fields -are booleans. The `flags` type is distinct from `record` in that it typically is -represented as a bit flags representation in the canonical ABI. For the purposes -of type-checking, however, it's simply syntactic sugar for a record-of-booleans. +A `flags` represents a bitset structure with a name for each bit. The `flags` +type is represented as a bit flags representation in +the canonical ABI. ```wit flags properties { @@ -258,14 +753,6 @@ flags properties { marvel-superhero, supervillan, } - -// type-wise equivalent to: -// -// record properties { -// lego: bool, -// marvel-superhero: bool, -// supervillan: bool, -// } ``` Specifically the structure of this is: @@ -324,16 +811,6 @@ enum color { yellow, other, } - -// type-wise equivalent to: -// -// variant color { -// red, -// green, -// blue, -// yellow, -// other, -// } ``` Specifically the structure of this is: @@ -349,22 +826,14 @@ enum-cases ::= id A `union` statement defines a new type which is semantically equivalent to a `variant` where all of the cases have a payload type and the case names are -numerical. This is special-cased, however, to possibly have a different -representation in the language ABIs or have different bindings generated in for -languages. +numerical. This is special-cased, however, to have a different representation +in the language ABIs or have different bindings generated in for languages. ```wit union configuration { string, list, } - -// type-wise equivalent to: -// -// variant configuration { -// 0(string), -// 1(list), -// } ``` Specifically the structure of this is: @@ -376,42 +845,6 @@ union-cases ::= ty | ty ',' union-cases? ``` -## Item: `resource` - -Resources represent a value that has a hidden representation not known to the -outside world. This means that the resource is operated on through a "handle" (a -pointer of sorts). Resources also have ownership associated with them and -languages will have to manage the lifetime of resources manually (they're -similar to file descriptors). - -Resources can also optionally have functions defined within them which adds an -implicit "self" argument as the first argument to each function of the same type -of the including resource, unless the function is flagged as `static`. - -```wit -resource file-descriptor - -resource request { - static new: func() -> request - - body: func() -> future> - headers: func() -> list -} -``` - -Specifically resources have the structure: - -```wit -resource-item ::= 'resource' id resource-contents - -resource-contents ::= nil - | '{' resource-defs '}' - -resource-defs ::= resource-def resource-defs? - -resource-def ::= 'static'? func-item -``` - ## Types As mentioned previously the intention of `wit` is to allow defining types @@ -438,8 +871,6 @@ ty ::= 'u8' | 'u16' | 'u32' | 'u64' | list | option | result - | future - | stream | id tuple ::= 'tuple' '<' tuple-list '>' @@ -454,14 +885,6 @@ result ::= 'result' '<' ty ',' ty '>' | 'result' '<' '_' ',' ty '>' | 'result' '<' ty '>' | 'result' - -future ::= 'future' '<' ty '>' - | 'future' - -stream ::= 'stream' '<' ty ',' ty '>' - | 'stream' '<' '_' ',' ty '>' - | 'stream' '<' ty '>' - | 'stream' ``` The `tuple` type is semantically equivalent to a `record` with numerical fields, @@ -560,9 +983,122 @@ record bar2 { } ``` -The intention of `wit` is that it maps down to interface types, so the goal of -name resolution is to effectively create the type section of a wasm module using -interface types. The restrictions about self-referential types and such come -from how types can be defined in the interface types section. Additionally -definitions of named types such as `record foo { ... }` are intended to map -roughly to declarations in the type section of new types. +# Binary Format + +In addition to a textual format WIT files can also be encoded to a binary +format. The binary format is itself a WebAssembly component and is chosen to be +a more stable artifact over time than the text format in case text format +updates are required. Additionally being represented by the component binary +format means that it's guaranteed that all features of WIT are representable in +the component model. + +The binary format for a WIT document can also be thought of in a loose sense as +a fully-resolved and indexed representation of a WIT document. Constructs such +as `use` are preserved but all name resolution has been boiled away to index +references. Additionally the transitive import set is all finalized as well. + +An example of the binary format is that this document: + +```wit +interface host { + log: func(arg: string) +} + +world the-world { + import host: host + export run: func() +} +``` + +would correspond to: + +```wasm +(component + (type $wit (instance + (type $host (instance + (export "log" (func (param "arg" string))) + )) + (export "host" (type $host)) + + (type $the-world (component + (import "host" (instance (type $host))) + (export "run" (func)) + )) + (export "the-world" (type $the-world)) + )) + (export "wit" (type $wit)) +) +``` + +Here it can be seen that the entire document itself is represented as an +`instance` type with named exports. Within this `instance` type each export is +either itself an `instance` or a `component` with `instance`s corresponding to +an `interface` in WIT and `component`s corresponding to `world`s. The outer +component has a single export named `wit` which points to the instance type that +describes the document. + +Types defined within a WIT interface are additionally defined within the wasm +`instance`. Additionally `use` of types between interfaces is encoded with +aliases between them. + +```wit +interface shared { + record metadata { + // ... + } +} + +world the-world { + import host: interface { + use { metadata } from shared + + get: func() -> metadata + } + + export db: interface { + use { metadata } from shared + + get: func(data: metadata) -> string + } +} +``` + +would correspond to: + +```wasm +(component + (type $wit (instance + (type $shared (instance + (type $metadata (record (; ... ;))) + (export "metadata" (type (eq $metadata))) + )) + (export $shared "shared" (type $shared)) + (alias export $shared "metadata" (type $outer-metadata)) + + (type $host (instance + (export $metadata "metadata" (type $outer-metadata)) + (export "get" (func (result $metadata))) + )) + (export $host "host" (type $host)) + + (type $db (instance + (export $metadata "metadata" (type $outer-metadata)) + (export "get" (func (param "data" $metadata) (result string))) + )) + (export $db "db" (type $db)) + + (type $the-world (component + (import "shared" (instance (type $shared))) + (import "host" (instance (type $host))) + (export "db" (instance (type $db))) + )) + (export "the-world" (type $the-world)) + )) + (export "wit" (type $wit)) +) +``` + +A `world` in a WIT document describes a concrete component, and is represented +in this binary representation as a `component` type (e.g. `$the-world` above). +Tooling which creates a WebAssembly component from a `world` is expected to +create a component that is a subtype of this type.