diff --git a/book/src/ideas.md b/book/src/ideas.md
index a218fa1..83686ce 100644
--- a/book/src/ideas.md
+++ b/book/src/ideas.md
@@ -31,22 +31,3 @@ When decoding a value, it may contain new fields and enum variants that are not
 The same can happen the other way around. For example, if the data was saved in some form of storage and the schema evolved in the meantime, the decoder might encounter old data that lacks the newer content. In both cases, the schema must be able to handle missing or unknown fields.
 
 Several rules must be upheld when updating a schema, to ensure it is both forward and backward compatible.
-
-### Skip fields without knowing the exact type
-
-This section explains how a decoder is able to process payloads that contain newer or unknown fields, given these were introduced in a backward compatible way.
-
-Without the new schema it's not possible to make decisions about the data that follows after a field identifier. To work around this, reduced information can be encoded into the identifier.
-
-Only a few details are important for the decoder to proceed, not needing full type information:
-
-- Is the value a variable integer?
-  - Skip over individual bytes until the end marker is found
-- Is the value length delimited?
-  - Parse the delimiter, which is always a _varint_, and skip over the length.
-- Is the value a nested struct or enum?
-  - Step into the nested type and skip over all its fields.
-- Is the value of fixed length?
-  - Skip over the fixed length of 1 (`bool`, `u8` and `i8`), 4 (`f32`) or 8 (`f64`) bytes.
-
-Furthermore, this information is only needed for direct elements of a struct or enum variant, as this allows to skip over the whole field. Types nested into another, like a `vec` for example, don't need to provide this information for each element again.
diff --git a/book/src/reference/schema/index.md b/book/src/reference/schema/index.md
index 68a8ce8..a678d12 100644
--- a/book/src/reference/schema/index.md
+++ b/book/src/reference/schema/index.md
@@ -148,7 +148,41 @@ Byte arrays are mutable in other languages as well, but they don't have a reason
 
 Identifiers are an integral part of schemas and are attached to named and unnamed fields inside a struct or enum.
 
-As the wire format doesn't contain any field names, fields have to be identified in some way. This is done by identifiers, which are [varint](../wire-format#varint-encoding) encoded integers.
+As the wire format doesn't contain any field names, fields have to be identified in some way. This is done by identifiers, which are [Varint](../wire-format#varint-encoding) encoded **32-bit unsigned integers**.
+
+Depending on the type of identifier (field or variant), they might carry some additional information. This is further explained in the [Wire Format](../wire-format).
+
+### Deriving identifiers
+
+Similar to classic enums in most languages, the identifiers can be omitted. In that case the compiler derives the identifiers automatically. Explicit identifiers can be mixed and matched with derived ones.
+
+Whenever an identifier is explicitly defined, it becomes the source for deriving the next identifier, if one follows. After all, it's just an integer counter.
+
+::: info
+Identifiers don't have to be strictly increasing. They can appear in any order and can jump between different ranges, like `1, 100, 5, 200, ...`.
+
+The only requirement is that they are unique within a struct or enum variant (for fields), and unique within an enum (for variants).
+:::
+
+For example, the following schema applies a mix of explicitly defined and derived identifiers on a single struct:
+
+```mabo
+struct Sample {
+    field1: u32,
+    field2: u32 @100,
+    field3: u32,
+    field4: u32 @10,
+    field5: u32,
+}
+```
+
+The final identifiers are as follows:
+
+- field1: `1` as the minimum identifier is one.
+- field2: `100` because it's explicitly defined.
+- field3: `101` the next value after 100.
+- field4: `10` explicitly defined again.
+- field5: `11` the next value after 10.
 
 ## Naming
 
diff --git a/book/src/reference/wire-format.md b/book/src/reference/wire-format.md
index 782ac35..ba310e2 100644
--- a/book/src/reference/wire-format.md
+++ b/book/src/reference/wire-format.md
@@ -55,7 +55,7 @@ Both tuples and arrays have a known length as defined in the schema. Therefore,
 
 ## Identifiers
 
-Identifiers are an essential part of the format. They mark the start of a field or enum variant and decribe which one it is, so the decoder knows how to parse the following data and assign it to the right element of a struct or enum.
+Identifiers are an essential part of the format. They mark the start of a field or enum variant and describe which one it is, so the decoder knows how to parse the following data and assign it to the right element of a struct or enum.
 
 These IDs are regular **32-bit unsigned integers**, and may encode additional information together with field or variant number.
 
@@ -75,8 +75,18 @@ This encoding marker is placed in the first 3 bits and the field number in shift
 
 It means the maximum possible field number is **2^29 - 1** (**536,870,911**) instead of the integer type's maximum of **2^32 - 1** (**4,294,967,295**). This amount is still sufficient and very unlikely to ever be reached as it is not considered realistic to have a struct or enum variant with that many fields.
 
+The possible encodings are:
+
+- `0`/`b000` Variable integer: Skip over individual bytes until the end marker is found.
+- `1`/`b001` Length delimited: Parse the delimiter, which is always a _varint_, and skip over the length.
+- Is the value a nested struct or enum?
+  - Step into the nested type and skip over all its fields.
+- `2`/`b010` Fixed 1-byte length: Skip over the fixed length of 1 byte (`bool`, `u8` and `i8`).
+- `3`/`b011` Fixed 4-byte length: Skip over the fixed length of 4 bytes (`f32`).
+- `4`/`b100` Fixed 8-byte length: Skip over the fixed length of 8 bytes (`f64`).
+
 ### Variant identifiers
 
 The variant identifiers currently don't carry any additional information and encode the number as is.
 
-Therefore the current maximum possible variant number is **2^32 - 1** (**4,294,967,295**), although unlikely to ever be reached when using sequential numbers without gaps.
\ No newline at end of file
+Therefore the current maximum possible variant number is **2^32 - 1** (**4,294,967,295**), although unlikely to ever be reached when using sequential numbers without gaps.
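
Reviewer note: the field identifier encodings added to `wire-format.md` above can be sketched in Rust as follows, to illustrate how a decoder might skip a field it does not know. This is a minimal sketch under stated assumptions, not the project's implementation. The names `Encoding`, `split_field_id`, `read_varint` and `skip_len` are hypothetical. It assumes the 3-bit marker sits in the least significant bits of the identifier, and that varints use 7 payload bits per byte with the high bit set on every byte except the last. The nested struct or enum case has no marker value in the list, so it is left out here.

```rust
/// The five encodings that have a marker value in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding {
    Varint,          // 0 / b000
    LengthDelimited, // 1 / b001
    Fixed1,          // 2 / b010
    Fixed4,          // 3 / b011
    Fixed8,          // 4 / b100
}

/// Split a raw field identifier into (field number, encoding).
/// ASSUMPTION: the marker occupies the 3 least significant bits.
fn split_field_id(raw: u32) -> Option<(u32, Encoding)> {
    let encoding = match raw & 0b111 {
        0 => Encoding::Varint,
        1 => Encoding::LengthDelimited,
        2 => Encoding::Fixed1,
        3 => Encoding::Fixed4,
        4 => Encoding::Fixed8,
        _ => return None, // marker values not covered by the list
    };
    Some((raw >> 3, encoding))
}

/// Read a varint and return (value, bytes consumed).
/// ASSUMPTION: 7 payload bits per byte, high bit set means "more bytes follow".
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of input, or no end marker within 10 bytes
}

/// Number of bytes to skip for a value of the given encoding at the start of `buf`.
/// Returns `None` if the buffer is too short to contain the whole value.
fn skip_len(encoding: Encoding, buf: &[u8]) -> Option<usize> {
    let len = match encoding {
        Encoding::Varint => read_varint(buf)?.1,
        Encoding::LengthDelimited => {
            let (len, used) = read_varint(buf)?;
            if len > (buf.len() - used) as u64 {
                return None;
            }
            used + len as usize
        }
        Encoding::Fixed1 => 1,
        Encoding::Fixed4 => 4,
        Encoding::Fixed8 => 8,
    };
    if len <= buf.len() { Some(len) } else { None }
}
```

Under these assumptions, a decoder that meets an unknown field would call `split_field_id` on the identifier, then advance its cursor by `skip_len(encoding, rest)` bytes. Shifting the identifier right by 3 bits is also what limits the field number to 2^29 - 1.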