doc(book): describe details about derived identifiers

Explain that it is possible to automatically derive identifiers instead of always explicitly defining them. Also, move some encoding details from the ideas section to the wire format as those are already implemented.
dnaka91 · Jan 9, 2024 · 616dfbf
1 parent d27361e
Showing 3 changed files with 47 additions and 22 deletions.
diff --git a/book/src/ideas.md b/book/src/ideas.md
@@ -31,22 +31,3 @@ When decoding a value, it may contain new fields and enum variants that are not
 The same can happen the other way around. For example, if the data was saved in some form of storage and the schema evolved in the meantime, the decoder might encounter old data that lacks the newer content.
 
 In both cases, the schema must be able to handle missing or unknown fields. Several rules must be upheld when updating a schema, to ensure it is both forward and backward compatible.
-
-### Skip fields without knowing the exact type
-
-This section explains how a decoder is able to process payloads that contain newer or unknown fields, given these were introduced in a backward compatible way.
-
-Without the new schema it's not possible to make decisions about the data that follows after a field identifier. To work around this, reduced information can be encoded into the identifier.
-
-Only a few  details are important for the decoder to proceed, not needing full type information:
-
-- Is the value a variable integer?
-  - Skip over individual bytes until the end marker is found
-- Is the value length delimited?
-  - Parse the delimiter, which is always a _varint_, and skip over the length.
-- Is the value a nested struct or enum?
-  - Step into the nested type and skip over all its fields.
-- Is the value of fixed length?
-  - Skip over the fixed length of 1 (`bool`, `u8` and `i8`), 4 (`f32`) or 8 (`f64`) bytes.
-
-Furthermore, this information is only needed for direct elements of a struct or enum variant, as this allows to skip over the whole field. Types nested into another, like a `vec<u32>` for example, don't need to provide this information for each element again.
diff --git a/book/src/reference/schema/index.md b/book/src/reference/schema/index.md
@@ -148,7 +148,41 @@ Byte arrays are mutable in other languages as well, but they don't have a reason
 
 Identifiers are an integral part of schemas and are attached to named and unnamed fields inside a struct or enum.
 
-As the wire format doesn't contain any field names, fields have to be identified in some way. This is done by identifiers, which are [varint](../wire-format#varint-encoding) encoded integers.
+As the wire format doesn't contain any field names, fields have to be identified in some way. This is done by identifiers, which are [Varint](../wire-format#varint-encoding) encoded **32-bit unsigned integers**.
+
+Depending on the type of identifier (field or variant), they might carry some additional information. This is further explained in the [Wire Format](../wire-format).
+
+### Deriving identifiers
+
+Similar to classic enums in most languages, identifiers can be omitted, in which case the compiler derives them automatically. Explicit and derived identifiers can be freely mixed within the same struct or enum.
+
+Whenever an identifier is explicitly defined, it becomes the starting point for deriving the identifiers that follow. After all, it's just an integer counter.
+
+::: info
+Identifiers don't have to be strictly increasing. They can appear in any order and jump between ranges, like `1, 100, 5, 200, ...`.
+
+The only requirement is that they are unique within a struct or enum variant (for fields), and unique within an enum (for variants).
+:::
+
+For example, the following schema applies a mix of explicitly defined and derived identifiers on a single struct:
+
+```mabo
+struct Sample {
+  field1: u32,
+  field2: u32 @100,
+  field3: u32,
+  field4: u32 @10,
+  field5: u32,
+}
+```
+
+The final identifiers are as follows:
+
+- field1: `1`, as the minimum identifier is one.
+- field2: `100`, because it's explicitly defined.
+- field3: `101`, the next value after 100.
+- field4: `10`, explicitly defined again.
+- field5: `11`, the next value after 10.
 
 ## Naming
 

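The derivation rule added in the "Deriving identifiers" section above can be sketched in a few lines of Rust. This is only an illustration of the documented behavior, not the compiler's actual code: the function names `derive_ids` and `has_duplicates` are hypothetical, omitted identifiers are modeled as `None`, and saturating at the integer maximum is an assumption made here to keep the sketch panic-free.

```rust
use std::collections::HashSet;

/// Derive the final identifiers for a list of fields (or variants).
/// `None` stands for an identifier that was omitted in the schema.
fn derive_ids(explicit: &[Option<u32>]) -> Vec<u32> {
    // The minimum identifier is one, so the counter starts there.
    let mut next: u32 = 1;
    explicit
        .iter()
        .map(|id| {
            // An explicit identifier wins; otherwise take the counter.
            let current = id.unwrap_or(next);
            // Either way, the following derived identifier continues from here.
            // Saturating is an assumption of this sketch, not documented behavior.
            next = current.saturating_add(1);
            current
        })
        .collect()
}

/// Check the uniqueness requirement from the info box.
fn has_duplicates(ids: &[u32]) -> bool {
    let mut seen = HashSet::new();
    ids.iter().any(|id| !seen.insert(*id))
}
```

Feeding the `Sample` struct from the diff into it as `[None, Some(100), None, Some(10), None]` yields `[1, 100, 101, 10, 11]`, matching the list of final identifiers. Note that mixing explicit and derived identifiers can produce collisions (for example `[None, Some(1)]`), which is why a separate uniqueness check is still needed.
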
diff --git a/book/src/reference/wire-format.md b/book/src/reference/wire-format.md
@@ -55,7 +55,7 @@ Both tuples and arrays have a known length as defined in the schema. Therefore,
 
 ## Identifiers
 
-Identifiers are an essential part of the format. They mark the start of a field or enum variant and decribe which one it is, so the decoder knows how to parse the following data and assign it to the right element of a struct or enum.
+Identifiers are an essential part of the format. They mark the start of a field or enum variant and describe which one it is, so the decoder knows how to parse the following data and assign it to the right element of a struct or enum.
 
 These IDs are regular **32-bit unsigned integers**, and may encode additional information together with field or variant number.
 
@@ -75,8 +75,18 @@ This encoding marker is placed in the first 3 bits and the field number in shift
 
 It means the maximum possible field number is **2<sup>29</sup> - 1** (**536,870,911**) instead of the integer types maximum of **2<sup>32</sup> - 1** (**4,294,967,295**). This amount is still sufficient and very unlikely to ever be reached as it is not considered realistic to have a struct or enum variant with that many fields.
 
+The possible encodings are:
+
+- `0`/`b000` Variable integer: Skip over individual bytes until the end marker is found.
+- `1`/`b001` Length delimited: Parse the delimiter, which is always a _varint_, and skip over the length.
+- Nested struct or enum: Step into the nested type and skip over all its fields.
+- `2`/`b010` Fixed 1-byte length: Skip over the fixed length of 1 byte (`bool`, `u8` and `i8`).
+- `3`/`b011` Fixed 4-byte length: Skip over the fixed length of 4 bytes (`f32`).
+- `4`/`b100` Fixed 8-byte length: Skip over the fixed length of 8 bytes (`f64`).
+
 ### Variant identifiers
 
 The variant identifiers currently don't carry any additional information and encode the number as is.
 
-Therefore the current maximum possible variant number is **2<sup>32</sup> - 1** (**4,294,967,295**), although unlikely to ever be reached when using sequential numbers without gaps.
+Therefore the current maximum possible variant number is **2<sup>32</sup> - 1** (**4,294,967,295**), although unlikely to ever be reached when using sequential numbers without gaps.
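
To make the skipping rules for field identifiers more concrete, the following Rust sketch splits an identifier into field number and encoding marker and computes how many bytes a decoder would skip for an unknown field. It rests on several assumptions that the diff does not confirm: the marker is taken to occupy the lowest 3 bits, the varint end marker is taken to be a byte with the high bit clear (LEB128 style), and all names (`Encoding`, `split_field_id`, `read_varint`, `skip_len`) are made up for illustration. The nested struct or enum case has no marker value listed above, so it is left out.

```rust
use std::convert::TryFrom;

/// Encoding markers with the values listed above (0 through 4).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Varint,
    LengthPrefixed,
    Fixed1,
    Fixed4,
    Fixed8,
}

/// Split a field identifier into (field number, encoding).
/// Assumption: the marker sits in the lowest 3 bits.
fn split_field_id(id: u32) -> Option<(u32, Encoding)> {
    let encoding = match id & 0b111 {
        0 => Encoding::Varint,
        1 => Encoding::LengthPrefixed,
        2 => Encoding::Fixed1,
        3 => Encoding::Fixed4,
        4 => Encoding::Fixed8,
        _ => return None, // marker values not listed in the text
    };
    Some((id >> 3, encoding))
}

/// Read a varint and return (value, bytes consumed).
/// Assumption: 7 payload bits per byte, high bit set means "more bytes follow".
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // no end marker found
}

/// Number of bytes to skip for a value that starts at `buf[0]`.
fn skip_len(encoding: Encoding, buf: &[u8]) -> Option<usize> {
    let len = match encoding {
        Encoding::Varint => read_varint(buf)?.1,
        Encoding::LengthPrefixed => {
            // The delimiter is itself a varint, followed by that many bytes.
            let (len, used) = read_varint(buf)?;
            used.checked_add(usize::try_from(len).ok()?)?
        }
        Encoding::Fixed1 => 1,
        Encoding::Fixed4 => 4,
        Encoding::Fixed8 => 8,
    };
    // Reject payloads that are shorter than the value claims to be.
    if len <= buf.len() { Some(len) } else { None }
}
```

Under the bit layout assumed here, `u32::MAX >> 3` equals 536,870,911, which agrees with the maximum field number stated above.
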