feat: adjust encoding to include a field encoding marker
The decoder didn't properly handle unknown field IDs. By introducing a
new marker that is combined with the field ID, the decoder can skip
over the content of an unknown field and continue decoding the
remaining elements properly, without knowing the exact details of each
field.
dnaka91 committed Dec 23, 2023
1 parent 160fbcd commit 4dd6ba3
Showing 30 changed files with 1,013 additions and 399 deletions.
27 changes: 27 additions & 0 deletions book/src/ideas.md
@@ -21,3 +21,30 @@ Something that is missing from crates.io is to allow for private registries that
The schema allows for certain changes that are backwards compatible and do not create a break in the wire format. That means older generated code is still able to read the data despite being created by a newer version of the schema. But the rules are difficult to always keep in mind and mistakes can easily be made.

To avoid accidental breaking changes, it would be nice to have an auto-detection mechanism that compares the current version to a previous one whenever changes are made. The _old_ version could be the current Git HEAD compared to the local changes.

## Schema evolution

When decoding a value, it may contain new fields and enum variants that are not known to the decoder yet. This can happen if the schema changed but was only updated by one of the two parties. For example, a server might have introduced new fields in a response, but not all clients have updated to the new schema yet.

The same can happen the other way around. For example, if the data was saved in some form of storage and the schema evolved in the meantime, the decoder might encounter old data that lacks the newer content.

In both cases, the schema must be able to handle missing or unknown fields. Several rules must be upheld when updating a schema, to ensure it is both forward and backward compatible.
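
As a hypothetical illustration (the type and field names are invented, and the Rust types below merely stand in for what generated code would look like), a response type could evolve like this:

```rust
// Version 1 of a response type, as an older client still knows it.
pub struct UserResponse {
    pub id: u64,      // field ID 1
    pub name: String, // field ID 2
}

// Version 2, after the server introduced an additional field. An old
// client decoding new data must be able to skip field ID 3 without
// understanding it, and a new client decoding old data must tolerate
// the field being absent, which is why it is optional.
pub struct UserResponseV2 {
    pub id: u64,               // field ID 1
    pub name: String,          // field ID 2
    pub email: Option<String>, // field ID 3, newly added
}
```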

### Skip fields without knowing the exact type

This section explains how a decoder is able to process payloads that contain newer or unknown fields, provided these were introduced in a backward-compatible way.

Without the new schema, it's not possible to make decisions about the data that follows a field identifier. To work around this, reduced type information can be encoded into the identifier itself.
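
As a rough sketch of the idea, the field ID and a small encoding marker could be packed into a single identifier that is written as one varint. The bit layout, names (`FieldEncoding`, `FieldId`, `pack_id`, `unpack_id`), and discriminant values below are assumptions for illustration, not the actual stef wire format:

```rust
/// Reduced type information carried alongside the field ID. The variants
/// mirror the cases listed below; the discriminant values are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FieldEncoding {
    Varint = 0,
    LengthDelimited = 1,
    Nested = 2,
    Fixed1 = 3,
    Fixed4 = 4,
    Fixed8 = 5,
}

/// A decoded field identifier: the plain field ID plus its encoding marker.
struct FieldId {
    value: u64,
    encoding: FieldEncoding,
}

/// Pack the marker into the lowest three bits of the identifier,
/// which is then written out as a regular varint.
fn pack_id(value: u64, encoding: FieldEncoding) -> u64 {
    (value << 3) | encoding as u64
}

/// Unpack a raw identifier back into field ID and encoding marker.
/// Returns `None` for the (assumed) reserved marker values 6 and 7.
fn unpack_id(raw: u64) -> Option<FieldId> {
    let encoding = match raw & 0b111 {
        0 => FieldEncoding::Varint,
        1 => FieldEncoding::LengthDelimited,
        2 => FieldEncoding::Nested,
        3 => FieldEncoding::Fixed1,
        4 => FieldEncoding::Fixed4,
        5 => FieldEncoding::Fixed8,
        _ => return None,
    };
    Some(FieldId { value: raw >> 3, encoding })
}
```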

Only a few details, rather than the full type information, are needed for the decoder to proceed (a sketch of the resulting skip routine follows this list):

- Is the value a variable integer?
  - Skip over individual bytes until the final byte, which has no continuation bit set, is found.
- Is the value length delimited?
  - Parse the length prefix, which is always a _varint_, and skip over that many bytes.
- Is the value a nested struct or enum?
  - Step into the nested type and skip over all its fields.
- Is the value of fixed length?
  - Skip over the fixed length of 1 (`bool`, `u8` and `i8`), 4 (`f32`) or 8 (`f64`) bytes.
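
Putting those rules together, a skip routine could look roughly like the following sketch. It builds on the hypothetical `FieldEncoding` and `FieldId` from the previous sketch, uses a plain byte slice as the buffer, and assumes a LEB128-style varint as well as a reserved end-marker ID of 0; none of these names are the real stef API:

```rust
/// Assumed reserved field ID that terminates a struct or enum variant.
const END_MARKER: u64 = 0;

/// Read one LEB128-style varint from the front of the buffer.
fn decode_varint(buf: &mut &[u8]) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // more than 10 continuation bytes: malformed varint
}

/// Skip a single field value, using only its encoding marker.
fn decode_skip(buf: &mut &[u8], encoding: FieldEncoding) -> Option<()> {
    match encoding {
        // Varints are self-delimiting: read one and discard it.
        FieldEncoding::Varint => {
            decode_varint(buf)?;
        }
        // Length delimited: parse the varint length, then jump over the payload.
        FieldEncoding::LengthDelimited => {
            let len = decode_varint(buf)? as usize;
            *buf = buf.get(len..)?;
        }
        // Nested struct or enum: recursively skip fields until the end marker.
        FieldEncoding::Nested => loop {
            let id = unpack_id(decode_varint(buf)?)?;
            if id.value == END_MARKER {
                break;
            }
            decode_skip(buf, id.encoding)?;
        },
        // Fixed-size values: skip the statically known number of bytes.
        FieldEncoding::Fixed1 => *buf = buf.get(1..)?,
        FieldEncoding::Fixed4 => *buf = buf.get(4..)?,
        FieldEncoding::Fixed8 => *buf = buf.get(8..)?,
    }
    Some(())
}
```

This mirrors what the generated code in this commit does: the `_ => ::stef::buf::decode_skip(r, id.encoding)?` fallback arm hands unknown field IDs to exactly this kind of routine instead of silently continuing.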

Furthermore, this information is only needed for direct elements of a struct or enum variant, as that is enough to skip over the whole field. Types nested inside another, like a `vec<u32>` for example, don't need to repeat this information for each element.
44 changes: 26 additions & 18 deletions crates/stef-build/src/decode.rs
@@ -29,10 +29,11 @@ pub(super) fn compile_struct(
             #field_vars
 
             loop {
-                match ::stef::buf::decode_id(r)? {
+                let id = ::stef::buf::decode_id(r)?;
+                match id.value {
                     ::stef::buf::END_MARKER => break,
                     #field_matches
-                    _ => continue,
+                    _ => ::stef::buf::decode_skip(r, id.encoding)?,
                 }
             }
 
@@ -70,7 +71,7 @@ pub(super) fn compile_enum(
         impl #generics ::stef::Decode for #name #generics #generics_where {
             #[allow(clippy::too_many_lines)]
             fn decode(r: &mut impl ::stef::Buf) -> ::stef::buf::Result<Self> {
-                match ::stef::buf::decode_id(r)? {
+                match ::stef::buf::decode_variant_id(r)?.value {
                     #(#variants,)*
                     id => Err(::stef::buf::Error::UnknownVariant(id)),
                 }
@@ -103,10 +104,11 @@ fn compile_variant(
             #field_vars
 
             loop {
-                match ::stef::buf::decode_id(r)? {
+                let id = ::stef::buf::decode_id(r)?;
+                match id.value {
                     ::stef::buf::END_MARKER => break,
                     #field_matches
-                    _ => continue,
+                    _ => ::stef::buf::decode_skip(r, id.encoding)?,
                 }
             }
 
@@ -162,6 +164,7 @@ fn compile_field_matches(opts: &Opts, fields: &Fields<'_>) -> TokenStream {
                 } else {
                     ty
                 },
+                true,
             );
 
             quote! { #id => #name = Some(#ty?) }
@@ -184,6 +187,7 @@ fn compile_field_matches(opts: &Opts, fields: &Fields<'_>) -> TokenStream {
                 } else {
                     ty
                 },
+                true,
             );
 
             quote! { #id => #name = Some(#ty?) }
@@ -253,7 +257,7 @@ fn compile_generics(Generics(types): &Generics<'_>) -> (TokenStream, TokenStream
 }
 
 #[allow(clippy::needless_pass_by_value, clippy::too_many_lines)]
-fn compile_data_type(opts: &Opts, ty: &Type<'_>) -> TokenStream {
+fn compile_data_type(opts: &Opts, ty: &Type<'_>, root: bool) -> TokenStream {
     match &ty.value {
         DataType::Bool => quote! { ::stef::buf::decode_bool(r) },
         DataType::U8 => quote! { ::stef::buf::decode_u8(r) },
@@ -274,20 +278,20 @@ fn compile_data_type(opts: &Opts, ty: &Type<'_>) -> TokenStream {
             BytesType::Bytes => quote! { ::stef::buf::decode_bytes_bytes(r) },
         },
         DataType::Vec(ty) => {
-            let ty = compile_data_type(opts, ty);
+            let ty = compile_data_type(opts, ty, false);
             quote! { ::stef::buf::decode_vec(r, |r| { #ty }) }
         }
         DataType::HashMap(kv) => {
-            let ty_k = compile_data_type(opts, &kv.0);
-            let ty_v = compile_data_type(opts, &kv.1);
+            let ty_k = compile_data_type(opts, &kv.0, false);
+            let ty_v = compile_data_type(opts, &kv.1, false);
             quote! { ::stef::buf::decode_hash_map(r, |r| { #ty_k }, |r| { #ty_v }) }
         }
         DataType::HashSet(ty) => {
-            let ty = compile_data_type(opts, ty);
+            let ty = compile_data_type(opts, ty, false);
             quote! { ::stef::buf::decode_hash_set(r, |r| { #ty }) }
         }
         DataType::Option(ty) => {
-            let ty = compile_data_type(opts, ty);
+            let ty = compile_data_type(opts, ty, false);
             quote! { ::stef::buf::decode_option(r, |r| { #ty }) }
         }
         DataType::NonZero(ty) => match &ty.value {
@@ -313,16 +317,16 @@ fn compile_data_type(opts: &Opts, ty: &Type<'_>) -> TokenStream {
                 }
             },
             DataType::Vec(ty) => {
-                let ty = compile_data_type(opts, ty);
+                let ty = compile_data_type(opts, ty, false);
                 quote! { ::stef::buf::decode_non_zero_vec(r, |r| { #ty }) }
             }
             DataType::HashMap(kv) => {
-                let ty_k = compile_data_type(opts, &kv.0);
-                let ty_v = compile_data_type(opts, &kv.1);
+                let ty_k = compile_data_type(opts, &kv.0, false);
+                let ty_v = compile_data_type(opts, &kv.1, false);
                 quote! { ::stef::buf::decode_non_zero_hash_map(r, |r| { #ty_k }, |r| { #ty_v }) }
             }
             DataType::HashSet(ty) => {
-                let ty = compile_data_type(opts, ty);
+                let ty = compile_data_type(opts, ty, false);
                 quote! { ::stef::buf::decode_non_zero_hash_set(r, |r| { #ty }) }
             }
             ty => todo!("compiler should catch invalid {ty:?} type"),
@@ -331,13 +335,17 @@ fn compile_data_type(opts: &Opts, ty: &Type<'_>) -> TokenStream {
         DataType::BoxBytes => quote! { Box::<[u8]>::decode(r) },
         DataType::Tuple(types) => match types.len() {
             2..=12 => {
-                let types = types.iter().map(|ty| compile_data_type(opts, ty));
-                quote! { { Ok::<_, ::stef::buf::Error>((#(#types?,)*)) } }
+                let types = types.iter().map(|ty| compile_data_type(opts, ty, false));
+                let length = root.then_some(quote! { ::stef::buf::decode_u64(r)?; });
+                quote! { {
+                    #length
+                    Ok::<_, ::stef::buf::Error>((#(#types?,)*))
+                } }
             }
             n => todo!("compiler should catch invalid tuple with {n} elements"),
         },
         DataType::Array(ty, _size) => {
-            let ty = compile_data_type(opts, ty);
+            let ty = compile_data_type(opts, ty, false);
             quote! { ::stef::buf::decode_array(r, |r| { #ty }) }
         }
         DataType::External(ExternalType {