feat: adjust encoding to include a field encoding marker
The decoder didn't properly handle unknown field IDs. By introducing a
new marker that is combined with the field ID, the decoder can skip
over the content of an unknown field and continue decoding the
remaining elements properly, without knowing the exact details of each
field.
dnaka91 committed Dec 23, 2023
1 parent 160fbcd commit 4dd6ba3
Showing 30 changed files with 1,013 additions and 399 deletions.
27 changes: 27 additions & 0 deletions book/src/ideas.md
@@ -21,3 +21,30 @@ Something that is missing from crates.io is to allow for private registries that
The schema allows for certain changes that are backwards compatible and do not create a break in the wire format. That means older generated code is still able to read the data despite being created by a newer version of the schema. But the rules are difficult to always keep in mind and mistakes can easily be made.

To avoid accidental breaking changes, it would be nice to have an auto-detection mechanism that compares the current version to a previous one whenever changes are made. The _old_ version could be the current Git HEAD compared to the local changes.

## Schema evolution

When decoding a value, it may contain new fields and enum variants that are not known to the decoder yet. This can happen if the schema changed but was only updated by one of the two parties. For example, a server might have introduced new fields in a response, but not all clients have updated to the new schema yet.

The same can happen the other way around. For example, if the data was saved in some form of storage and the schema evolved in the meantime, the decoder might encounter old data that lacks the newer content.

In both cases, the schema must be able to handle missing or unknown fields. Several rules must be upheld when updating a schema, to ensure it is both forward and backward compatible.
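
As a hypothetical illustration (the type and field names are invented, and the Rust types below merely stand in for what generated code would look like), a response type could evolve like this:

```rust
// Version 1 of a response type, as an older client still knows it.
pub struct UserResponse {
    pub id: u64,      // field ID 1
    pub name: String, // field ID 2
}

// Version 2, after the server introduced an additional field. An old
// client decoding new data must be able to skip field ID 3 without
// understanding it, and a new client decoding old data must tolerate
// the field being absent, which is why it is optional.
pub struct UserResponseV2 {
    pub id: u64,               // field ID 1
    pub name: String,          // field ID 2
    pub email: Option<String>, // field ID 3, newly added
}
```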

### Skip fields without knowing the exact type

This section explains how a decoder is able to process payloads that contain newer or unknown fields, provided these were introduced in a backward-compatible way.

Without the new schema, it's not possible to make decisions about the data that follows a field identifier. To work around this, reduced type information can be encoded into the identifier itself.
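
As a rough sketch of the idea, the field ID and a small encoding marker could be packed into a single identifier that is written as one varint. The bit layout, names (`FieldEncoding`, `FieldId`, `pack_id`, `unpack_id`), and discriminant values below are assumptions for illustration, not the actual stef wire format:

```rust
/// Reduced type information carried alongside the field ID. The variants
/// mirror the cases listed below; the discriminant values are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FieldEncoding {
    Varint = 0,
    LengthDelimited = 1,
    Nested = 2,
    Fixed1 = 3,
    Fixed4 = 4,
    Fixed8 = 5,
}

/// A decoded field identifier: the plain field ID plus its encoding marker.
struct FieldId {
    value: u64,
    encoding: FieldEncoding,
}

/// Pack the marker into the lowest three bits of the identifier,
/// which is then written out as a regular varint.
fn pack_id(value: u64, encoding: FieldEncoding) -> u64 {
    (value << 3) | encoding as u64
}

/// Unpack a raw identifier back into field ID and encoding marker.
/// Returns `None` for the (assumed) reserved marker values 6 and 7.
fn unpack_id(raw: u64) -> Option<FieldId> {
    let encoding = match raw & 0b111 {
        0 => FieldEncoding::Varint,
        1 => FieldEncoding::LengthDelimited,
        2 => FieldEncoding::Nested,
        3 => FieldEncoding::Fixed1,
        4 => FieldEncoding::Fixed4,
        5 => FieldEncoding::Fixed8,
        _ => return None,
    };
    Some(FieldId { value: raw >> 3, encoding })
}
```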

Only a few details, rather than the full type information, are needed for the decoder to proceed (a sketch of the resulting skip routine follows this list):

- Is the value a variable integer?
  - Skip over individual bytes until the final byte, which has no continuation bit set, is found.
- Is the value length delimited?
  - Parse the length prefix, which is always a _varint_, and skip over that many bytes.
- Is the value a nested struct or enum?
  - Step into the nested type and skip over all its fields.
- Is the value of fixed length?
  - Skip over the fixed length of 1 (`bool`, `u8` and `i8`), 4 (`f32`) or 8 (`f64`) bytes.
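
Putting those rules together, a skip routine could look roughly like the following sketch. It builds on the hypothetical `FieldEncoding` and `FieldId` from the previous sketch, uses a plain byte slice as the buffer, and assumes a LEB128-style varint as well as a reserved end-marker ID of 0; none of these names are the real stef API:

```rust
/// Assumed reserved field ID that terminates a struct or enum variant.
const END_MARKER: u64 = 0;

/// Read one LEB128-style varint from the front of the buffer.
fn decode_varint(buf: &mut &[u8]) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // more than 10 continuation bytes: malformed varint
}

/// Skip a single field value, using only its encoding marker.
fn decode_skip(buf: &mut &[u8], encoding: FieldEncoding) -> Option<()> {
    match encoding {
        // Varints are self-delimiting: read one and discard it.
        FieldEncoding::Varint => {
            decode_varint(buf)?;
        }
        // Length delimited: parse the varint length, then jump over the payload.
        FieldEncoding::LengthDelimited => {
            let len = decode_varint(buf)? as usize;
            *buf = buf.get(len..)?;
        }
        // Nested struct or enum: recursively skip fields until the end marker.
        FieldEncoding::Nested => loop {
            let id = unpack_id(decode_varint(buf)?)?;
            if id.value == END_MARKER {
                break;
            }
            decode_skip(buf, id.encoding)?;
        },
        // Fixed-size values: skip the statically known number of bytes.
        FieldEncoding::Fixed1 => *buf = buf.get(1..)?,
        FieldEncoding::Fixed4 => *buf = buf.get(4..)?,
        FieldEncoding::Fixed8 => *buf = buf.get(8..)?,
    }
    Some(())
}
```

This mirrors what the generated code in this commit does: the `_ => ::stef::buf::decode_skip(r, id.encoding)?` fallback arm hands unknown field IDs to exactly this kind of routine instead of silently continuing.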

Furthermore, this information is only needed for direct elements of a struct or enum variant, as that is enough to skip over the whole field. Types nested inside another, like a `vec<u32>` for example, don't need to repeat this information for each element.
44 changes: 26 additions & 18 deletions crates/stef-build/src/decode.rs
@@ -29,10 +29,11 @@ pub(super) fn compile_struct(
             #field_vars
 
             loop {
-                match ::stef::buf::decode_id(r)? {
+                let id = ::stef::buf::decode_id(r)?;
+                match id.value {
                     ::stef::buf::END_MARKER => break,
                     #field_matches
-                    _ => continue,
+                    _ => ::stef::buf::decode_skip(r, id.encoding)?,
                 }
             }
 
@@ -70,7 +71,7 @@ pub(super) fn compile_enum(
         impl #generics ::stef::Decode for #name #generics #generics_where {
             #[allow(clippy::too_many_lines)]
             fn decode(r: &mut impl ::stef::Buf) -> ::stef::buf::Result<Self> {
-                match ::stef::buf::decode_id(r)? {
+                match ::stef::buf::decode_variant_id(r)?.value {
                     #(#variants,)*
                     id => Err(::stef::buf::Error::UnknownVariant(id)),
                 }
@@ -103,10 +104,11 @@ fn compile_variant(
             #field_vars
 
             loop {
-                match ::stef::buf::decode_id(r)? {
+                let id = ::stef::buf::decode_id(r)?;
+                match id.value {
                     ::stef::buf::END_MARKER => break,
                     #field_matches
-                    _ => continue,
+                    _ => ::stef::buf::decode_skip(r, id.encoding)?,
                 }
             }
 
@@ -162,6 +164,7 @@ fn compile_field_matches(opts: &Opts, fields: &Fields<'_>) -> TokenStream {
                 } else {
                     ty
                 },
+                true,
             );
 
             quote! { #id => #name = Some(#ty?) }
@@ -184,6 +187,7 @@ fn compile_field_matches(opts: &Opts, fields: &Fields<'_>) -> TokenStream {
                 } else {
                     ty
                 },
+                true,
             );
 
             quote! { #id => #name = Some(#ty?) }
@@ -253,7 +257,7 @@ fn compile_generics(Generics(types): &Generics<'_>) -> (TokenStream, TokenStream
 }
 
 #[allow(clippy::needless_pass_by_value, clippy::too_many_lines)]
-fn compile_data_type(opts: &Opts, ty: &Type<'_>) -> TokenStream {
+fn compile_data_type(opts: &Opts, ty: &Type<'_>, root: bool) -> TokenStream {
     match &ty.value {
         DataType::Bool => quote! { ::stef::buf::decode_bool(r) },
         DataType::U8 => quote! { ::stef::buf::decode_u8(r) },
@@ -274,20 +278,20 @@ fn compile_data_type(opts: &Opts, ty: &Type<'_>) -> TokenStream {
             BytesType::Bytes => quote! { ::stef::buf::decode_bytes_bytes(r) },
         },
         DataType::Vec(ty) => {
-            let ty = compile_data_type(opts, ty);
+            let ty = compile_data_type(opts, ty, false);
             quote! { ::stef::buf::decode_vec(r, |r| { #ty }) }
         }
         DataType::HashMap(kv) => {
-            let ty_k = compile_data_type(opts, &kv.0);
-            let ty_v = compile_data_type(opts, &kv.1);
+            let ty_k = compile_data_type(opts, &kv.0, false);
+            let ty_v = compile_data_type(opts, &kv.1, false);
             quote! { ::stef::buf::decode_hash_map(r, |r| { #ty_k }, |r| { #ty_v }) }
         }
         DataType::HashSet(ty) => {
-            let ty = compile_data_type(opts, ty);
+            let ty = compile_data_type(opts, ty, false);
             quote! { ::stef::buf::decode_hash_set(r, |r| { #ty }) }
         }
         DataType::Option(ty) => {
-            let ty = compile_data_type(opts, ty);
+            let ty = compile_data_type(opts, ty, false);
             quote! { ::stef::buf::decode_option(r, |r| { #ty }) }
         }
         DataType::NonZero(ty) => match &ty.value {
@@ -313,16 +317,16 @@ fn compile_data_type(opts: &Opts, ty: &Type<'_>) -> TokenStream {
                 }
             },
             DataType::Vec(ty) => {
-                let ty = compile_data_type(opts, ty);
+                let ty = compile_data_type(opts, ty, false);
                 quote! { ::stef::buf::decode_non_zero_vec(r, |r| { #ty }) }
             }
             DataType::HashMap(kv) => {
-                let ty_k = compile_data_type(opts, &kv.0);
-                let ty_v = compile_data_type(opts, &kv.1);
+                let ty_k = compile_data_type(opts, &kv.0, false);
+                let ty_v = compile_data_type(opts, &kv.1, false);
                 quote! { ::stef::buf::decode_non_zero_hash_map(r, |r| { #ty_k }, |r| { #ty_v }) }
             }
             DataType::HashSet(ty) => {
-                let ty = compile_data_type(opts, ty);
+                let ty = compile_data_type(opts, ty, false);
                 quote! { ::stef::buf::decode_non_zero_hash_set(r, |r| { #ty }) }
             }
             ty => todo!("compiler should catch invalid {ty:?} type"),
@@ -331,13 +335,17 @@ fn compile_data_type(opts: &Opts, ty: &Type<'_>) -> TokenStream {
         DataType::BoxBytes => quote! { Box::<[u8]>::decode(r) },
         DataType::Tuple(types) => match types.len() {
             2..=12 => {
-                let types = types.iter().map(|ty| compile_data_type(opts, ty));
-                quote! { { Ok::<_, ::stef::buf::Error>((#(#types?,)*)) } }
+                let types = types.iter().map(|ty| compile_data_type(opts, ty, false));
+                let length = root.then_some(quote! { ::stef::buf::decode_u64(r)?; });
+                quote! { {
+                    #length
+                    Ok::<_, ::stef::buf::Error>((#(#types?,)*))
+                } }
             }
             n => todo!("compiler should catch invalid tuple with {n} elements"),
         },
         DataType::Array(ty, _size) => {
-            let ty = compile_data_type(opts, ty);
+            let ty = compile_data_type(opts, ty, false);
             quote! { ::stef::buf::decode_array(r, |r| { #ty }) }
         }
         DataType::External(ExternalType {