Skip to content

Latest commit

 

History

History
327 lines (247 loc) · 11.8 KB

README.md

File metadata and controls

327 lines (247 loc) · 11.8 KB

Native model

Crates.io Build Test Release Documentation License

Add interoperability on the top of serialization formats like bincode, postcard etc.

See concepts for more details.

Goals

  • Interoperability: Allows different applications to work together, even if they are using different versions of the data model.
  • Data Consistency: Ensure that we process the data expected model.
  • Flexibility: You can use any serialization format you want. More details here.
  • Performance: A minimal overhead (encode: ~20 ns, decode: ~40 ps). More details here.

Usage

       Application 1 (DotV1)        Application 2 (DotV1 and DotV2)
                |                                  |
   Encode DotV1 |--------------------------------> | Decode DotV1 to DotV2
                |                                  | Modify DotV2
   Decode DotV1 | <--------------------------------| Encode DotV2 back to DotV1
                |                                  |
use native_model::native_model;
use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, PartialEq, Debug)]
#[native_model(id = 1, version = 1)]
struct DotV1(u32, u32);

#[derive(Deserialize, Serialize, PartialEq, Debug)]
#[native_model(id = 1, version = 2, from = DotV1)]
struct DotV2 {
    name: String,
    x: u64,
    y: u64,
}

impl From<DotV1> for DotV2 {
    fn from(dot: DotV1) -> Self {
        DotV2 {
            name: "".to_string(),
            x: dot.0 as u64,
            y: dot.1 as u64,
        }
    }
}

impl From<DotV2> for DotV1 {
    fn from(dot: DotV2) -> Self {
        DotV1(dot.x as u32, dot.y as u32)
    }
}

// Application 1
let dot = DotV1(1, 2);
let bytes = native_model::encode(&dot).unwrap();

// Application 1 sends bytes to Application 2.

// Application 2
// We are able to decode the bytes directly into a new type DotV2 (upgrade).
let (mut dot, source_version) = native_model::decode::<DotV2>(bytes).unwrap();
assert_eq!(dot, DotV2 {
    name: "".to_string(),
    x: 1,
    y: 2
});
dot.name = "Dot".to_string();
dot.x = 5;
// For interoperability, we encode the data with the version compatible with Application 1 (downgrade).
let bytes = native_model::encode_downgrade(dot, source_version).unwrap();

// Application 2 sends bytes to Application 1.

// Application 1
let (dot, _) = native_model::decode::<DotV1>(bytes).unwrap();
assert_eq!(dot, DotV1(5, 2));
  • Full example here.

Serialization format

You can use default serialization formats via the feature flags, like:

[dependencies]
native_model = { version = "0.1", features = ["bincode_2_rc"] }

Each feature flag corresponds to a specific minor version of the serialization format. In order to avoid breaking changes, the default serialization format is the oldest one.

Custom serialization format

Define a struct with the name you want. This struct must implement native_model::Encode and native_model::Decode traits.

Full examples:

Others examples, see the default implementations:

Notice

native_model provides implementations that rely on metadata-less formats and serde. There are known issues with some serde advanced features such as:

  • #[serde(flatten)]
  • #[serde(skip)]
  • #[serde(skip_deserializing)]
  • #[serde(skip_serializing)]
  • #[serde(skip_serializing_if = "path")]
  • #[serde(tag = "...")]
  • #[serde(untagged)]

Or types implementing similar strategies such as serde_json::Value.

The rmp-serde serialization format can optionally support them serializing structs as maps, the RmpSerdeNamed struct is provided to support this use-case.

Data model

Define your model using the macro native_model.

Attributes:

  • id = u32: The unique identifier of the model.
  • version = u32: The version of the model.
  • with = type: The serialization format that you use for the Encode/Decode implementation. Setup here.
  • from = type: Optional, the previous version of the model.
    • type: The previous version of the model that you use for the From implementation.
  • try_from = (type, error): Optional, the previous version of the model with error handling.
    • type: The previous version of the model that you use for the TryFrom implementation.
    • error: The error type that you use for the TryFrom implementation.
use native_model::native_model;
use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, PartialEq, Debug)]
#[native_model(id = 1, version = 1)]
struct DotV1(u32, u32);

#[derive(Deserialize, Serialize, PartialEq, Debug)]
#[native_model(id = 1, version = 2, from = DotV1)]
struct DotV2 {
    name: String,
    x: u64,
    y: u64,
}

// Implement the conversion between versions From<DotV1> for DotV2 and From<DotV2> for DotV1.

impl From<DotV1> for DotV2 {
    fn from(dot: DotV1) -> Self {
        DotV2 {
            name: "".to_string(),
            x: dot.0 as u64,
            y: dot.1 as u64,
        }
    }
}

impl From<DotV2> for DotV1 {
    fn from(dot: DotV2) -> Self {
        DotV1(dot.x as u32, dot.y as u32)
    }
}

#[derive(Deserialize, Serialize, PartialEq, Debug)]
#[native_model(id = 1, version = 3, try_from = (DotV2, anyhow::Error))]
struct DotV3 {
    name: String,
    cord: Cord,
}

#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct Cord {
    x: u64,
    y: u64,
}

// Implement the conversion between versions From<DotV2> for DotV3 and From<DotV3> for DotV2.

impl TryFrom<DotV2> for DotV3 {
    type Error = anyhow::Error;

    fn try_from(dot: DotV2) -> Result<Self, Self::Error> {
        Ok(DotV3 {
            name: dot.name,
            cord: Cord { x: dot.x, y: dot.y },
        })
    }
}

impl TryFrom<DotV3> for DotV2 {
    type Error = anyhow::Error;

    fn try_from(dot: DotV3) -> Result<Self, Self::Error> {
        Ok(DotV2 {
            name: dot.name,
            x: dot.cord.x,
            y: dot.cord.y,
        })
    }
}

Codecs

native_model comes with several optional built-in serializer features available:

  • bincode 1.3

    • This is the default codec.
    • Warning: This codec may not work with all serde-derived types.
  • bincode 2.0.0-rc.3

    • Enable the bincode_2_rc feature and use the native_model::bincode_2_rc::Bincode attribute to have native_db use this crate for serializing & deserializing.
    • Warning: This codec may not work with all serde-derived types.
  • postcard 1.0

    • Enable the postcard_1_0 feature and use the native_model::postcard_1_0::PostCard attribute.
    • Warning: This codec may not work with all serde-derived types.
  • rmp-serde 1.3

    • Enable the rmp_serde_1_3 feature and use the native_model::rmp_serde_1_3::RmpSerde attribute.
Codec example:

As example, to use rmp-serde:

  1. In your project's Cargo.toml file, enable the rmp_serde_1_3 feature for the native_model dependency.
    • Be sure to check crates.io for the most recent native_model version number.
[dependencies]
serde = { version = "1.0", features = [ "derive" ] }
native_model = { version = "0.4", features = [ "rmp_serde_1_3" ] }
  1. Assign the rmp_serde_1_3 codec to your struct using the with attribute:
use native_model::native_model;

#[derive(Clone, Default, serde::Deserialize, serde::Serialize)]
#[native_model(id = 1, version = 1, with = native_model::rmp_serde_1_3::RmpSerde)]
struct MyStruct {
	my_string: String,
	// etc.
}
Additional reading

You may also want to check out David Koloski's Rust serialization benchmarks for help selecting the codec (i.e. bincode_1_3, rmp_serde_1_3, etc.) that's best for your project.

Status

Early development. Not ready for production.

Concepts

In order to understand how the native model works, you need to understand the following concepts.

  • Identity(id): The identity is the unique identifier of the model. It is used to identify the model and prevent to decode a model into the wrong Rust type.
  • Version(version) The version is the version of the model. It is used to check the compatibility between two models.
  • Encode: The encode is the process of converting a model into a byte array.
  • Decode: The decode is the process of converting a byte array into a model.
  • Downgrade: The downgrade is the process of converting a model into a previous version of the model.
  • Upgrade: The upgrade is the process of converting a model into a newer version of the model.

Under the hood, the native model is a thin wrapper around serialized data. The id and the version are twice encoded with a little_endian::U32. That represents 8 bytes, that are added at the beginning of the data.

+------------------+------------------+------------------------------------+
|     ID (4 bytes) | Version (4 bytes)| Data (indeterminate-length bytes)  |
+------------------+------------------+------------------------------------+

Full example here.

Performance

Native model has been designed to have a minimal and constant overhead. That means that the overhead is the same whatever the size of the data. Under the hood we use the zerocopy crate to avoid unnecessary copies.

👉 To know the total time of the encode/decode, you need to add the time of your serialization format.

Resume:

  • Encode: ~20 ns
  • Decode: ~40 ps
data size encode time (ns) decode time (ps)
1 B 19.769 ns - 20.154 ns 40.526 ps - 40.617 ps
1 KiB 19.597 ns - 19.971 ns 40.534 ps - 40.633 ps
1 MiB 19.662 ns - 19.910 ns 40.508 ps - 40.632 ps
10 MiB 19.591 ns - 19.980 ns 40.504 ps - 40.605 ps
100 MiB 19.669 ns - 19.867 ns 40.520 ps - 40.644 ps

Benchmark of the native model overhead here.