Merge pull request #401 from stepchowfun/rust-decoding-optimization
Optimize the Rust deserialization logic
stepchowfun authored Jan 12, 2022
2 parents 43888cb + f983a07 commit b05b13d
Showing 9 changed files with 6,147 additions and 6,253 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.9.2] - 2022-01-11
+
+### Changed
+- The deserialization code generated for Rust is now significantly faster for certain types of messages.
+- The serialization and deserialization functions generated for Rust now have more general type signatures.
+
 ## [0.9.1] - 2022-01-11

 ### Changed
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "typical"
-version = "0.9.1"
+version = "0.9.2"
authors = ["Stephan Boyer <stephan@stephanboyer.com>"]
edition = "2021"
description = "Algebraic data types for data interchange."
40 changes: 15 additions & 25 deletions README.md
@@ -13,16 +13,14 @@ In short, Typical offers two important features that are conventionally thought

 Typical's design was informed by experience using Protocol Buffers at Google and Apache Thrift at Airbnb. This is not an officially supported product of either company. If you want to support Typical, you can do so [here](https://github.com/sponsors/stepchowfun).

-## Supported programming languages
+#### Supported programming languages

 The following languages are currently supported:

 - Rust
 - TypeScript
 - JavaScript (via TypeScript)

-See [below](#code-generation) for remarks about each code generator.
-
 ## Tutorial

 To understand what this is all about, let's walk through an example scenario. Suppose you want to build a simple API for sending emails, and you need to decide how requests and responses will be serialized over the wire. You could use a self-describing format like JSON or XML, but you may want better type safety and performance. Typical has a great story to tell about those things.
@@ -84,15 +82,15 @@ let message = SendEmailRequestOut {
 body: "It makes serialization easy and safe.".to_owned(),
 };

-let mut file = BufWriter::new(File::create("/tmp/message")?);
-message.serialize(&mut file)?;
+let file = BufWriter::new(File::create(FILE_PATH)?);
+message.serialize(file)?;
 ```

 Another program could read the file and deserialize the message as follows:

 ```rust
-let mut file = BufReader::new(File::open("/tmp/message")?);
-let message = SendEmailRequestIn::deserialize(&mut file)?;
+let file = BufReader::new(File::open(FILE_PATH)?);
+let message = SendEmailRequestIn::deserialize(file)?;

 println!("to: {}", message.to);
 println!("subject: {}", message.subject);
@@ -479,23 +477,15 @@ To mitigate memory-based denial-of-service attacks, it's good practice to reject

 ## Code generation

-The code generators have a simple user interface. They have no settings to configure, and they produce a single self-contained source file regardless of the number of schema files. The [example projects](https://github.com/stepchowfun/typical/tree/main/examples) demonstrate how to use them.
-
-The primary goal is to generate code that is amenable to compiler optimizations, though the generated code is reasonably human-readable too. For example, indentation is used as one would expect, and variables are named appropriately. For web-based applications, it's sensible to [minify](https://en.wikipedia.org/wiki/Minification_\(programming\)) the generated code along with your other application code for distribution.
-
-Typical types map to [plain old data types](https://en.wikipedia.org/wiki/Passive_data_structure), rather than objects with methods like getters and setters. That means serialization and deserialization aren't [zero-copy operations](https://en.wikipedia.org/wiki/Zero-copy), but it also means accessing individual fields of decoded messages is extremely fast.
-
-The code generators are thoroughly exercised with a comprehensive [integration test suite](https://github.com/stepchowfun/typical/tree/main/integration_tests). All of the data serialized by the test suite is recorded in a file called the [omnifile](https://github.com/stepchowfun/typical/blob/main/test_data/omnifile), and every code generator is required to produce identical results, bit-for-bit.
-
-Every programming language has its own patterns and idiosyncrasies. The sections below contain some language-specific notes.
+Each code generator produces a single self-contained source file regardless of the number of schema files. The [example projects](https://github.com/stepchowfun/typical/tree/main/examples) demonstrate how to use them. The sections below contain some language-specific remarks.

 ### Rust

 - Typical's type system maps straightforwardly to Rust's `struct`s and `enum`s, but with slightly different naming conventions. All Typical types are written in `UpperCamelCase` (e.g., `String`), whereas Rust uses a combination of that and `lower_snake_case` (e.g., `u64`). Note that Typical's integer types are called `S64` and `U64` ("S" for signed, "U" for unsigned), but the respective types in Rust are `i64` and `u64` ("i" for integer, "u" for unsigned).

 ### JavaScript and TypeScript

-- The generated code runs in Node.js and modern web browsers. Older browsers can be targeted with tools like [Babel](https://babeljs.io/).
+- The generated code runs in Node.js and modern web browsers. Older browsers can be targeted with tools like [Babel](https://babeljs.io/). For web applications, it's sensible to [minify](https://en.wikipedia.org/wiki/Minification_\(programming\)) the generated code along with your other application code.
 - The generated code never uses reflection or dynamic code evaluation, so it works in [Content Security Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP)-restricted environments.
 - Typical's integer types map to `bigint` rather than `number`. It's safe to use integers to represent money or other quantities that shouldn't be rounded. Typical's `F64` type maps to `number`, as one would expect.
 - The generated functions never throw exceptions when given well-typed arguments. The `deserialize` functions can return an `Error` to signal failure, and TypeScript requires callers to acknowledge that possibility.
@@ -627,17 +617,17 @@ We have coarse-grained benchmarks [here](https://github.com/stepchowfun/typical/

 One benchmark serializes and deserializes a large message containing several hundred megabytes of text:

-| | Rust | TypeScript |
-| ----------------------------------- | ----------- | ----------- |
-| **Per-thread serialization rate** | 2.1 GiB/s | 1.7 GiB/s |
-| **Per-thread deserialization rate** | 906.0 MiB/s | 339.3 MiB/s |
+| | Rust | TypeScript |
+| ----------------------------------- | ---------- | ------------ |
+| **Per-thread serialization rate** | 2.17 GiB/s | 1.96 GiB/s |
+| **Per-thread deserialization rate** | 1.12 GiB/s | 366.78 MiB/s |

 Another benchmark repeatedly serializes and deserializes a pathological message containing many small and deeply nested values:

-| | Rust | TypeScript |
-| ----------------------------------- | ------------- | ---------- |
-| **Per-thread serialization rate** | 341.2.4 MiB/s | 20.2 MiB/s |
-| **Per-thread deserialization rate** | 94.4 MiB/s | 1.2 MiB/s |
+| | Rust | TypeScript |
+| ----------------------------------- | ------------ | ----------- |
+| **Per-thread serialization rate** | 336.23 MiB/s | 22.32 MiB/s |
+| **Per-thread deserialization rate** | 89.81 MiB/s | 1.23 MiB/s |

These benchmarks represent two extremes. Real-world performance will be somewhere in the middle.
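
To put the Rust deserialization figures in the two versions of each table side by side, the ratio of the rates can be computed directly. This is only a sketch of the arithmetic: it assumes the first copy of each table is the pre-change README and the second is the post-change one, and `speedup` is a hypothetical helper, not part of Typical.

```rust
/// Number of MiB in one GiB.
const MIB_PER_GIB: f64 = 1024.0;

/// Ratio of the newer rate to the older one, both expressed in MiB/s.
/// (Hypothetical helper for illustration only.)
fn speedup(before_mib_per_s: f64, after_mib_per_s: f64) -> f64 {
    after_mib_per_s / before_mib_per_s
}

fn main() {
    // Large-message benchmark, Rust deserialization: 906.0 MiB/s vs. 1.12 GiB/s.
    let large = speedup(906.0, 1.12 * MIB_PER_GIB);

    // Pathological benchmark, Rust deserialization: 94.4 MiB/s vs. 89.81 MiB/s.
    let pathological = speedup(94.4, 89.81);

    println!("large message: {large:.2}x");
    println!("pathological message: {pathological:.2}x");
}
```

Under that assumption the large-message rate rises by roughly a quarter while the pathological rate is slightly lower, which fits the changelog's wording that the speedup applies to "certain types of messages". Benchmark runs are noisy, so small differences should not be over-read.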

2 changes: 1 addition & 1 deletion benchmarks/rust/src/main.rs
@@ -97,7 +97,7 @@ fn benchmark<T: Serialize, U: Deserialize>(message: &T, iterations: usize) -> io

for i in 0..iterations {
let offset = message_size * i;
-U::deserialize(&mut &buffer[offset..offset + message_size])?;
+U::deserialize(&buffer[offset..offset + message_size])?;
}

let deserialization_duration = deserialization_instant.elapsed();
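
The call-site change above relies on how `std::io::Read` is implemented in the standard library: `&[u8]` implements `Read`, and so does `&mut R` for any `R: Read`. The sketch below illustrates this with a hypothetical `read_first_byte` function standing in for a generated `deserialize`. The actual generated signature is not shown in this diff, so the `R: Read` by-value bound is an assumption.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical stand-in for a generated `deserialize` function (an
/// assumption, not Typical's real code): takes any reader by value and
/// returns its first byte as a toy "message".
fn read_first_byte<R: Read>(mut reader: R) -> io::Result<u8> {
    let mut byte = [0_u8; 1];
    reader.read_exact(&mut byte)?;
    Ok(byte[0])
}

fn main() -> io::Result<()> {
    let buffer = vec![7_u8, 8, 9];

    // A byte slice implements `Read`, so it can be passed directly.
    assert_eq!(read_first_byte(&buffer[..])?, 7);

    // The older call style still compiles, because `&mut R` implements
    // `Read` whenever `R` does.
    let mut slice = &buffer[..];
    assert_eq!(read_first_byte(&mut slice)?, 7);

    // An owned reader can be moved in as well.
    let reader = BufReader::new(&buffer[1..]);
    assert_eq!(read_first_byte(reader)?, 8);

    Ok(())
}
```

A by-value generic parameter of this shape accepts strictly more argument types than a `&mut` parameter, which is one way to read the changelog's "more general type signatures".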
8 changes: 4 additions & 4 deletions examples/rust/src/main.rs
@@ -22,13 +22,13 @@ fn write_to_file() -> io::Result<()> {
body: "It makes serialization easy and safe.".to_owned(),
};

-let mut file = BufWriter::new(File::create(FILE_PATH)?);
-message.serialize(&mut file)
+let file = BufWriter::new(File::create(FILE_PATH)?);
+message.serialize(file)
}

fn read_from_file() -> io::Result<()> {
-let mut file = BufReader::new(File::open(FILE_PATH)?);
-let message = SendEmailRequestIn::deserialize(&mut file)?;
+let file = BufReader::new(File::open(FILE_PATH)?);
+let message = SendEmailRequestIn::deserialize(file)?;

println!("to: {}", message.to);
println!("subject: {}", message.subject);
10 changes: 1 addition & 9 deletions integration_tests/rust/src/assertions.rs
@@ -38,17 +38,9 @@ pub fn assert_match<T: Debug + Serialize, U: Debug + Deserialize>(
.unwrap();
file.write_all(&buffer).unwrap();

-let mut slice = buffer.as_slice();
-let replica = U::deserialize(&mut slice)?;
+let replica = U::deserialize(buffer.as_slice())?;
println!("Message deserialized from those bytes: {:?}", replica);

-if !slice.is_empty() {
-return Err(Error::new(
-ErrorKind::Other,
-"The buffer was not consumed completely!",
-));
-}
-
if format!("{:?}", replica) != format!("{:?}", expected) {
return Err(Error::new(ErrorKind::Other, "Mismatch!"));
}
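
The `slice.is_empty()` check in this hunk depended on the callee advancing the caller's slice through a `&mut` reference. When a slice is passed by value, the caller's copy is never advanced, so such a check can no longer observe leftover bytes. The sketch below demonstrates that difference with hypothetical helpers (`consume` and `remaining_after_calls` are illustrative names, not Typical's API). Whether this is the reason the check was dropped is an assumption, since the diff does not say.

```rust
use std::io::{self, Read};

/// Hypothetical by-value reader function (not Typical's generated code):
/// consumes `count` bytes from whatever reader it is given.
fn consume<R: Read>(mut reader: R, count: usize) -> io::Result<()> {
    let mut scratch = vec![0_u8; count];
    reader.read_exact(&mut scratch)
}

/// Returns the number of bytes still visible to the caller after a by-value
/// call and after a `&mut` call, in that order.
fn remaining_after_calls(buffer: &[u8], count: usize) -> io::Result<(usize, usize)> {
    // By value: the slice reference is copied, so the caller's view is unchanged.
    let by_value = buffer;
    consume(by_value, count)?;

    // By mutable reference: the callee advances the caller's slice.
    let mut by_ref = buffer;
    consume(&mut by_ref, count)?;

    Ok((by_value.len(), by_ref.len()))
}

fn main() -> io::Result<()> {
    let (by_value_len, by_ref_len) = remaining_after_calls(&[1, 2, 3], 2)?;
    assert_eq!(by_value_len, 3); // caller's slice untouched
    assert_eq!(by_ref_len, 1); // caller's slice advanced past two bytes
    Ok(())
}
```

A caller that still needs to detect trailing bytes can keep passing `&mut slice`, which remains a valid `Read` argument under a generic by-value signature.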