doc(book): expand the introduction page
dnaka91 committed Jan 3, 2024
1 parent c43d077 commit 6f3c902
Showing 3 changed files with 30 additions and 10 deletions.
2 changes: 1 addition & 1 deletion book/.vitepress/config.mts
@@ -3,7 +3,7 @@ import maboGrammar from "../../vscode-extension/syntaxes/mabo.tmLanguage.json";

// https://vitepress.dev/reference/site-config
export default defineConfig({
title: "mabo",
title: "Mabo",
description: "Strongly Typed Encoding Format",
appearance: "dark",
lastUpdated: true,
2 changes: 1 addition & 1 deletion book/src/index.md
@@ -5,7 +5,7 @@ layout: home
hero:
name: "Mabo"
text: "Strongly Typed Encoding Format"
tagline: Data format and schema, with a type system as strong as Rust's
tagline: Schema-based binary data format, with a type system as strong as Rust's
image:
src: /logo.svg
alt: Mabo
36 changes: 28 additions & 8 deletions book/src/introduction.md
@@ -1,21 +1,41 @@
# Strongly Typed Encoding Format
# Strongly typed encoding format

Data format and schema, with a type system as strong as Rust's.
Schema-based binary data format, with a type system as strong as Rust's.

Mabo is a data encoding format, that borrows a lot from existing formats like [Protobuf](https://protobuf.dev), [Cap'n Proto](https://capnproto.org) and [Flatbuffers](https://flatbuffers.dev), but in contrast vastly extends the type system to make it as strong as Rust's.
Mabo is a data encoding format that borrows a lot from existing schema-based formats like [Protobuf](https://protobuf.dev), [Cap'n Proto](https://capnproto.org) and [FlatBuffers](https://flatbuffers.dev). It also takes some ideas from the binary data formats [Postcard](https://github.com/jamesmunns/postcard) and [Bincode](https://github.com/bincode-org/bincode), which are popular choices in the Rust ecosystem.

In contrast to those projects, it favors a very strong and strict type system and makes different overall design decisions. It is not necessarily _better_ than them, but takes other approaches to the challenges of designing a data format.

## What does schema-based mean?

Data formats can generally be categorized into three groups: self-describing, non-self-describing, and schema-based.

Self-describing formats are most commonly text-based, which makes them human-readable. The self-describing part is usually implemented by either pairing each value with a key that names it, or writing a descriptor for the format at the beginning of a payload. Popular formats include _JSON_, _XML_, _CSV_ and many more.

Non-self-describing formats are most commonly binary and meant to be machine-readable. They often come with special handling of the payload to squeeze out extra savings in storage size, and the structure is often defined by the program itself. Some popular formats with a defined structure are _MessagePack_ and _CBOR_, but again, there are many more.

Lastly, schema-based formats are mostly binary as well, but their structure is described in a text-based schema. This schema is then used to generate the source code that defines the data structures and the logic to encode them into, and decode them from, the format.

Within all these groups there are exceptions as well. There are text-based formats that are not self-describing, and there are binary formats that are self-describing or that can be encoded either way.

Mabo falls into the last group, the schema-based formats.
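
To make the difference between the first two groups concrete, here is a minimal sketch that encodes the same value once with its field names and once as bare bytes. The `Point` type, the function names and the little-endian layout are assumptions made up for this illustration; this is not Mabo's wire format.

```rust
use std::convert::TryFrom; // in the prelude since edition 2021; explicit for older editions

/// Hypothetical example type, used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct Point {
    pub x: u16,
    pub y: u16,
}

/// Self-describing: every value is paired with a key that names it (JSON-like text).
pub fn encode_self_describing(p: &Point) -> String {
    format!("{{\"x\":{},\"y\":{}}}", p.x, p.y)
}

/// Non-self-describing: only the raw values are written, in a fixed order.
/// The payload carries no names, so the structure lives in the program.
pub fn encode_compact(p: &Point) -> Vec<u8> {
    let mut out = Vec::with_capacity(4);
    out.extend_from_slice(&p.x.to_le_bytes());
    out.extend_from_slice(&p.y.to_le_bytes());
    out
}

/// Decoding only works because the reader already knows the layout:
/// exactly four bytes, `x` first, then `y`, both little-endian.
pub fn decode_compact(bytes: &[u8]) -> Option<Point> {
    let raw = <[u8; 4]>::try_from(bytes).ok()?;
    Some(Point {
        x: u16::from_le_bytes([raw[0], raw[1]]),
        y: u16::from_le_bytes([raw[2], raw[3]]),
    })
}
```

For `Point { x: 1, y: 2 }` the keyed form takes 13 bytes of text while the compact form takes 4 bytes, but the compact bytes are meaningless without the code that knows their layout. A schema-based format keeps the compact payload and moves that knowledge into a schema from which such code is generated.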

## Why a stronger type system?

First and foremost, I personally really enjoy the Rust programming language and its strict but flexible type system.

In the many years that I have used Protobufs, they always have disappointed with the few supported types. Most of the time that resulted in an additional round of validation after decoding the format, to ensure all data is in a valid form.
In the many years that I have used Protobuf, it has always felt limited by the few supported built-in types. Most of the time, that resulted in an additional round of validation after decoding, to ensure all data is in a valid form.

Many of these validations could be avoided by a stronger type system, which would rule out many wrong states by refusing them upfront.

By extending the type system, data structures can be defined in a way, that ensures the data is already in a valid state after decoding.
For example, consider the type for a TCP port. It can range from `0` to `65535` (inclusive), making a 16-bit unsigned integer the perfect fit for the value range. Many formats don't provide a data type for this integer, despite many programming languages having one (`u16`, `uint16_t`, `uint16`, `UShort`, ...).

By extending the type system, data structures can be defined in a way that ensures the data is already in a valid state after decoding, without needing an additional validation step.
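
The TCP port case can be sketched in Rust as follows. Both structs and the `validate` function are hypothetical names for this illustration, assuming a format whose narrowest unsigned integer is 32 bits wide; they are not code generated by Mabo or any other tool.

```rust
use std::convert::TryFrom; // in the prelude since edition 2021; explicit for older editions
use std::num::TryFromIntError;

/// Hypothetical struct as it might be decoded from a format that only
/// offers a 32-bit unsigned integer: invalid ports are representable.
pub struct LooseEndpoint {
    pub port: u32,
}

/// Hypothetical struct for a format with a native 16-bit unsigned integer:
/// every value the type can hold is a valid port.
pub struct Endpoint {
    pub port: u16,
}

/// The additional validation round after decoding: narrow the port and
/// reject anything above 65535.
pub fn validate(loose: &LooseEndpoint) -> Result<Endpoint, TryFromIntError> {
    Ok(Endpoint {
        port: u16::try_from(loose.port)?,
    })
}
```

With a `u16` in the schema, a decoder can produce `Endpoint` directly and `validate` is no longer needed, because an out-of-range port such as `70000` cannot be expressed in the type at all.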

## The project name

## Why not use an existing data format?
The name Mabo is Japanese (_マーボー_) and a short form of the Chinese dish [麻婆豆腐](https://en.wikipedia.org/wiki/Mapo_tofu) ([_マーボーどうふ_](https://jisho.org/word/%E9%BA%BB%E5%A9%86%E8%B1%86%E8%85%90), [_/maːboːdoɯɸɯ/_](https://en.wikipedia.org/wiki/Help:IPA/Japanese), _mabodofu_), which was one of my favorite foods while living in Japan.

### Protobuf
After a long search for a project name, I gave up on acronyms and decided to name it after food, as done with [Bun](https://bun.sh) and [OpenTofu](https://opentofu.org). The Bun logo resembles a [肉まん](https://en.wikipedia.org/wiki/Baozi) ([_にくまん_](https://jisho.org/word/%E8%82%89%E9%A5%85), [_/nikɯmaɴ/_](https://en.wikipedia.org/wiki/Help:IPA/Japanese), _nikuman_), a soft bread filled with pork, with a face on it, and the OpenTofu logo resembles a block of tofu with a hint of eyes.

### Cap'n Proto
Therefore, I decided to name the project after one of my favorite dishes.
