Marc edited this page Jul 2, 2016 · 11 revisions

The Cereal format

The serialization format used by Cereal is really simple: all data types except std::string are stored as raw big-endian data, with a byte in front that tells us what data type it is.

A std::string is stored a bit differently: first comes a short (2 bytes) that tells us how many characters the string has, and then all the characters are stored as single-byte ASCII characters, one after the other. So Database would be encoded as 0x00 0x08 0x44 0x61 0x74 0x61 0x62 0x61 0x73 0x65 (a length of 8, then D, a, t, a, b, a, s, e).
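The string layout described above can be sketched in a few lines. This is only an illustration, not Cereal's actual code: `appendU16` and `encodeString` are hypothetical names, and the sketch assumes the 2-byte length prefix is unsigned.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of Cereal's API: appends a 16-bit value
// in big-endian order (most significant byte first).
inline void appendU16(std::vector<std::uint8_t>& out, std::uint16_t value)
{
    out.push_back(static_cast<std::uint8_t>(value >> 8));
    out.push_back(static_cast<std::uint8_t>(value & 0xFF));
}

// Hypothetical helper: encodes a string as <2-byte length><ASCII bytes>.
inline std::vector<std::uint8_t> encodeString(const std::string& text)
{
    // Assumption: the length prefix is an unsigned short, so the string
    // can hold at most 0xFFFF characters.
    assert(text.size() <= 0xFFFF);

    std::vector<std::uint8_t> out;
    appendU16(out, static_cast<std::uint16_t>(text.size()));
    for (char c : text)
        out.push_back(static_cast<std::uint8_t>(c));
    return out;
}
```

Calling `encodeString("Database")` yields the ten bytes listed above: the two length bytes 0x00 0x08 followed by the eight ASCII characters.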

Once we know that, we can move on to our serialization units:

Fields

Fields start with a byte that identifies them as a field (value 0x09). Next comes a short indicating the length of the name, followed by the ASCII name.
Finally, there is another byte indicating the data type, followed by the data itself (from 1 to 8 bytes long, depending on the data type).
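As a rough sketch of the field layout, the code below builds the bytes of a field holding an unsigned integer. `appendBigEndian` and `encodeField` are hypothetical names, not Cereal's API. This page does not list the data type identifiers, so the type byte is passed in by the caller and any value used with it here is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

constexpr std::uint8_t kFieldMarker = 0x09; // field identifier from the text above

// Hypothetical helper: appends an unsigned integer, most significant byte first.
template <typename T>
void appendBigEndian(std::vector<std::uint8_t>& out, T value)
{
    static_assert(std::is_unsigned<T>::value, "this sketch handles unsigned integers only");
    for (std::size_t i = sizeof(T); i-- > 0;)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper: <0x09><name length><name><type byte><big-endian data>.
// typeId is supplied by the caller because the real identifiers are not
// documented on this page.
template <typename T>
std::vector<std::uint8_t> encodeField(const std::string& name, std::uint8_t typeId, T value)
{
    // Assumption: the name length prefix is an unsigned short.
    assert(name.size() <= 0xFFFF);

    std::vector<std::uint8_t> out;
    out.push_back(kFieldMarker);
    appendBigEndian(out, static_cast<std::uint16_t>(name.size()));
    out.insert(out.end(), name.begin(), name.end());
    out.push_back(typeId);
    appendBigEndian(out, value);
    return out;
}
```

For example, `encodeField<std::uint32_t>("id", 0x04, 0x0102u)` with the made-up type byte 0x04 produces 0x09, then 0x00 0x02, then 0x69 0x64 for the name, then 0x04, then the four data bytes 0x00 0x00 0x01 0x02.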

Arrays

Arrays also start with a byte that identifies them as an array (value 0x0A), for compatibility with fields and objects. After that comes a short that indicates the length of the name, followed by the bytes of the name.
As with fields, next comes the byte indicating the data type of the array, followed by four bytes indicating the item count. After that come the raw bytes of all the items.

Use sizeof(data type) * count to figure out the number of bytes the array's items use.
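The array layout can be sketched in the same spirit. `writeBigEndian` and `encodeArray` are hypothetical names, not Cereal's API. The type byte is again a caller-supplied placeholder, and the sketch assumes each item is written in big-endian order like any other value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

constexpr std::uint8_t kArrayMarker = 0x0A; // array identifier from the text above

// Hypothetical helper: writes an unsigned integer, most significant byte first.
template <typename T>
void writeBigEndian(std::vector<std::uint8_t>& out, T value)
{
    static_assert(std::is_unsigned<T>::value, "this sketch handles unsigned integers only");
    for (std::size_t i = sizeof(T); i-- > 0;)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper:
// <0x0A><name length><name><type byte><4-byte item count><items>.
template <typename T>
std::vector<std::uint8_t> encodeArray(const std::string& name, std::uint8_t typeId,
                                      const std::vector<T>& items)
{
    // Assumption: both the name length and the item count are unsigned.
    assert(name.size() <= 0xFFFF);
    assert(items.size() <= 0xFFFFFFFFu);

    std::vector<std::uint8_t> out;
    out.push_back(kArrayMarker);
    writeBigEndian(out, static_cast<std::uint16_t>(name.size()));
    out.insert(out.end(), name.begin(), name.end());
    out.push_back(typeId);
    writeBigEndian(out, static_cast<std::uint32_t>(items.size()));
    for (T item : items)
        writeBigEndian(out, item); // sizeof(T) bytes per item
    return out;
}
```

The item bytes at the end of the result take up sizeof(T) * items.size() bytes, so the whole record is 1 + 2 + name length + 1 + 4 bytes of header plus that payload.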

Objects

Coming soon