The MessagePack Type System
MessagePack is a compact, schema-less binary protocol. Its formal specification is maintained by the MessagePack project (spec.md in the msgpack/msgpack repository).
MessagePack, like JSON, supports two composite types: maps and arrays.
A map of length N is defined as having N pairs of objects of any type, where the first object in each pair is the key and the second is the value. Maps are marked by a "header" that denotes the size of the map and takes up between 1 and 5 bytes on the wire. (Although maps with non-str keys are a perfectly legal construct in MessagePack, many implementations of the encoding do not support them, as the language-specific behavior of hash maps with non-string keys may be undefined.)
An array of length N is defined as N sequential objects (of any type). Arrays are prefixed by a header that denotes their size and takes up between 1 and 5 bytes on the wire.
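The header sizes above can be illustrated with a minimal sketch. The function names encode_map_header and encode_array_header are hypothetical and do not come from any particular library; the format bytes follow the MessagePack specification (fixmap/fixarray, then 16- and 32-bit length forms).

```python
import struct

def encode_map_header(n: int) -> bytes:
    """Return the header for a map holding n key/value pairs (hypothetical helper)."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("map length out of range")
    if n <= 15:
        return bytes([0x80 | n])               # fixmap: 1 byte
    if n <= 0xFFFF:
        return b"\xde" + struct.pack(">H", n)  # map16: 3 bytes
    return b"\xdf" + struct.pack(">I", n)      # map32: 5 bytes

def encode_array_header(n: int) -> bytes:
    """Return the header for an array holding n elements (hypothetical helper)."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("array length out of range")
    if n <= 15:
        return bytes([0x90 | n])               # fixarray: 1 byte
    if n <= 0xFFFF:
        return b"\xdc" + struct.pack(">H", n)  # array16: 3 bytes
    return b"\xdd" + struct.pack(">I", n)      # array32: 5 bytes
```

The encoded elements (or key/value pairs) follow the header directly; the header records only the count, not the byte length of the contents.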
MessagePack supports the following base types:
A MessagePack int is a signed integer between -(1<<63) and (1<<63)-1. An int takes up 1 to 9 bytes on the wire.
A MessagePack uint is an unsigned integer from 0 to (1<<64)-1. Like an int, a uint takes up 1 to 9 bytes on the wire.
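As a rough illustration of where the 1-to-9-byte range comes from, the sketch below picks the smallest wire form for a value. The helper names encode_int and encode_uint are hypothetical, and real encoders may choose differently (for example, many use the unsigned formats for any non-negative value).

```python
import struct

def encode_int(v: int) -> bytes:
    """Encode a signed integer using the smallest signed format (hypothetical helper)."""
    if -32 <= v <= 127:
        return struct.pack("b", v)              # fixint: 1 byte
    if -(1 << 7) <= v < (1 << 7):
        return b"\xd0" + struct.pack("b", v)    # int8: 2 bytes
    if -(1 << 15) <= v < (1 << 15):
        return b"\xd1" + struct.pack(">h", v)   # int16: 3 bytes
    if -(1 << 31) <= v < (1 << 31):
        return b"\xd2" + struct.pack(">i", v)   # int32: 5 bytes
    if -(1 << 63) <= v < (1 << 63):
        return b"\xd3" + struct.pack(">q", v)   # int64: 9 bytes
    raise ValueError("value does not fit in a MessagePack int")

def encode_uint(v: int) -> bytes:
    """Encode an unsigned integer using the smallest unsigned format (hypothetical helper)."""
    if v < 0:
        raise ValueError("uint cannot be negative")
    if v <= 0x7F:
        return bytes([v])                       # positive fixint: 1 byte
    if v <= 0xFF:
        return b"\xcc" + struct.pack("B", v)    # uint8: 2 bytes
    if v <= 0xFFFF:
        return b"\xcd" + struct.pack(">H", v)   # uint16: 3 bytes
    if v <= 0xFFFFFFFF:
        return b"\xce" + struct.pack(">I", v)   # uint32: 5 bytes
    if v < (1 << 64):
        return b"\xcf" + struct.pack(">Q", v)   # uint64: 9 bytes
    raise ValueError("value does not fit in a MessagePack uint")
```

All multi-byte integers are big-endian, so the format byte plus an 8-byte payload gives the 9-byte maximum.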
A MessagePack bool is a simple boolean. It always takes up exactly 1 byte.
Null is analogous to JSON's null; it is meant to signify the absence of an object. It always takes up exactly 1 byte.
Float is a floating point number, encoded as a 32- or 64-bit IEEE 754 float. 32-bit floats take up 5 bytes, and 64-bit floats take up 9 bytes.
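The fixed-size types can be sketched in a few lines. The helper names below are hypothetical; the format bytes are those given in the MessagePack specification.

```python
import struct

def encode_nil() -> bytes:
    return b"\xc0"                              # nil: 1 byte

def encode_bool(v: bool) -> bytes:
    return b"\xc3" if v else b"\xc2"            # true / false: 1 byte each

def encode_float32(v: float) -> bytes:
    return b"\xca" + struct.pack(">f", v)       # format byte + 4-byte IEEE 754 single

def encode_float64(v: float) -> bytes:
    return b"\xcb" + struct.pack(">d", v)       # format byte + 8-byte IEEE 754 double
```

Encoding a value as float32 is lossy whenever it is not exactly representable in single precision, so the choice between the two widths is up to the encoder or the application.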
MessagePack bin is between 0 and (1<<32)-1 bytes of arbitrary data. The encoded object requires between 2 and 5 extra bytes beyond the size of the binary.
MessagePack str is a UTF-8-encoded string between 0 and (1<<32)-1 bytes long. The encoded object requires between 1 and 5 extra bytes beyond the size of the string.
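The difference in overhead between bin and str comes from str having a one-byte "fixstr" form for short strings, which bin lacks. A minimal sketch, with hypothetical helper names:

```python
import struct

def encode_bin(data: bytes) -> bytes:
    """Prefix raw bytes with the smallest bin header (hypothetical helper)."""
    n = len(data)
    if n <= 0xFF:
        return b"\xc4" + struct.pack("B", n) + data    # bin8: 2 bytes of overhead
    if n <= 0xFFFF:
        return b"\xc5" + struct.pack(">H", n) + data   # bin16: 3 bytes of overhead
    if n <= 0xFFFFFFFF:
        return b"\xc6" + struct.pack(">I", n) + data   # bin32: 5 bytes of overhead
    raise ValueError("bin too long")

def encode_str(s: str) -> bytes:
    """Encode text as UTF-8 with the smallest str header (hypothetical helper)."""
    raw = s.encode("utf-8")
    n = len(raw)                                       # length is counted in bytes, not characters
    if n <= 31:
        return bytes([0xA0 | n]) + raw                 # fixstr: 1 byte of overhead
    if n <= 0xFF:
        return b"\xd9" + struct.pack("B", n) + raw     # str8: 2 bytes of overhead
    if n <= 0xFFFF:
        return b"\xda" + struct.pack(">H", n) + raw    # str16: 3 bytes of overhead
    if n <= 0xFFFFFFFF:
        return b"\xdb" + struct.pack(">I", n) + raw    # str32: 5 bytes of overhead
    raise ValueError("str too long")
```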
A MessagePack ext object is a tuple of a signed, 8-bit integer and arbitrary binary data up to (1<<32)-1 bytes. Users can use the ext type to extend the MessagePack type system. An ext takes up between 2 and 6 extra bytes beyond the size of the data.
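The 2-to-6-byte overhead of ext can be seen in the following sketch. The function name encode_ext is hypothetical; the layout (format byte, optional length, type byte, data) follows the MessagePack specification, which defines fixed-length forms for payloads of exactly 1, 2, 4, 8, or 16 bytes.

```python
import struct

# Format bytes for the fixed-length ext forms, keyed by payload size.
_FIXEXT = {1: 0xD4, 2: 0xD5, 4: 0xD6, 8: 0xD7, 16: 0xD8}

def encode_ext(type_code: int, data: bytes) -> bytes:
    """Encode an (int8 type, bytes) pair as a MessagePack ext (hypothetical helper)."""
    if not -128 <= type_code <= 127:
        raise ValueError("ext type must fit in a signed 8-bit integer")
    t = struct.pack("b", type_code)
    n = len(data)
    if n in _FIXEXT:
        return bytes([_FIXEXT[n]]) + t + data              # fixext: 2 bytes of overhead
    if n <= 0xFF:
        return b"\xc7" + struct.pack("B", n) + t + data    # ext8: 3 bytes of overhead
    if n <= 0xFFFF:
        return b"\xc8" + struct.pack(">H", n) + t + data   # ext16: 4 bytes of overhead
    if n <= 0xFFFFFFFF:
        return b"\xc9" + struct.pack(">I", n) + t + data   # ext32: 6 bytes of overhead
    raise ValueError("ext payload too long")
```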
Succinctly, JSON's type system is a subset of the MessagePack type system.
The two most important differences between the encodings, beyond the fact that one is binary and one is plaintext, are that JSON maps must always be keyed by strings, and that JSON has no notion of floats, integers, or unsigned integers; those are all simply a "number" of arbitrary precision.
Consequently, it is possible to translate arbitrary valid MessagePack to valid JSON deterministically, provided the following can be guaranteed:
- All MessagePack map types are keyed by string-able fields (e.g. str or safe bin)
- All MessagePack ext types have a valid, application-defined JSON representation

The rest of the translation process is fairly straightforward; str fields need to be properly escaped, and bin fields should be converted to quoted base-64 strings.
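The translation rules above can be sketched as follows. This example works on already-decoded Python values rather than raw MessagePack bytes; the Ext class, the ext_hook callback, and the function names are all hypothetical, and the sketch accepts only str map keys rather than attempting to define what a "safe" bin key is.

```python
import base64
import json
from typing import Any, Callable, NamedTuple

class Ext(NamedTuple):
    """Hypothetical stand-in for a decoded MessagePack ext value."""
    type_code: int
    data: bytes

def to_jsonable(obj: Any, ext_hook: Callable[[Ext], Any]) -> Any:
    """Convert decoded MessagePack values into values json.dumps accepts."""
    if isinstance(obj, Ext):
        return ext_hook(obj)                   # application-defined representation
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(bytes(obj)).decode("ascii")  # bin becomes base-64 text
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            if not isinstance(key, str):
                raise TypeError("map key %r has no string form" % (key,))
            out[key] = to_jsonable(value, ext_hook)
        return out
    if isinstance(obj, list):
        return [to_jsonable(item, ext_hook) for item in obj]
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    raise TypeError("unsupported type: %s" % type(obj).__name__)

def to_json(obj: Any, ext_hook: Callable[[Ext], Any]) -> str:
    # json.dumps handles string escaping; allow_nan=False rejects NaN and
    # infinities, which have no valid JSON representation.
    return json.dumps(to_jsonable(obj, ext_hook), allow_nan=False)
```

For instance, calling to_json on {"id": 7, "blob": b"\x00\x01"} yields a JSON object whose "blob" field is the base-64 string "AAE=".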
It is also possible to convert JSON to MessagePack; however, multiple valid MessagePack objects can represent the same JSON object. (Consider that in {"number":1}, the "number" field could be encoded as a float32, float64, uint, or int in MessagePack.) Consequently, translation in this direction requires that the translator have some a priori knowledge about the objects being decoded in order to produce deterministic results.
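To make the ambiguity concrete, the sketch below lists several distinct, valid MessagePack encodings of one small JSON number. The function name candidate_encodings is hypothetical, and for brevity it only handles values from 0 to 127.

```python
import struct

def candidate_encodings(value: int) -> dict:
    """Return several valid wire encodings of a small non-negative number."""
    if not 0 <= value <= 127:
        raise ValueError("this sketch only covers 0..127")
    return {
        "fixint": bytes([value]),                              # 1 byte
        "uint8": b"\xcc" + struct.pack("B", value),            # 2 bytes
        "int8": b"\xd0" + struct.pack("b", value),             # 2 bytes
        "float32": b"\xca" + struct.pack(">f", float(value)),  # 5 bytes
        "float64": b"\xcb" + struct.pack(">d", float(value)),  # 9 bytes
    }
```

Every entry decodes to a value equal to the original number, so a translator needs outside information, such as a schema or a target type, to pick one of them consistently.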