uint<N>
:N
-bit unsigned integer whereN % 8 == 0
and8 <= N <= 256
scalar<N>
: An alternate encoding ofuint<N>
for more compact serializationbit
: 1-bit unsigned integer
container
: fixed length heterogenous collection of values- Denoted with curly braces
{type1, type2, ..., typeN}
- e.g.
{uint64, bool}
- Denoted with curly braces
tuple
: fixed length homogenous collection of values- Denoted with brackets
[N]
whereN
denotes collection size - e.g.
uint64[4]
for a length-4 collection ofuint64
values.
- Denoted with brackets
array
: dynamic length homogenous collection of values- Denoted with brackets
[]
- e.g.
uint64[]
for a dynamic length collection ofuint64
values
- Denoted with brackets
We define the following aliases which MUST be supported.
byte -> uint8
bytes -> byte[]
bytesN -> byte[N]
bool -> bit
<type>?
: Any type can be marked as optional
Serialization is defined recursively by the function serialize
which consumes
a value
of a specified type and produces a binary byte string.
The following conventions are assumed:
- The variable
N
is used to denote length and is constrained0 <= N <= 2**32 - 1
- The variable
bit_size
is used to denote the bitsize of either auint
orscalar
and is constrained by the rules:bit_size % 8 == 0
: e.g. a multiple of 80 <= bit_size <= 256
: e.g. minimum of8
bits and maximum of256
bits.
- The hexary notation
0x00
is used to represent individual byte values. - The notation
x0, x1, ..., xN
is use to denote a fixed-length collection of lengthN
- The notation
x0, x1, ...
is used to denote a dynamic length collection. - The operator
+
is used to denote concatenation of two binary byte strings.
Additionally, the python code examples use these conventions for variable names:
- The variable
value
is used to denote a singular value - The variable
values
is used to denote collection of homogenous values - The variabee
item
is used to denote a single item from a homogenous collection - The variable
elements
is used to denote collection of heterogenous values - The variable
element
is used to denote a single element from a heterogenous collection
All examples assume the value being serialized has undergone validation
Lastly, may of the python examples make use of a magic function referred to
as magic_serialize
. The exact implementation of this function is out of
scope for this spec, but it is defined as returning the proper serialized
representation of the given value.
A real implementation would require type information to be supplied for individual values since it is not possible to determine the proper serialization for an arbitrarily supplied value.
0x01
if the bit is set.0x00
if the bit is not set.
TODO: Consider specialized
bitfield
alias which can natively be deserialized into a sequence of bits (rather than bytes)
def serialize_bit(value: bool) -> bytes:
if value:
return b'\x01'
else:
return b'\x00'
The integer is converted to a sequence of bytes in little endian byte order.
Assumes prior validation that
bit_size
is a proper bit-size andvalue
fits in the range.
def serialize_uintN(bit_size: int, value: int) -> bytes:
return value.to_bytes(bit_size, "little")
The integer is encoded using unsigned LEB128 encoding with the following constraints.
- The serialized value must be encoded using the smallest number of bytes. Empty trailing bytes are not allowed.
def serialize_scalar(bit_size: int, value: int) -> bytes:
LOW_MASK = 0b01111111 # lowermost 7 bits set
HIGH_MASK = 0b10000000 # highest bit set
result = bytearray() # a mutable container for accumulating the serialized bytes
while True:
# pull the lowermost 7-bits off the value
byte = value & LOW_MASK
# shift the lowermost 7-bits off of the value
value >>= 7
if value:
# set the high bit in since we are not finished serializing
byte |= HIGH_MASK
result.append(byte)
if not value:
# if we have fully serialized the value, break from the loop
break
return bytes(result)
Container types are serialized as the concatenation of each of their serialized elements.
Note that containers can be nested, making this a recursive operation.
- Let
E
be the container value comprised of the individual elements[e0, e1, ..., eN]
- Let
S
be the sequence[serializer0, serializer1, ..., serializerN]
denoting the individual serialization functions for the elements fromE
The result is the concatenation of the individual serialized elements computed by applying the serialization functions from S
to the element values from E
: e.g. serializer0(e0) + serializer1(e1) + ... + serializerN(eN)
def serialize_container(value):
return b''.join((
magic_serialize(element)
for element in value
))
Tuples types are serialized as the concatenation of each of the serialized items.
- Let
V
be the tuple value containing the individual elements[v0, v1, ..., vN]
- Let
serialize_type
be the serialization function for the value type ofV
The result is the concatenation of applying serialize_type
to each value: e.g. serialize_type(v0) + serialize_type(v1) + ... + serialize_type(vN)
def serialize_container(values):
return b''.join((
magic_serialize(item)
for item in values
))
Arrays are serialized as the length prefixed concatenation of the individual items.
- Let
V
be the array value containing the individual elements[v0, v1, ...]
- Let
L
be the number of elements inV
- let
length_prefix
be the result of serializingL
as ascalar32
- Let
serialized_values
be the result of serializingV
as an equivalent Tuple type of lengthL
The result is the concatenation of length_prefix
and serialized_values
: e.g. length_prefix + serialized_values