A simple variable-length encoding for unsigned 64-bit integers.
To decode a VarU64, look at the first byte. If its value is below 248, the value itself is the encoded number. Otherwise, the first byte determines the further `length` of the encoding:
first byte | number of additional bytes |
---|---|
248 | 1 |
249 | 2 |
250 | 3 |
251 | 4 |
252 | 5 |
253 | 6 |
254 | 7 |
255 | 8 |
The first byte is followed by `length` additional bytes. These bytes are the big-endian representation of the encoded number.
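The decoding rule above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name `decode_varu64` and the choice to return the number of consumed bytes are assumptions made for the example. It accepts any well-formed encoding and does not check for canonicity.

```python
from typing import Tuple


def decode_varu64(data: bytes) -> Tuple[int, int]:
    """Decode one VarU64 from the start of `data`.

    Returns (value, number_of_bytes_consumed).
    Hypothetical helper: accepts non-canonical encodings.
    """
    if not data:
        raise ValueError("empty input")
    first = data[0]
    if first < 248:
        # The first byte is the encoded number itself.
        return first, 1
    # Table lookup: 248 -> 1 additional byte, ..., 255 -> 8 additional bytes.
    length = first - 247
    payload = data[1:1 + length]
    if len(payload) < length:
        raise ValueError("truncated input")
    return int.from_bytes(payload, "big"), 1 + length


print(decode_varu64(bytes([0x2A])))              # (42, 1)
print(decode_varu64(bytes([0xF9, 0x01, 0x00])))  # (256, 3)
```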
This specification allows different ways to encode the same number, e.g. `0x00` and `0xf800` both decode to zero. To obtain a canonical encoding, we limit those possibilities: of all the representations of a number that this scheme admits, the shortest one is its unique, canonical encoding. Canonical decoders must indicate an error if a value uses an encoding that is longer than necessary.
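As a sketch of what canonicity means in practice, the following Python example pairs an encoder that always emits the shortest form with a decoder that rejects anything longer. The names `encode_varu64` and `decode_varu64_canonical` are made up for this example, and detecting non-canonical input by re-encoding and comparing is just one possible approach.

```python
from typing import Tuple


def encode_varu64(n: int) -> bytes:
    """Return the canonical (shortest) VarU64 encoding of `n`."""
    if not 0 <= n < 2**64:
        raise ValueError("value out of range for u64")
    if n < 248:
        return bytes([n])
    # Smallest number of bytes that can hold n (1 to 8).
    length = (n.bit_length() + 7) // 8
    return bytes([247 + length]) + n.to_bytes(length, "big")


def decode_varu64_canonical(data: bytes) -> Tuple[int, int]:
    """Decode one VarU64, raising ValueError on non-canonical input."""
    if not data:
        raise ValueError("empty input")
    first = data[0]
    if first < 248:
        return first, 1
    length = first - 247
    payload = data[1:1 + length]
    if len(payload) < length:
        raise ValueError("truncated input")
    value = int.from_bytes(payload, "big")
    # Canonical means: identical to what the shortest-form encoder emits.
    if encode_varu64(value) != data[:1 + length]:
        raise ValueError("non-canonical encoding")
    return value, 1 + length


print(encode_varu64(0).hex())    # 00
print(encode_varu64(248).hex())  # f8f8
```

With these definitions, `decode_varu64_canonical(bytes.fromhex("f800"))` raises `ValueError`, because zero fits into the single byte `0x00`.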
Analogously to this specification, you can encode signed integers by using big-endian two's complement representation.
The specification can also be adapted to other sizes of integers (up to 2040 bits inclusive) by starting the length table at a different value. For example, the VarI32 encoding for signed 32-bit integers is defined as follows:
To decode a VarI32, look at the first byte. If its (unsigned) value is below 252, the value itself is the encoded number in two's complement. Otherwise, the first byte determines the further `length` of the encoding:
first byte (unsigned) | number of additional bytes |
---|---|
252 | 1 |
253 | 2 |
254 | 3 |
255 | 4 |
The first byte is followed by `length` additional bytes. These bytes are the big-endian, two's complement representation of the encoded number.
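A possible Python sketch of the VarI32 decoding rule follows. The function name `decode_vari32` is hypothetical, and the example rests on one reading of the text: a first byte below 252 is interpreted as an 8-bit two's complement number, so `0xfb` would decode to -5, while -4 through -1 need an additional byte. The sketch does not check for canonicity.

```python
from typing import Tuple


def decode_vari32(data: bytes) -> Tuple[int, int]:
    """Decode one VarI32 from the start of `data`.

    Returns (value, number_of_bytes_consumed).
    Assumption: a single-byte encoding is read as 8-bit two's complement.
    """
    if not data:
        raise ValueError("empty input")
    first = data[0]
    if first < 252:
        return int.from_bytes(data[:1], "big", signed=True), 1
    # Table lookup: 252 -> 1 additional byte, ..., 255 -> 4 additional bytes.
    length = first - 251
    payload = data[1:1 + length]
    if len(payload) < length:
        raise ValueError("truncated input")
    # Big-endian two's complement; the top bit of the payload is the sign.
    return int.from_bytes(payload, "big", signed=True), 1 + length


print(decode_vari32(bytes([0x05])))        # (5, 1)
print(decode_vari32(bytes([0xFC, 0xFF])))  # (-1, 2)
```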
The VarNonZeroU64 encoding for nonzero unsigned 64-bit integers consists of subtracting one from the integer and encoding the resulting number as a regular VarU64. Unsigned integers of other sizes work analogously.
Nonzero unsigned integers can be generalized to unsigned integers strictly greater than some number `X`: the VarGtXU64 encoding for 64-bit integers strictly greater than `X` consists of subtracting `X` from the integer and encoding the resulting number as a regular VarU64. Unsigned integers of other sizes work analogously.
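The two offset encodings can be sketched on top of a plain VarU64 encoder. All function names below are invented for this example. The sketch follows the wording above literally: VarNonZeroU64 subtracts one, and VarGtXU64 subtracts `X`, so its underlying VarU64 value is always at least 1.

```python
def encode_varu64(n: int) -> bytes:
    """Canonical VarU64 encoder (helper for this sketch)."""
    if not 0 <= n < 2**64:
        raise ValueError("value out of range for u64")
    if n < 248:
        return bytes([n])
    length = (n.bit_length() + 7) // 8
    return bytes([247 + length]) + n.to_bytes(length, "big")


def decode_varu64(data: bytes) -> int:
    """Plain VarU64 decoder (helper for this sketch, no canonicity check)."""
    if not data:
        raise ValueError("empty input")
    if data[0] < 248:
        return data[0]
    length = data[0] - 247
    if len(data) < 1 + length:
        raise ValueError("truncated input")
    return int.from_bytes(data[1:1 + length], "big")


def encode_var_nonzero_u64(n: int) -> bytes:
    if not 1 <= n < 2**64:
        raise ValueError("expected a nonzero u64")
    return encode_varu64(n - 1)


def decode_var_nonzero_u64(data: bytes) -> int:
    n = decode_varu64(data) + 1
    if n >= 2**64:
        raise ValueError("decoded value does not fit in a u64")
    return n


def encode_var_gt_x_u64(n: int, x: int) -> bytes:
    if not x < n < 2**64:
        raise ValueError("expected a u64 strictly greater than x")
    return encode_varu64(n - x)


def decode_var_gt_x_u64(data: bytes, x: int) -> int:
    offset = decode_varu64(data)
    n = offset + x
    if offset == 0 or n >= 2**64:
        raise ValueError("decoded value is not a u64 strictly greater than x")
    return n


print(encode_var_nonzero_u64(248).hex())       # f7
print(encode_var_gt_x_u64(1001, 1000).hex())   # 01
```

The offset lets values just above the lower bound use the single-byte form: 248 would take two bytes as a regular VarU64 but needs only one as a VarNonZeroU64.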
This has been inspired by the issues in the multiformats varint repository, in particular issues #8 and #12.