Skip to content

mus-format/mus-go

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

mus-go Serializer

Go Reference GoReportCard codecov

mus-go is a MUS format serializer. However, due to its minimalist design and a wide range of serialization primitives, it can also be used to implement other binary serialization formats (here is an example where mus-go is utilized to implement Protobuf encoding).

To get started quickly, go to the code generator page.

Why mus-go?

It is lightning fast, space efficient and well tested.

Brief mus-go Description

  • Has a streaming version.
  • Can run on both 32 and 64-bit systems.
  • Variable-length data types (like string, slice, or map) are encoded as: length + data. You can choose binary representation for both of these parts.
  • Supports data versioning.
  • Deserialization may fail with one of the following errors: ErrOverflow, ErrNegativeLength, ErrTooSmallByteSlice, ErrWrongFormat.
  • Can validate and skip data while unmarshalling.
  • Supports pointers.
  • Can encode data structures such as graphs or linked lists.
  • Supports oneof feature.
  • Supports private fields.
  • Supports out-of-order deserialization.
  • Supports zero allocation deserialization.

Contents

cmd-stream-go

cmd-stream-go allows you to execute commands on the server. cmd-stream-go/MUS is about 3 times faster than gRPC/Protobuf.

musgen-go

Writing mus-go code manually can be tedious and error-prone. Instead, it’s much better to use a code generator that automatically produces Marshal, Unmarshal, Size, and Skip functions for you. It’s also incredibly easy to use - simply provide a type and call Generate().

Benchmarks

Why did I create another benchmarks?
The existing benchmarks have some notable issues - try running them several times, and you’ll likely get inconsistent results, making it difficult to determine which serializer is truly faster. That was one of the reasons, and basically I made them for my own use.

How To Use

Don't forget to visit mus-examples-go.

mus-go offers several encoding options, each of which is in a separate package.

varint Package

Serializes all uint (uint64, uint32, uint16, uint8, uint), int, float, byte data types using Varint encoding. For example:

package main

import "github.com/mus-format/mus-go/varint"

func main() {
  var (
    num  = 1000
    size = varint.SizeInt(num) // The number of bytes required to serialize num.
    bs = make([]byte, size)
  )
  n := varint.MarshalInt(num, bs)        // Returns the number of used bytes.
  num, n, err := varint.UnmarshalInt(bs) // In addition to the int value and the
  // number of used bytes, it may also return an error.
  // ...
}

raw Package

Serializes the same uint, int, float, byte data types using Raw encoding. For example:

package main

import "github.com/mus-format/mus-go/raw"

func main() {
  var (
    num = 1000
    size = raw.SizeInt(num)
    bs  = make([]byte, size)
  )
  n := raw.MarshalInt(num, bs)
  num, n, err := raw.UnmarshalInt(bs)
  // ...
}

More details about Varint and Raw encodings can be found in the MUS format specification. If in doubt, use Varint.

ord (ordinary) Package

Supports the following data types: bool, string, slice, byte slice, map, and pointers.

Variable-length data types (such as string, slice, or map) are encoded as length + data. You can select the binary representation for both these parts. By default, the length is encoded using Varint without ZigZag (...PositiveInt() functions from the varint package). In this case the maximum length is limited by the maximum value of the int type on your system. This is ok for different architectures - an attempt to unmarshal, for example, too long string on a 32-bit system, will result in ErrOverflow.

Let's examine how the slice type is serialized:

package main

import (
  "github.com/mus-format/mus-go"
  "github.com/mus-format/mus-go/ord"
  "github.com/mus-format/mus-go/varint"
)

func main() {
  var (
    sl = []int{1, 2, 3, 4, 5}
    
    lenM mus.Marshaller // Length marshaller, if nil varint.MarshalPositiveInt() is used.
    lenU mus.Unmarshaller // Length unmarshaller, if nil varint.UnmarshalPositiveInt() is used.

    m mus.MarshallerFn[int] = varint.MarshalInt // Implementation of the 
    // mus.Marshaller interface for slice elements.
    u mus.UnmarshallerFn[int] = varint.UnmarshalInt // Implementation of the
    // mus.Unmarshaller interface for slice elements.
    s mus.SizerFn[int] = varint.SizeInt // Implementation of the mus.Sizer
    // interface for slice elements.

    size = ord.SizeSlice[int](sl, s)
    bs   = make([]byte, size)
  )
  n := ord.MarshalSlice[int](sl, lenM, m, bs)
  sl, n, err := ord.UnmarshalSlice[int](lenU, u, bs)
  // ...
}

Maps are serialized in the same way.

unsafe Package

unsafe package provides maximum performance, but be careful it uses an unsafe type conversion.

This warning largely applies to the string type - modifying the byte slice after unmarshalling will also change the string’s contents. Here is an example that demonstrates this behavior more clearly.

Supports the following data types: bool, string, byte slice, byte, and all uint, int, float.

pm (pointer mapping) Package

Let's consider the following struct:

type TwoPtr struct {
  ptr1 *string
  ptr2 *string
}

With the ord package after unmarshal twoPtr.ptr1 != twoPtr.ptr2, and with the pm package, they will be equal. This feature allows to serialize data structures such as graphs or linked lists. Corresponding examples can be found at mus-examples-go.

Structs Support

In fact, mus-go does not support structural data types, which means that you will have to implement the mus.Marshaller, mus.Unmarshaller, mus.Sizer and mus.Skipper interfaces yourself. But it's not difficult at all, for example:

package main

import (
  "github.com/mus-format/mus-go/ord"
  "github.com/mus-format/mus-go/varint"
)
  
type Foo struct {
  a int
  b bool
  c string
}

// MarshalFoo implements the mus.Marshaller interface.
func MarshalFoo(v Foo, bs []byte) (n int) {
  n = varint.MarshalInt(v.a, bs)
  n += ord.MarshalBool(v.b, bs[n:])
  return n + ord.MarshalString(v.c, nil, bs[n:])
}

// UnmarshalFoo implements the mus.Unmarshaller interface.
func UnmarshalFoo(bs []byte) (v Foo, n int, err error) {
  v.a, n, err = varint.UnmarshalInt(bs)
  if err != nil {
    return
  }
  var n1 int
  v.b, n1, err = ord.UnmarshalBool(bs[n:])
  n += n1
  if err != nil {
    return
  }
  v.c, n1, err = ord.UnmarshalString(nil, bs[n:])
  n += n1
  return
}

// SizeFoo implements the mus.Sizer interface.
func SizeFoo(v Foo) (size int) {
  size += varint.SizeInt(v.a)
  size += ord.SizeBool(v.b)
  return size + ord.SizeString(v.c, nil)
}

// SkipFoo implements the mus.Skipper interface.
func SkipFoo(bs []byte) (n int, err error) {
  n, err = varint.SkipInt(bs)
  if err != nil {
    return
  }
  var n1 int
  n1, err = ord.SkipBool(bs[n:])
  n += n1
  if err != nil {
    return
  }
  n1, err = ord.SkipString(nil, bs[n:])
  n += n1
  return
}

All you have to do is deconstruct the structure into simpler data types and choose the desired encoding for each. Of course, this requires some effort. But, firstly, this code can be generated, secondly, this approach provides more flexibility, and thirdly, mus-go remains quite simple, which makes it easy to implement for other programming languages.

Arrays Support

Unfortunately, Golang does not support generic parameterization of array sizes. Therefore, to serialize an array - make a slice of it and use the ord package. Or, for better performance, implement the necessary Marshal, Unmarshal, ... functions, as done in the ord/slice.go file.

MarshallerMUS Interface

It is often convenient to define the MarshallerMUS interface:

type MarshallerMUS interface {
  MarshalMUS(bs []byte) (n int)
  SizeMUS() (size int)
}

// Foo implements the MarshallerMUS interface.
type Foo struct {...}

func (f Foo) MarshalMUS(bs []byte) (n int) {
  return MarshalFooMUS(f, bs) // or FooDTS.Marshal(f, bs)
}

func (f Foo) SizeMUS() (size int) {
  return SizeFooMUS(f) // or FooDTS.Size(f)
}
...

Generic MarshalMUS Function

To define generic MarshalMUS function:

package main 

// Define MarshallerMUS interface ...
type MarshallerMUS interface {
  MarshalMUS(bs []byte) (n int)
  SizeMUS() (size int)
}

// ... and the function itself.
func MarshalMUS(v MarshallerMUS) (bs []byte) {
	bs = make([]byte, v.SizeMUS())
	v.MarshalMUS(bs)
	return
}

// Define a structure that implements the MarshallerMUS interface.
type Foo struct {...}
...

func main() {
  // Now the generic MarshalMUS function can be used like this.
  bs := MarshalMUS(Foo{...})
  // ...
}

The full code can be found here.

DTM (Data Type Metadata) Support

mus-dts-go provides DTM support.

Data Versioning

mus-dts-go can be used to implement data versioning. Here is an example.

Marshal/Unmarshal Interfaces (or oneof feature)

You should read the mus-dts-go documentation first.

A simple example:

// Interface to Marshal/Unmarshal.
type Instruction interface {...}

// Copy implements the Instruction and MarshallerMUS interfaces.
type Copy struct {...}

// Insert implements the Instruction and MarshallerMUS interfaces.
type Insert struct {...}

var (
  CopyDTS = ...
  InsertDTS = ...
)

func MarshalInstruction(instr Instruction, bs []byte) (n int) {
  if m, ok := instr.(MarshallerMUS); ok {
    return m.MarshalMUS(bs)
  }
  panic("instr doesn't implement the MarshallerMUS interface")
}

func UnmarshalInstruction(bs []byte) (instr Instruction, n int, err error) {
  dtm, n, err := dts.UnmarshalDTM(bs)
  if err != nil {
    return
  }
  switch dtm {
  case CopyDTM:
    return CopyDTS.UnmarshalData(bs[n:])
  case InsertDTM:
    return InsertDTS.UnmarshalData(bs[n:])
  default:
    err = ErrUnexpectedDTM
    return
  }
}

func SizeInstruction(instr Instruction) (size int) {
  if s, ok := instr.(MarshallerMUS); ok {
    return s.SizeMUS()
  }
  panic("instr doesn't implement the MarshallerMUS interface")
}

A full example can be found at mus-examples-go. Take a note, nothing will stop us to Marshal/Unmarshal, for example, a slice of interfaces.

Validation

Validation is performed during deserialization.

String

With the ord.UnmarshalValidString() function, you can validate the length of a string.

package main

import (
  "errors"

  com "github.com/mus-format/common-go"
  "github.com/mus-format/mus-go/ord"
)

func main() {
  var (
    ErrTooLongString                         = errors.New("too long string")
    lenU mus.Unmarshaller // Length unmarshaller, if nil varint.UnmarshalPositiveInt() is used.
    lenVl        com.ValidatorFn[int] = func(length int) (err error) {  // Length validator.
      if length > 10 {
        err = ErrTooLongString
      }
      return
    }
    skip = true // If true and the encoded string does not meet the requirements
    // of the validator, all bytes belonging to it will be skipped - n will be 
    // equal to SizeString(str).
  )
  // ...
  str, n, err := ord.UnmarshalValidString(lenU, lenVl, skip, bs)
  // ...
}

Slice

With the ord.UnmarshalValidSlice() function, you can validate the length and elements of a slice. Also it provides an option to skip the rest of the data if one of the validators returns an error.

package main

import (
  "errors"

  com "github.com/mus-format/common-go"
  "github.com/mus-format/mus-go"
  "github.com/mus-format/mus-go/ord"
  "github.com/mus-format/mus-go/varint"
)

func main() {
  var (
    ErrTooLongSlice    = errors.New("too long slice")
    ErrTooBigSliceElem = errors.New("too big slice elem")

    lenU mus.Unmarshaller // Length unmarshaller, if nil varint.UnmarshalPositiveInt() is used.
    lenVl com.ValidatorFn[int] = func(length int) (err error) { // Length validator.
      if length > 5 {
        err = ErrTooLongSlice
      }
      return
    }
    u                  = mus.UnmarshallerFn[int](varint.UnmarshalInt)
    vl com.ValidatorFn[int] = func(e int) (err error) { // Elements validator.
      if e > 10 {
        err = ErrTooBigSliceElem
      }
      return
    }
    sk                 = mus.SkipperFn(varint.SkipInt) // If nil, a validation 
    // error will be returned immediately. If != nil and one of the validators 
    // returns an error, will be used to skip the rest of the slice.
  )
  // ...
  sl, n, err := ord.UnmarshalValidSlice[int](lenU, lenVl, u, vl, sk, bs)
  // ...
}

Map

Validation works in the same way as for the slice type.

Struct

Unmarshalling an invalid structure can stop at the first invalid field with a validation error.

package main

import (
  "errors"

  com "github.com/mus-format/common-go"
  "github.com/mus-format/mus-go/ord"
  "github.com/mus-format/mus-go/varint"
)

func UnmarshalValidFoo(vl com.Validator[int], bs []byte) (v Foo, n int, err error) {
  // Unmarshal the first field.
  v.a, n, err = varint.UnmarshalInt(bs)
  if err != nil {
    return
  }
  // Validate the first field. There is no need to deserialize the entire 
  // structure to find out that it is invalid.
  if err = vl.Validate(v.a); err != nil {
    err = fmt.Errorf("incorrect field 'a': %w", err)
    return // The rest of the structure remains unmarshaled.
  }
  // ...
}

// vl can be used to check Foo.a field.
var vl com.ValidatorFn[int] = func(n int) (err error) {
  if n > 10 {
    return errors.New("bigger than 10")
  }
  return
}

Out of Order Deserialization

A simple example:

package main

import (
  "fmt"

  "github.com/mus-format/mus-go/varint"
)

func main() {
  // Encode three numbers in turn - 5, 10, 15.
  bs := make([]byte, varint.SizeInt(5)+varint.SizeInt(10)+varint.SizeInt(15))
  n1 := varint.MarshalInt(5, bs)
  n2 := varint.MarshalInt(10, bs[n1:])
  varint.MarshalInt(15, bs[n1+n2:])

  // Get them back in the opposite direction. Errors are omitted for simplicity.
  n1, _ = varint.SkipInt(bs)
  n2, _ = varint.SkipInt(bs)
  num, _, _ := varint.UnmarshalInt(bs[n1+n2:])
  fmt.Println(num)
  num, _, _ = varint.UnmarshalInt(bs[n1:])
  fmt.Println(num)
  num, _, _ = varint.UnmarshalInt(bs)
  fmt.Println(num)
  // The output will be:
  // 15
  // 10
  // 5
}

Zero Allocation Deserialization

Can be achieved using the unsafe package.