Integer Compression

This library provides high-performance (GB/s) compression and decompression of integers (int32/uint32/int64/uint64).

A good compression factor is achieved when, on average, the difference between two consecutive input values stays small and can therefore be encoded with fewer bits.

Common use cases:

Timestamps
Offsets
Counter based identifiers

The encoding schemes used here are based on Dr. Daniel Lemire's research.

Encoding Logic

Data is encoded in blocks whose size is a multiple of 128 x 32-bit or 256 x 64-bit inputs, as follows:

Difference of consecutive inputs is computed (differential coding)
ZigZag encoding is applied if a block contains at least one negative delta value
The result is bit packed into the optimal number of bits for the block
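
The three steps above can be sketched in a few lines of Go. This is a minimal illustration and not the library's code: the helper names `deltas`, `zigzag` and `blockBitWidth` are hypothetical, and the sketch only computes the bit width a block would need rather than performing the actual packing.

```go
package main

import (
	"fmt"
	"math/bits"
)

// deltas returns the difference between each value and its predecessor,
// starting from prev (hypothetical helper, for illustration only).
func deltas(prev int32, in []int32) []int32 {
	out := make([]int32, len(in))
	for i, v := range in {
		out[i] = v - prev
		prev = v
	}
	return out
}

// zigzag maps signed values to unsigned ones so that small magnitudes,
// positive or negative, become small numbers: 0, -1, 1, -2 -> 0, 1, 2, 3.
func zigzag(v int32) uint32 {
	return uint32(v<<1) ^ uint32(v>>31)
}

// blockBitWidth returns the number of bits needed per value for a block of
// deltas. ZigZag is applied only if at least one delta is negative.
func blockBitWidth(d []int32) (width int, zigzagged bool) {
	for _, v := range d {
		if v < 0 {
			zigzagged = true
			break
		}
	}
	var or uint32
	for _, v := range d {
		if zigzagged {
			or |= zigzag(v)
		} else {
			or |= uint32(v)
		}
	}
	return bits.Len32(or), zigzagged
}

func main() {
	input := []int32{1000, 1003, 1001, 1010, 1012}
	d := deltas(input[0], input[1:])
	width, zz := blockBitWidth(d)
	fmt.Println(d, width, zz) // prints: [3 -2 9 2] 5 true
}
```

In this example each value fits in 5 bits instead of 32, which is where the compression comes from.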

Any remaining input that does not fill a 128 x 32-bit or 256 x 64-bit block is encoded in an additional block using Variable Byte encoding (with delta).
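
As a rough sketch of Variable Byte encoding with delta, the example below stores each delta in 7-bit groups, with the high bit of a byte signalling that more bytes follow. The byte layout used by the library is not described here, so this convention and the names `appendVarByte`, `encodeTail` and `decodeTail` are assumptions made for illustration.

```go
package main

import "fmt"

// appendVarByte appends v using 7 payload bits per byte, low bits first.
// The high bit of a byte is set when more bytes follow (assumed convention).
func appendVarByte(dst []byte, v uint32) []byte {
	for v >= 0x80 {
		dst = append(dst, byte(v)|0x80)
		v >>= 7
	}
	return append(dst, byte(v))
}

// encodeTail delta-encodes in relative to prev and writes each delta
// with appendVarByte.
func encodeTail(prev uint32, in []uint32) []byte {
	var out []byte
	for _, v := range in {
		out = appendVarByte(out, v-prev)
		prev = v
	}
	return out
}

// decodeTail reverses encodeTail: it rebuilds each delta from its 7-bit
// groups and adds it to the running value.
func decodeTail(prev uint32, buf []byte) []uint32 {
	var out []uint32
	var cur uint32
	var shift uint
	for _, b := range buf {
		cur |= uint32(b&0x7f) << shift
		if b&0x80 != 0 {
			shift += 7
			continue
		}
		prev += cur
		out = append(out, prev)
		cur, shift = 0, 0
	}
	return out
}

func main() {
	tail := []uint32{1000, 1005, 1300, 1301}
	enc := encodeTail(0, tail)
	fmt.Println(len(enc), decodeTail(0, enc)) // prints: 6 [1000 1005 1300 1301]
}
```

Here four 32-bit values (16 bytes) shrink to 6 bytes. Unlike bit packing, each value picks its own length, which suits short tails that cannot fill a whole block.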

Append to compressed arrays

In stream processing systems, data is usually received in chunks. Compressing and aggregating small chunks can be inefficient and impractical.

This API provides a convenient way to handle such inputs: When adding data to a compressed buffer, if the last block is a small block, encoded with Variable Byte, it will be rewritten in order to provide better compression using bit packing.

Encoding of timestamps with nanosecond resolution

Timestamps with nanosecond resolution sometimes have a lower actual resolution (e.g. microsecond). The int64 encoding includes a specific optimization that improves the compression factor in that case.
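
The exact optimization is not detailed here. One plausible approach, shown purely as an illustration and not as the library's actual method, is to shift out the low-order zero bits shared by every delta in a block. Microsecond-aligned timestamps expressed in nanoseconds are multiples of 1000 = 8 x 125, so their deltas always share at least 3 trailing zero bits. The function name `packedWidth` is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// packedWidth returns the number of bits needed per delta once the trailing
// zero bits common to all deltas are shifted out, together with that shift.
// Assumed technique for illustration; the library may work differently.
func packedWidth(deltas []uint64) (width, shift int) {
	var or uint64
	for _, d := range deltas {
		or |= d
	}
	if or == 0 {
		return 0, 0
	}
	shift = bits.TrailingZeros64(or)
	return bits.Len64(or) - shift, shift
}

func main() {
	// Deltas in nanoseconds between timestamps that only have
	// microsecond resolution: every value is a multiple of 1000.
	deltas := []uint64{250000, 1000000, 512000, 3000}
	width, shift := packedWidth(deltas)
	fmt.Println(width, shift) // prints: 17 3
}
```

In this example the block needs 17 bits per value instead of 20. The shift would have to be stored once per block so that decoding can restore the original deltas.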

Usage

input := []int32{1, 2, 3}

// compress
compressed := intcomp.CompressInt32(input, nil)
// compress more data (append)
compressed = intcomp.CompressInt32([]int32{4, 5, 6}, compressed)

// uncompress
data := intcomp.UncompressInt32(compressed, nil)
// data: [1, 2, 3, 4, 5, 6]

Performance

Benchmarks for bit-packing compression/decompression (MacBook Pro M1). Results vary depending on the number of bits used to encode the integers.

Compression

32-bit: between 4.0 and 7.2 GB/s
64-bit: between 8.0 and 14.8 GB/s

Decompression

32-bit: between 3.6 and 11.5 GB/s
64-bit: between 14.9 and 24.0 GB/s

TODO

Support float32/64 (using something similar to Gorilla compression)
Force creation of blocks at fixed boundaries to enable arbitrary position decoding and reversed iteration
Implement Block iterators with low memory usage
Add Binary search for sorted arrays
