lz4lite provides access to the extremely fast lz4 compression library for in-memory compression.
As of v0.2.0, lz4lite can serialize and compress any R object understood by base::serialize().
If the input is known to be an atomic numeric vector, and you do not care about any attributes or names on this vector, then lz4_compress()/lz4_uncompress() can be used. These are bespoke serialization routines for atomic numeric vectors that run faster since they avoid R’s internals.
For a more general solution to fast serialization of R objects, see the fst or qs packages.
The lz4 code currently provided with this package is v1.9.3.
- For arbitrary R objects
  - lz4_serialize()/lz4_unserialize() serialize and compress any R object.
- For atomic vectors with numeric values
  - lz4_compress()/lz4_uncompress() compress the data within a vector of raw, integer, real, complex or logical values
  - faster than lz4_serialize()/lz4_unserialize(), but throws away all attributes, i.e. names, dims etc.
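The trade-off between the two routes can be seen with a small named vector. This is a minimal sketch that assumes lz4lite is installed and that both functions are called with their default arguments; the variable names are illustrative only.

```r
# Sketch only: the comparison runs if lz4lite is installed, otherwise it is skipped.
if (requireNamespace("lz4lite", quietly = TRUE)) {
  x <- c(a = 1.5, b = 2.5, c = 3.5)   # named numeric vector

  # General route: full serialization, so attributes survive the round trip
  via_serialize <- lz4lite::lz4_unserialize(lz4lite::lz4_serialize(x))

  # Fast route: only the values are stored, so names are expected to be dropped
  via_compress <- lz4lite::lz4_uncompress(lz4lite::lz4_compress(x))

  print(names(via_serialize))
  print(names(via_compress))
}
```

If the names matter, use the serialize route or reattach them yourself after uncompressing.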
You can install from GitHub with:
# install.packages('remotes')
remotes::install_github('coolbutuseless/lz4lite')
library(lz4lite)
dat <- mtcars
buf <- lz4_serialize(dat)
length(buf) # Number of bytes
#> [1] 1862
# compression ratio
length(buf)/length(serialize(dat, NULL))
#> [1] 0.489099
head(lz4_unserialize(buf))
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
library(lz4lite)
max_hc <- 12
set.seed(1)
N <- 5e6
input_ints <- sample(1:3, N, prob = (1:3)^3, replace = TRUE)
serialize_base <- serialize(input_ints, NULL, xdr = FALSE)
serialize_lo <- lz4_serialize(input_ints, acceleration = 1)
serialize_hi_3 <- lz4hc_serialize(input_ints, level = 3)
serialize_hi_9 <- lz4hc_serialize(input_ints, level = 9)
serialize_hi_12 <- lz4hc_serialize(input_ints, level = max_hc)
compress_lo <- lz4_compress(input_ints, acceleration = 1)
compress_hi_3 <- lz4hc_compress(input_ints, level = 3)
compress_hi_9 <- lz4hc_compress(input_ints, level = 9)
compress_hi_12 <- lz4hc_compress(input_ints, level = max_hc)
library(lz4lite)
res <- bench::mark(
  serialize(input_ints, NULL, xdr = FALSE),
  lz4_serialize(input_ints, acceleration = 1),
  lz4hc_serialize(input_ints, level = 3),
  lz4hc_serialize(input_ints, level = 9),
  lz4hc_serialize(input_ints, level = max_hc),
  lz4_compress(input_ints, acceleration = 1),
  lz4hc_compress(input_ints, level = 3),
  lz4hc_compress(input_ints, level = 9),
  lz4hc_compress(input_ints, level = max_hc),
  check = FALSE
)
expression | median | itr/sec | MB/s | compression_ratio |
---|---|---|---|---|
serialize(input_ints, NULL, xdr = FALSE) | 18.99ms | 50 | 1004.5 | 1.000 |
lz4_serialize(input_ints, acceleration = 1) | 30.58ms | 32 | 623.7 | 0.222 |
lz4hc_serialize(input_ints, level = 3) | 215.84ms | 5 | 88.4 | 0.155 |
lz4hc_serialize(input_ints, level = 9) | 3.28s | 0 | 5.8 | 0.088 |
lz4hc_serialize(input_ints, level = max_hc) | 36.09s | 0 | 0.5 | 0.063 |
lz4_compress(input_ints, acceleration = 1) | 24.16ms | 41 | 789.4 | 0.222 |
lz4hc_compress(input_ints, level = 3) | 208.71ms | 5 | 91.4 | 0.155 |
lz4hc_compress(input_ints, level = 9) | 3.28s | 0 | 5.8 | 0.088 |
lz4hc_compress(input_ints, level = max_hc) | 36.36s | 0 | 0.5 | 0.063 |
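The MB/s column appears consistent with dividing the uncompressed input size by the median time. The sketch below reproduces that arithmetic in base R, under the assumptions that the input is 5e6 integers of 4 bytes each and that "MB" here means 1024^2 bytes; the timings are copied from the table above and the variable names are illustrative.

```r
# ASSUMPTION: 5e6 R integers at 4 bytes each, sizes measured in units of 1024^2 bytes
n_values   <- 5e6
bytes_each <- 4
input_mib  <- n_values * bytes_each / 1024^2

# Median timings copied from the table above, converted to seconds
median_sec <- c(
  serialize     = 0.01899,
  lz4_serialize = 0.03058,
  lz4_compress  = 0.02416
)

# Throughput = uncompressed size / median time
throughput <- input_mib / median_sec
print(round(throughput, 1))
```

The results land within rounding of the tabulated values, since the medians in the table are themselves rounded.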
Uncompression speed varies slightly depending upon the compressed size.
res <- bench::mark(
  lz4_uncompress(compress_lo),
  lz4_uncompress(compress_hi_3),
  lz4_uncompress(compress_hi_9),
  lz4_uncompress(compress_hi_12)
)
expression | median | itr/sec | MB/s |
---|---|---|---|
lz4_uncompress(compress_lo) | 12.26ms | 79 | 1555.4 |
lz4_uncompress(compress_hi_3) | 12.37ms | 70 | 1542.4 |
lz4_uncompress(compress_hi_9) | 12.97ms | 94 | 1470.4 |
lz4_uncompress(compress_hi_12) | 6.03ms | 121 | 3161.8 |
As with uncompression, unserialization speed varies slightly depending upon the compressed size.
res <- bench::mark(
  unserialize(serialize_base),
  lz4_unserialize(serialize_lo),
  lz4_unserialize(serialize_hi_3),
  lz4_unserialize(serialize_hi_9),
  lz4_unserialize(serialize_hi_12)
)
expression | median | itr/sec | MB/s |
---|---|---|---|
unserialize(serialize_base) | 6.64ms | 120 | 2871.9 |
lz4_unserialize(serialize_lo) | 29.8ms | 38 | 640.0 |
lz4_unserialize(serialize_hi_3) | 29.38ms | 39 | 649.3 |
lz4_unserialize(serialize_hi_9) | 24.97ms | 48 | 763.8 |
lz4_unserialize(serialize_hi_12) | 23.87ms | 49 | 799.0 |
lz4lite does not use the standard LZ4 frame to store data.
- The compressed representation is the compressed data prefixed with a custom 8-byte header consisting of
  - 3 bytes = ‘LZ4’
  - 1 byte: if this was produced with lz4_serialize() the byte is 0x00, otherwise it is a byte representing the SEXP type of the encoded object.
  - 4-byte length value, i.e. the number of bytes in the original uncompressed data.
- This data representation
  - is not compatible with the standard LZ4 frame format.
  - is likely to evolve (so for now, do not plan on compressing something in one version of lz4lite and uncompressing it in another version).
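To make the header layout concrete, here is a minimal base R sketch that splits an 8-byte header of the shape described above. The helper `parse_header()` is hypothetical and not part of lz4lite, the header is built by hand rather than produced by the package, and the byte order of the length field is an assumption since it is not stated here.

```r
# Hypothetical helper: split an 8-byte header of the shape described above.
# ASSUMPTION: the 4-byte length is little-endian; the README does not say.
parse_header <- function(buf, endian = "little") {
  stopifnot(is.raw(buf), length(buf) >= 8)
  list(
    magic     = rawToChar(buf[1:3]),   # expected to be "LZ4"
    type_byte = as.integer(buf[4]),    # 0 for serialized objects, else a type code
    n_bytes   = readBin(buf[5:8], what = "integer", size = 4, endian = endian)
  )
}

# Build a mock header by hand (not produced by lz4lite)
mock <- c(
  charToRaw("LZ4"),
  as.raw(0x00),
  writeBin(20000000L, raw(), size = 4, endian = "little")
)

hdr <- parse_header(mock)
str(hdr)
```

Because the format is likely to evolve, a check like this is only suitable for inspecting buffers from the same lz4lite version that wrote them.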