Node.js interface to the native liblzma compression library (.xz file format, among others)
This package provides interfaces for compression and decompression of .xz (and legacy .lzma) files, both stream-based and string-based.
Simply install lzma-native via npm:
$ npm install --save lzma-native
Note: As of version 1.0.0, this module provides pre-built binaries for multiple Node.js versions and all major operating systems using node-pre-gyp, so most users do not need a compiler toolchain. Please create an issue if you have any trouble installing this module.
Note: lzma-native@2.x requires a Node version >= 4. If you want to support Node 0.10 or 0.12, you can use lzma-native@1.x.
If you don’t have any fancy requirements, using this library is quite simple:
var fs = require('fs');
var lzma = require('lzma-native');
var compressor = lzma.createCompressor();
var input = fs.createReadStream('README.md');
var output = fs.createWriteStream('README.md.xz');
input.pipe(compressor).pipe(output);
For decompression, you can simply use lzma.createDecompressor().
Both functions return a stream where you can pipe your input in and read your (de)compressed output from.
If you want your input/output to be Buffers (strings will be accepted as input), this even gets a little simpler:
lzma.compress('Banana', function(result) {
  console.log(result); // <Buffer fd 37 7a 58 5a 00 00 01 69 22 de 36 02 00 21 ...>
});
Again, replace lzma.compress with lzma.decompress and you’ll get the inverse transformation.
lzma.compress() and lzma.decompress() also return promises, so you don’t need to provide a callback at all (see the example code for compress()).
Apart from the API described here, lzma-native implements the APIs of the following other LZMA libraries so you can use it nearly as a drop-in replacement:
- node-xz via lzma.Compressor and lzma.Decompressor
- LZMA-JS via lzma.LZMA().compress and lzma.LZMA().decompress, though without actual support for progress functions and returning Buffer objects instead of integer arrays. (This produces output in the .lzma file format, not the .xz format!)
Since version 1.5.0, lzma-native supports liblzma’s built-in multi-threaded encoding capabilities. To make use of them, set the threads option to an integer value: lzma.createCompressor({ threads: n });. A value of 0 uses the number of processor cores. This option is only available for the easyEncoder (the default) and streamEncoder encoders.
Note that, by default, encoding will take place in Node’s libuv thread pool regardless of this option, and setting it when multiple encoders are running is likely to affect performance negatively.
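As a rough sketch of the threads option in use, the helper below compresses a file with one encoder thread per processor core. The name compressFileThreaded and its parameters are illustrative and not part of lzma-native; the module is passed in as the first argument, and stream.pipeline() is assumed to be available (Node.js 10 or later).

```javascript
const fs = require('fs');
const { pipeline } = require('stream');

// Hypothetical helper (not part of lzma-native): compress `src` into `dest`.
// `lzma` is assumed to be the module returned by require('lzma-native').
function compressFileThreaded(lzma, src, dest, callback) {
  // threads: 0 asks liblzma to use as many threads as there are cores.
  const compressor = lzma.createCompressor({ threads: 0 });
  // pipeline() reports an error from any of the three streams to `callback`.
  pipeline(
    fs.createReadStream(src),
    compressor,
    fs.createWriteStream(dest),
    callback
  );
}
```

It could be called as compressFileThreaded(require('lzma-native'), 'README.md', 'README.md.xz', function(err) { /* handle error */ });.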
Encoding strings and Buffer objects
- compress() – Compress strings and Buffers
- decompress() – Decompress strings and Buffers
- LZMA().compress() (LZMA-JS compatibility)
- LZMA().decompress() (LZMA-JS compatibility)

- createCompressor() – Compress streams
- createDecompressor() – Decompress streams
- createStream() – (De-)Compression with advanced options
- Compressor() (node-xz compatibility)
- Decompressor() (node-xz compatibility)

- isXZ() – Test Buffer for .xz file format
- parseFileIndex() – Read .xz file metadata
- parseFileIndexFD() – Read .xz metadata from a file descriptor

- crc32() – Calculate CRC32 checksum
- checkSize() – Return required size for specific checksum type
- easyDecoderMemusage() – Expected memory usage
- easyEncoderMemusage() – Expected memory usage
- rawDecoderMemusage() – Expected memory usage
- rawEncoderMemusage() – Expected memory usage
- versionString() – Native library version string
- versionNumber() – Native library numerical version identifier
lzma.compress(string, [opt, ]on_finish)
lzma.decompress(string, [opt, ]on_finish)
Param | Type | Description |
---|---|---|
string | Buffer / String | Any string or buffer to be (de)compressed (that can be passed to stream.end(…)) |
[opt] | Options / int | Optional. See options |
on_finish | Callback | Will be invoked with the resulting Buffer as the first parameter when encoding is finished, and as on_finish(null, err) in case of an error. |
These methods will also return a promise that you can use directly.
Example code:
lzma.compress('Bananas', 6, function(result) {
  lzma.decompress(result, function(decompressedResult) {
    assert.equal(decompressedResult.toString(), 'Bananas');
  });
});

lzma.compress('Bananas', 6).then(function(result) {
  return lzma.decompress(result);
}).then(function(decompressedResult) {
  assert.equal(decompressedResult.toString(), 'Bananas');
}).catch(function(err) {
  // ...
});
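Because both functions return promises, the same round trip can also be written with async/await on Node.js versions that support it. This is only a minimal sketch: roundTrip is an illustrative name, and lzma is assumed to be the module returned by require('lzma-native').

```javascript
// Hypothetical helper: compress `text` with the given preset, decompress it
// again, and resolve with the decoded string.
// `lzma` is assumed to be the module returned by require('lzma-native').
async function roundTrip(lzma, text, preset) {
  const compressed = await lzma.compress(text, preset);
  // decompress() resolves with a Buffer, so convert it back to a string.
  const decompressed = await lzma.decompress(compressed);
  return decompressed.toString();
}
```

A call such as roundTrip(require('lzma-native'), 'Bananas', 6) should then resolve with the original string; a rejected promise can be handled with .catch() or try/catch as usual.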
lzma.LZMA().compress(string, mode, on_finish[, on_progress])
lzma.LZMA().decompress(string, on_finish[, on_progress])
(Compatibility; See LZMA-JS for the original specs.)
Note that the result of compression is in the older LZMA1 format (.lzma files).
This is different from the more universally used LZMA2 format (.xz files) and you will have to take care of possible compatibility issues with systems expecting .xz files.
Param | Type | Description |
---|---|---|
string | Buffer / String / Array | Any string, buffer, or array of integers or typed integers (e.g. Uint8Array) |
mode | int | A number between 0 and 9, indicating compression level |
on_finish | Callback | Will be invoked with the resulting Buffer as the first parameter when encoding is finished, and as on_finish(null, err) in case of an error. |
on_progress | Callback | Indicates progress by passing a number in [0.0, 1.0]. Currently, this package only invokes the callback with 0.0 and 1.0. |
These methods will also return a promise that you can use directly.
This does not work exactly as described in the original LZMA-JS specification:
- The results are Buffer objects, not integer arrays. This just makes a lot more sense in a Node.js environment.
- on_progress is currently only called with 0.0 and 1.0.
Example code:
lzma.LZMA().compress('Bananas', 4, function(result) {
  lzma.LZMA().decompress(result, function(decompressedResult) {
    assert.equal(decompressedResult.toString(), 'Bananas');
  });
});
For an example using promises, see compress().
lzma.createCompressor([options])
lzma.createDecompressor([options])
Param | Type | Description |
---|---|---|
[options] | Options / int | Optional. See options |
Return a duplex stream, i.e. a stream that is both readable and writable.
Input will be read, (de)compressed and written out. You can use this to pipe input through this stream, e.g. to mimic the xz command line utility, you can write:
var compressor = lzma.createCompressor();
process.stdin.pipe(compressor).pipe(process.stdout);
The output of compression will be in LZMA2 format (.xz files), while decompression will accept either format via automatic detection.
lzma.Compressor([preset], [options])
lzma.Decompressor([options])
(Compatibility; See node-xz for the original specs.)
These methods handle the .xz file format.
Param | Type | Description |
---|---|---|
[preset] | int | Optional. See options.preset |
[options] | Options | Optional. See options |
Return a duplex stream, i.e. a stream that is both readable and writable.
Input will be read, (de)compressed and written out. You can use this to pipe input through this stream, e.g. to mimic the xz command line utility, you can write:
var compressor = lzma.Compressor();
process.stdin.pipe(compressor).pipe(process.stdout);
lzma.createStream(coder, options)
Param | Type | Description |
---|---|---|
[coder] | string | Any of the supported coder names, e.g. "easyEncoder" (default) or "autoDecoder". |
[options] | Options / int | Optional. See options |
Return a duplex stream for (de-)compression. You can use this to pipe input through this stream.
The available coders are (the most interesting ones first):
- easyEncoder: Standard LZMA2 (.xz file format) encoder. Supports options.preset and options.check options.
- autoDecoder: Standard LZMA1/2 (both .xz and .lzma) decoder with auto detection of file format. Supports options.memlimit and options.flags options.
- aloneEncoder: Encoder which only uses the legacy .lzma format. Supports the whole range of LZMA options.
Less likely to be of interest to you, but also available:
- aloneDecoder: Decoder which only uses the legacy .lzma format. Supports the options.memlimit option.
- rawEncoder: Custom encoder corresponding to lzma_raw_encoder (See the native library docs for details). Supports the options.filters option.
- rawDecoder: Custom decoder corresponding to lzma_raw_decoder (See the native library docs for details). Supports the options.filters option.
- streamEncoder: Custom encoder corresponding to lzma_stream_encoder (See the native library docs for details). Supports options.filters and options.check options.
- streamDecoder: Custom decoder corresponding to lzma_stream_decoder (See the native library docs for details). Supports options.memlimit and options.flags options.
Option name | Type | Description |
---|---|---|
check | check | Any of lzma.CHECK_CRC32, lzma.CHECK_CRC64, lzma.CHECK_NONE, lzma.CHECK_SHA256 |
memlimit | float | A memory limit for (de-)compression in bytes |
preset | int | A number from 0 to 9, 0 being the fastest and weakest compression, 9 the slowest and highest compression level. (Please also see the xz(1) manpage for notes – don’t just blindly use 9!) You can also OR this with lzma.PRESET_EXTREME (the -e option to the xz command line utility). |
flags | int | A bitwise or of lzma.LZMA_TELL_NO_CHECK, lzma.LZMA_TELL_UNSUPPORTED_CHECK, lzma.LZMA_TELL_ANY_CHECK, lzma.LZMA_CONCATENATED |
synchronous | bool | If true, forces synchronous coding (i.e. no usage of threading) |
bufsize | int | The default size for allocated buffers |
threads | int | Set to an integer to use liblzma’s multi-threading support. 0 will choose the number of CPU cores. |
blockSize | int | Maximum uncompressed size of a block in multi-threading mode |
timeout | int | Timeout for a single encoding operation in multi-threading mode |
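To illustrate how these options combine, the sketch below builds an encoder stream whose preset is ORed with lzma.PRESET_EXTREME and which uses a SHA-256 integrity check. The function name is illustrative and the chosen values are examples rather than recommendations; lzma is assumed to be the module returned by require('lzma-native').

```javascript
// Hypothetical helper: create an .xz encoder stream tuned for small output.
// `lzma` is assumed to be the module returned by require('lzma-native').
function createSmallOutputEncoder(lzma) {
  return lzma.createStream('easyEncoder', {
    // Preset 9 combined with the "extreme" flag (comparable to `xz -9e`).
    // This is slow and memory-hungry; see the xz(1) notes before relying on it.
    preset: 9 | lzma.PRESET_EXTREME,
    // Use SHA-256 as the integrity check stored in the .xz output.
    check: lzma.CHECK_SHA256
  });
}
```

The returned stream can be used like any other coder stream, e.g. input.pipe(createSmallOutputEncoder(lzma)).pipe(output).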
options.filters can, if the coder supports it, be an array of filter objects, each with the following properties:
- .id: Any of lzma.FILTERS_MAX, lzma.FILTER_ARM, lzma.FILTER_ARMTHUMB, lzma.FILTER_IA64, lzma.FILTER_POWERPC, lzma.FILTER_SPARC, lzma.FILTER_X86 or lzma.FILTER_DELTA, lzma.FILTER_LZMA1, lzma.FILTER_LZMA2
The delta filter supports the additional option .dist for a distance between bytes (see the xz(1) manpage).
The LZMA filter supports the additional options .dict_size, .lp, .lc, .pb, .mode, .nice_len, .mf, .depth and .preset. See the xz(1) manpage for the meaning of these parameters and additional information.
lzma.crc32(input[, encoding[, previous]])
Compute the CRC32 checksum of a Buffer or string.
Param | Type | Description |
---|---|---|
input | string / Buffer | Any string or Buffer. |
[encoding] | string | Optional. If input is a string, an encoding to use when converting into binary. |
[previous] | int | Optional. The result of a previous CRC32 calculation, so that you can compute the checksum chunk by chunk |
Example usage:
lzma.crc32('Banana') // => 69690105
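The previous parameter makes it possible to checksum data that arrives in pieces. A small sketch, assuming that a calculation starts from 0 (the convention of liblzma's own lzma_crc32) and that lzma is the module returned by require('lzma-native'); crc32OfChunks is an illustrative name:

```javascript
// Hypothetical helper: CRC32 over a sequence of string chunks, without
// concatenating them first.
function crc32OfChunks(lzma, chunks) {
  let crc = 0; // assumed starting value, following liblzma's convention
  for (const chunk of chunks) {
    // Feed each chunk together with the checksum accumulated so far.
    crc = lzma.crc32(chunk, 'utf8', crc);
  }
  return crc;
}
```

Under those assumptions, crc32OfChunks(lzma, ['Ban', 'ana']) should give the same value as lzma.crc32('Banana').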
lzma.checkSize(check)
Return the size of a checksum in bytes.
Param | Type | Description |
---|---|---|
check | check | Any supported check constant. |
Example usage:
lzma.checkSize(lzma.CHECK_SHA256) // => 32
lzma.checkSize(lzma.CHECK_CRC32) // => 4
lzma.easyDecoderMemusage(preset)
Returns the approximate memory usage when decoding using easyDecoder for a given preset.
Param | Type | Description |
---|---|---|
preset | preset | A compression level from 0 to 9 |
Example usage:
lzma.easyDecoderMemusage(6) // => 8454192
lzma.easyEncoderMemusage(preset)
Returns the approximate memory usage when encoding using easyEncoder for a given preset.
Param | Type | Description |
---|---|---|
preset | preset | A compression level from 0 to 9 |
Example usage:
lzma.easyEncoderMemusage(6) // => 97620499
lzma.rawDecoderMemusage(filters)
Returns the approximate memory usage when decoding using rawDecoder for a given filter list.
Param | Type | Description |
---|---|---|
filters | array | An array of filters |
lzma.rawEncoderMemusage(filters)
Returns the approximate memory usage when encoding using rawEncoder for a given filter list.
Param | Type | Description |
---|---|---|
filters | array | An array of filters |
lzma.versionString()
Returns the version of the underlying C library.
Example usage:
lzma.versionString() // => '5.2.3'
lzma.versionNumber()
Returns the version of the underlying C library as a numerical identifier.
Example usage:
lzma.versionNumber() // => 50020012
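The numerical identifier packs the version components into decimal digits. liblzma's version.h defines it as major * 10000000 + minor * 10000 + patch * 10 + stability, where stability is 0 for alpha, 1 for beta and 2 for stable releases. A small sketch that unpacks such a number (parseVersionNumber is an illustrative helper, not part of this package):

```javascript
// Hypothetical helper: split a liblzma version number into its components.
function parseVersionNumber(n) {
  const stabilityNames = ['alpha', 'beta', 'stable'];
  return {
    major: Math.floor(n / 10000000),
    minor: Math.floor(n / 10000) % 1000,
    patch: Math.floor(n / 10) % 1000,
    stability: stabilityNames[n % 10]
  };
}

parseVersionNumber(50020012);
// => { major: 5, minor: 2, patch: 1, stability: 'stable' }
```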
lzma.isXZ(input)
Tells whether an input buffer is an XZ file (.xz, LZMA2 format) using the file format’s magic number. This is not a complete test, i.e. the data following the file header may still be invalid in some way.
Param | Type | Description |
---|---|---|
input | string / Buffer | Any string or Buffer (integer arrays accepted). |
Example usage:
lzma.isXZ(fs.readFileSync('test/hamlet.txt.xz')); // => true
lzma.isXZ(fs.readFileSync('test/hamlet.txt.lzma')); // => false
lzma.isXZ('Banana'); // => false
(The magic number of XZ files is hex fd 37 7a 58 5a 00 at position 0.)
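For illustration only, a comparable check on the magic number can be written with plain Buffer operations. This is a sketch of the idea, not the library's implementation, and hasXZMagic is a hypothetical name:

```javascript
// The six magic bytes at the start of an .xz file.
const XZ_MAGIC = Buffer.from([0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00]);

// Hypothetical helper: true if `input` begins with the .xz magic number.
function hasXZMagic(input) {
  const buf = Buffer.isBuffer(input) ? input : Buffer.from(input);
  // Compare only the first six bytes; shorter inputs can never match.
  return buf.length >= XZ_MAGIC.length &&
    buf.slice(0, XZ_MAGIC.length).equals(XZ_MAGIC);
}

hasXZMagic(Buffer.from('fd377a585a000001', 'hex')); // => true
hasXZMagic('Banana');                               // => false
```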
lzma.parseFileIndex(options[, callback])
Read .xz file metadata.
options.fileSize needs to be an integer indicating the size of the file being inspected, e.g. obtained by fs.stat().
options.read(count, offset, cb) must be a function that reads count bytes from the underlying file, starting at position offset. If that is not possible, e.g. because the file does not have enough bytes, the file should be considered corrupt. On success, cb should be called with a Buffer containing the read data. cb can be invoked as cb(err, buffer), in which case err will be passed along to the original callback argument when set.
callback will be called with err and info as its arguments.
If no callback is provided, options.read() must work synchronously and the file info will be returned from lzma.parseFileIndex().
Example usage:
fs.readFile('test/hamlet.txt.xz', function(err, content) {
  // handle error
  lzma.parseFileIndex({
    fileSize: content.length,
    read: function(count, offset, cb) {
      cb(content.slice(offset, offset + count));
    }
  }, function(err, info) {
    // handle error
    // do something with e.g. info.uncompressedSize
  });
});
lzma.parseFileIndexFD(fd, callback)
Read .xz metadata from a file descriptor.
This is like parseFileIndex(), but lets you pass a file descriptor in fd. The file will be inspected using fs.stat() and fs.read(). The file descriptor will not be opened or closed by this call.
Example usage:
fs.open('test/hamlet.txt.xz', 'r', function(err, fd) {
  // handle error
  lzma.parseFileIndexFD(fd, function(err, info) {
    // handle error
    // do something with e.g. info.uncompressedSize
    fs.close(fd, function(err) { /* handle error */ });
  });
});
This package includes the native C library, so there is no need to install it separately.
The original C library package contains code under various licenses, with its core (liblzma) being public domain. See its contents for details. This wrapper is licensed under the MIT License.
Other implementations of the LZMA algorithms for node.js and/or web clients include:
- lzma-purejs
- LZMA-JS
- node-xz (native)
- node-liblzma (native)
Note that LZMA has been designed to have much faster decompression than compression, which is something you may want to take into account when choosing a compression algorithm for large files. LZMA almost always achieves higher compression ratios than other algorithms, though.
Initial development of this project was financially supported by Tradity.