📮 Tiny and fast string compression library

macarie/compatto

📮 compatto

Compatto is a tiny and fast compression library with Unicode support that also works well with small strings.

Compatto is based on antirez's smaz concept. It targets modern browsers and Node.js. For older browsers and Node.js versions, you will need to transpile and use a TextEncoder and TextDecoder polyfill.
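
As a rough sketch of the check an older runtime would need, the snippet below tests whether both encoding classes exist before the library is loaded. The helper name `hasTextCodec` is made up for this example and is not part of compatto. It also assumes `globalThis` is available; on runtimes without it, pass `window` or `global` explicitly.

```javascript
// Hypothetical helper, not part of compatto: reports whether the given
// global scope exposes both TextEncoder and TextDecoder.
function hasTextCodec(scope = globalThis) {
  return (
    typeof scope.TextEncoder === "function" &&
    typeof scope.TextDecoder === "function"
  )
}

if (!hasTextCodec()) {
  // Load a TextEncoder/TextDecoder polyfill of your choice here,
  // before importing compatto.
  console.warn("TextEncoder/TextDecoder missing: a polyfill is required")
}
```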

Features

  • โฑ Very fast to compress, even faster to decompress
  • ๐Ÿฏ Support for Unicode characters, like emojis
  • ๐Ÿ—„ User-definable dictionary

Compression ratio

Being a dictionary-based compression algorithm, the compression ratio is heavily influenced by the dictionary one uses.

With the default dictionary, the compression ratio is around 1.67 for The Great Gatsby: it is compressed from 269,716 bytes to just 161,583 bytes in about 70ms. A short string like "this is a string" is compressed from 16 bytes to 6, so its compression ratio is about 2.67. Results may vary, I guess 😅
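
The ratios quoted above are simply the original size divided by the compressed size. A minimal sketch of that arithmetic follows; `byteLength` and `compressionRatio` are illustrative names, not part of compatto, and the byte counts assume UTF-8 encoding via `TextEncoder`.

```javascript
// Illustrative helpers, not part of compatto.

// Size of a string in bytes once encoded as UTF-8.
function byteLength(string) {
  return new TextEncoder().encode(string).length
}

// Ratio between the uncompressed and the compressed size.
function compressionRatio(originalBytes, compressedBytes) {
  return originalBytes / compressedBytes
}

console.log(byteLength("this is a string")) // 16
console.log(compressionRatio(269716, 161583).toFixed(2)) // "1.67"
console.log(compressionRatio(16, 6).toFixed(2)) // "2.67"
```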

Install

$ npm install compatto

Or if you prefer using Yarn:

$ yarn add compatto

Usage

import { dictionary } from "compatto/dictionary"
import { compatto, DecompressError } from "compatto"

const { compress, decompress } = compatto({ dictionary })

const compressedString = compress("this is a string")
// =>  Uint8Array [ 155, 56, 172, 62, 195, 70 ]

const decompressedString = decompress(compressedString)
// => 'this is a string'

API

compatto(options)

Create a new object that implements the Compatto interface, using the options you provide.

options

Type: object

dictionary

Type: string[]

A dictionary used to compress and decompress strings. If its length is greater than 254 a TypeError will be thrown.

Please note that, as of v2.0, this option has no default value: you must pass it explicitly.
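
If you build your own dictionary, it can help to check it before handing it to compatto(). The sketch below is a hypothetical pre-check, not compatto's own validation: only the 254-entry limit comes from the description above, while the "array of strings" check is an assumption based on the documented type.

```javascript
// Hypothetical pre-check, not part of compatto's API.
const MAX_DICTIONARY_LENGTH = 254

function assertValidDictionary(dictionary) {
  // Assumption: the documented type is string[], so reject anything else.
  const isStringArray =
    Array.isArray(dictionary) &&
    dictionary.every((entry) => typeof entry === "string")

  if (!isStringArray) {
    throw new TypeError("dictionary must be an array of strings")
  }

  // Documented limit: more than 254 entries is an error.
  if (dictionary.length > MAX_DICTIONARY_LENGTH) {
    throw new TypeError(
      `dictionary has ${dictionary.length} entries, the limit is ${MAX_DICTIONARY_LENGTH}`
    )
  }

  return dictionary
}
```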

Compatto

Compatto is an interface that has two methods: compress() and decompress().

The returned value of compatto() implements this interface.

compress(string)

Compress a string into an array of bytes, returned as an instance of Uint8Array.

Throws a TypeError if the argument is not the correct type.

string

Type: string

A string to compress.

decompress(bytes)

Decompress an instance of Uint8Array to the original, uncompressed, string.

Throws a TypeError if the argument is not the correct type.

Throws a DecompressError if the buffer is not correctly encoded. It can be imported along with compatto() if you want to check if the error thrown is an instance of this class.

bytes

Type: Uint8Array

An array of bytes representing a compressed string.

Please note that if the dictionary used to compress a string is not the same one used to decompress the generated buffer, the result of the decompression will most likely be incorrect.
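
One way to tell a badly encoded buffer apart from a programming mistake is an instanceof check on the thrown error. The wrapper below is only a sketch: `decompressOrNull` is a made-up name, and both `decompress` and `DecompressError` are passed in as arguments so the snippet runs on its own. In real code they would be the values obtained from "compatto" as shown in Usage.

```javascript
// Hypothetical wrapper, not part of compatto.
// Returns the decompressed string, or null when the buffer is not
// correctly encoded. Any other error (e.g. a TypeError) is rethrown.
function decompressOrNull(decompress, DecompressError, bytes) {
  try {
    return decompress(bytes)
  } catch (error) {
    if (error instanceof DecompressError) {
      return null
    }
    throw error
  }
}
```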

dictionary

Type: string[]

This is compatto's standard dictionary. Remember that even if it is the standard one, it must be explicitly set by the user!

Performance

Since v2.0, compatto generates a trie from the dictionary that is used to compress every string. Before v2.0, compatto tried to get a substring as long as the longest word in the dictionary and see if that substring was in it. If it wasn't, it tried again with a substring that was one character shorter, and so on until the substring was one character.

For compressible strings it was not that slow, but if a word contained characters that were not in the dictionary, that approach was really slow!
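
The pre-v2.0 strategy described above can be sketched roughly as follows. This is an illustration of the idea, not compatto's actual v1 code: the function name, the return shape, and the use of `indexOf` for the lookup are all assumptions. Note how a position with no match costs one lookup per candidate length.

```javascript
// Illustrative sketch of the "shrinking substring" search, not compatto's code.
// Returns the dictionary index and length of the longest entry that
// starts at `start`, or null if nothing matches.
function longestMatchBySubstring(dictionary, text, start) {
  const longestWord = dictionary.reduce(
    (max, word) => Math.max(max, word.length),
    0
  )
  const limit = Math.min(longestWord, text.length - start)

  // Try the longest candidate first, then drop one character at a time.
  for (let length = limit; length >= 1; length--) {
    const index = dictionary.indexOf(text.slice(start, start + length))
    if (index !== -1) {
      return { index, length }
    }
  }

  return null
}
```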

This implementation change gave compatto a big performance boost 🚌💨
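
To show why a trie helps, here is a minimal sketch of a longest-match lookup over a trie built from the dictionary. It is not compatto's implementation: the node shape and the names `buildTrie` and `longestMatch` are assumptions, and it walks UTF-16 code units rather than encoded bytes. The lookup reads each character at most once and stops as soon as no dictionary entry can continue.

```javascript
// Illustrative trie, not compatto's code.
// Each node maps the next character to a child node; `index` is the
// dictionary index of the word ending at that node, or -1.
function buildTrie(dictionary) {
  const root = { children: new Map(), index: -1 }

  dictionary.forEach((word, index) => {
    let node = root
    for (let i = 0; i < word.length; i++) {
      const unit = word[i]
      if (!node.children.has(unit)) {
        node.children.set(unit, { children: new Map(), index: -1 })
      }
      node = node.children.get(unit)
    }
    node.index = index
  })

  return root
}

// Walks the trie from `start` and remembers the longest word seen so far.
function longestMatch(trie, text, start) {
  let node = trie
  let best = null

  for (let i = start; i < text.length; i++) {
    node = node.children.get(text[i])
    if (node === undefined) break // no entry continues with this character
    if (node.index !== -1) {
      best = { index: node.index, length: i - start + 1 }
    }
  }

  return best
}
```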

In v2.1 the compress() algorithm was simplified, leading to a performance improvement of about 20% compared to v2.0.

Below is a little table that shows compress()'s performance improvements across versions. The file used to test the library is /usr/share/dict/words: in the first row, the file was split on \n, while in the second row the whole file was used as one long piece of text.

| Data           | v1.0   | v2.0   | v2.1   |
|----------------|--------|--------|--------|
| 235,887 words  | ~500ms | ~370ms | ~295ms |
| 2.5MB raw text | ~700ms | ~465ms | ~365ms |

As you can see, the performance improved a lot: with v1.0 as the reference, compressing a lot of small words now takes about 40% less time, and compressing a long piece of text takes almost 50% less!
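
The percentages follow directly from the timings in the table. The short calculation below spells them out; the `timingsMs` object only restates the table, and `percentLessTime` is an illustrative helper.

```javascript
// Approximate timings from the table above, in milliseconds.
const timingsMs = {
  words: { "v1.0": 500, "v2.0": 370, "v2.1": 295 },
  rawText: { "v1.0": 700, "v2.0": 465, "v2.1": 365 },
}

// Percentage of time saved going from `before` to `after`, rounded.
function percentLessTime(before, after) {
  return Math.round(((before - after) / before) * 100)
}

console.log(percentLessTime(timingsMs.words["v1.0"], timingsMs.words["v2.1"])) // 41
console.log(percentLessTime(timingsMs.rawText["v1.0"], timingsMs.rawText["v2.1"])) // 48
console.log(percentLessTime(timingsMs.words["v2.0"], timingsMs.words["v2.1"])) // 20
```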

Is there space for improvements? Absolutely! I guess that the compression algorithm can be further improved, and keep in mind that I didn't have time to do code profiling.
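
If you want to measure this yourself, a harness along these lines may be enough for a first impression. It is not the script used for the table above: `timeCompress` is a made-up helper, it takes any compress function as an argument, and `Date.now()` only gives millisecond resolution.

```javascript
// Hypothetical timing helper, not part of compatto.
// Runs `compress` over every input and reports the elapsed time and
// the total number of compressed bytes.
function timeCompress(compress, inputs) {
  const startedAt = Date.now()
  let compressedBytes = 0

  for (const input of inputs) {
    compressedBytes += compress(input).length
  }

  return { elapsedMs: Date.now() - startedAt, compressedBytes }
}
```

To mirror the two rows of the table, you could pass the contents of /usr/share/dict/words split on \n as `inputs` for the first row, and a one-element array holding the whole file for the second.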

Browser support

The latest version of Chrome, Firefox, Safari, and Edge.

Node.js support

Compatto requires Node.js 11 or later.

Related

  • hex-my-bytes - Display bytes sequences as strings of hexadecimal digits.
