Includes both an implementation of the Huffman compression method and a corresponding CLI. A few limitations hold this project back from being a practical compression tool. It is only intended as a proof of concept.
huffman [mode] [target] <freq>
-e
|
Encodes the |
-d
|
Decodes the |
Each of the files presented below can be found in the /test_files
subdirectory in their uncompressed form.
Files are sourced from http://textfiles.com, although their names have been changed.
File | Description | Size of .txt (bytes) |
Size of .huf (bytes) |
---|---|---|---|
|
The American Declaration of Independence |
~83k |
~47k |
|
An ASCII map aid for the game Zork |
2670 |
1040 |
|
A massive list of candy recipes |
~424k |
~240k |
|
ASCII art of Yoda (in Cyrillic characters) |
2500 |
882 |
Both encoding and decoding begin with the construction of a Huffman tree, specific to the given file.
First, character frequencies are compiled as an SLL of TreeNode
objects.
This list is then sorted and assembled into a tree by repeatedly placing the first two nodes under a new parent node until only one root node remains.
To begin encoding, an array of pointers to the tree’s leaf nodes are constructed.
The target file is then read character by character.
Each character is matched against the array of leaves until the corresponding node is found.
The algorithm then walks up the tree until it reaches the root.
Left children append a 0
to the bitstream, right children append a 1
.
Each character’s code is reversed prior to writing, which avoids ambiguity in the decoding process.
Compressed files are read 1 byte at a time. For each byte, the decoder starts at the root of the tree and traverses until it reaches a leaf. Left and right children are chosen based on the individual bits of the data.