Skip to content

hankotanks/huffman_encoding

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Huffman Compression

Includes both an implementation of the Huffman compression method and a corresponding CLI. A few limitations hold this project back from being a practical compression tool. It is only intended as a proof of concept.

Usage

huffman [mode] [target] <freq>
-e

Encodes the target file. Expects only one argument. The encoder creates two files. The first contains the compressed data (a .huf file) and the frequency table is written to the other.

-d

Decodes the target file using the given freq table. This operation expects both the target file and its corresponding frequency table to have appropriate extensions (.huf and .freq, respectively). The decompressed data is written to a .dec file, which is human-readable.

Sample Files

Each of the files presented below can be found in the /test_files subdirectory in their uncompressed form. Files are sourced from http://textfiles.com, although their names have been changed.

File Description Size of .txt (bytes) Size of .huf (bytes)

decind.txt

The American Declaration of Independence

~83k

~47k

zork.txt

An ASCII map aid for the game Zork

2670

1040

candy.txt

A massive list of candy recipes

~424k

~240k

yoda.txt

ASCII art of Yoda (in Cyrillic characters)

2500

882

Implementation

Both encoding and decoding begin with the construction of a Huffman tree, specific to the given file. First, character frequencies are compiled as an SLL of TreeNode objects. This list is then sorted and assembled into a tree by repeatedly placing the first two nodes under a new parent node until only one root node remains.

To begin encoding, an array of pointers to the tree’s leaf nodes are constructed. The target file is then read character by character. Each character is matched against the array of leaves until the corresponding node is found. The algorithm then walks up the tree until it reaches the root. Left children append a 0 to the bitstream, right children append a 1. Each character’s code is reversed prior to writing, which avoids ambiguity in the decoding process.

Compressed files are read 1 byte at a time. For each byte, the decoder starts at the root of the tree and traverses until it reaches a leaf. Left and right children are chosen based on the individual bits of the data.

(Handled) Error States

The program will fail and exit if…​
  • an invalid command flag is given

  • wrong number of arguments are supplied for the intended operation

  • any file argument cannot be opened

  • the decoder recieves files without appropriate extensions

About

An implementation of the Huffman coding system

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages