The goal of this lab is to implement the Huffman Tree Encoding.
You are free to create the design to solve this problem. The general steps for this program are:
- Open the file for input and create the Frequency table.
- Sort the Frequency Table on frequency.
- Using the Frequency Table start creating the Huffman Tree from bottom up.
- Once the tree is created, traverse it to create the code for each of the symbols (character) found in the file. This will result in a Encoding Table.
- Using the Encoding Table read the input file, and for each character use the table to get the encoding and write the encoding into the output file.
- Add an option to decode the encoded file.
- To be able to decode you are going to need to store the Huffman Tree in the encoded file to be able to decode the file.
- The resulting file must be identical to the original file.
- Using bitwise operations, convert the binary string generated by the encoding phase into actual binary representation, this will result in a compressed file, that should be smaller than the original file.
- You need to provide the decoding to verify that your program is working.
You are given eight files:
test-file-0.txt
the first input filetest-file-0.table
the output to screen of the first input filetest-file-0.encoded
the output file of the first input filetest-file-1.txt
the second input filetest-file-1.table
the output to screen of the second input filetest-file-1.encoded
the output file of the second input filetest-file-2.txt
the third input filetest-file-2.table
the output to screen of the third input filetest-file-2.encoded
the output file of the third input filetest-file-3.txt
the fourth input filetest-file-3.table
the output to screen of the fourth input filetest-file-3.encoded
the output file of the fourth input file
Ideally you should write a makefile, but you can compile the following way:
g++ yoursource1.cpp yoursource2.cpp -o huffman -std=c++11 -Wall
This will result in an executable name huffman
, remember to specify the name of the exacutable in case you make your own makefile
. The program should work with command line parameters the following way:
./huffman -encode inputfile outputfile
- It will encode the file
inpufile
and create a fileoutputfile
with the huffman encoding of the first file
- It will encode the file
./huffman -decode inputfile outputfile
- It will decode the
inputfile
and write the decoded information into theoutputfile
- It will decode the
To encode a file named mybook.txt
and put the encoding into a file named myencodedbook.txt
:
./huffman -encode mybook.txt myencodedbook.txt
To decode a file named coded.txt
and create a decoded file named original.txt
:
./huffman -decode coded.txt original.txt
Your program should output the Coding Table in the screen using JSON-like format, as follows:
{key: , code: 11}
{key: a, code: 010}
{key: e, code: 0010}
{key: o, code: 0110}
{key: u, code: 0111}
{key: r, code: 1000}
{key: n, code: 1010}
{key: i, code: 1011}
{key: l, code: 00001}
{key: s, code: 00010}
{key: d, code: 00110}
{key: m, code: 10011}
{key: p, code: 000000}
{key: c, code: 000001}
{key: CR, code: 000110}
{key: t, code: 000111}
{key: g, code: 001111}
{key: q, code: 100100}
{key: b, code: 100101}
{key: h, code: 0011100}
{key: f, code: 00111010}
{key: z, code: 00111011}
This is the output of the program when ran using test-file-0.txt
. Things to notice:
- The file is sorted by encoding, shorter encodings first
- For certain characters, they are changed to be able to display:
- CR for displaying charater
'\n'
- LF for displaying character
'\r'
- CR for displaying charater
The encoded file (without extra challenge) will be a single line with 0 and 1 only.
- Your code must compile.
- Your code must not compile with warnings.
- Your program must not crash.
- Your program is expected to implement a Tree Data Structure, there will be a code inspection to verify this.
- Every source file must contain header comments with the following format:
/*
Filename: huffmantree.h
Description: Declaration of the class HuffmanTree to represent the binary Huffman Tree
Author: McDonald Berger
Date: 05/29/2019
Course: Data Structures II
*/
- The program will be tested with the given files, and with additional files. Your grade will be calculated based on the screen output and file output. These outputs must be identical to the given outputs.
- Your code needs to show good programming practices: appropriate amount of comments for your methods, indentation, meaningful variable names, identifiers convention (CamelCase for functions, camelCase for variables, _camelCase for data members, SNAKE_CASE for constants), header comments, correct file names, etc. Failure to code appropriate will result in strong points penalization.
Once you finished coding your lab you may want to test it to check it is correctly working. The steps would be:
- Compile your program, make sure the executable name is
huffman
- Run the program using the first input file:
./huffman -enconde test-file-0.txt my-0.encoded > my.table
- Previous step will create files:
my-0.encoded
contains the encoded filetest-file-0.txt
my-0.table
contains the output to screen that shows the Huffman Code Table
- Compare the files, if there is no output, it means the files are identical and your program is working as expected.
diff test-file-0.encoded my-0.encoded
This will check if your encoded file is identical to the given file.diff test-file-0.table my-0.table
This will check if your encoding table is identical to the given encoding table.
The comparison of the encoded files will fail, since you will have to save the Huffman Tree in the beginning of the file. So, you will need to follow this steps (Applies for Extra Challenge and for Extra - Extra Challenge).
- Run the program to decode:
./huffman -decode my-0.encoded my-0.decoded
- The previous command will convert the encoded file into the original file
- Compare the file with the original:
diff my-0.decoded test-file-0.txt
if there is no output, it means the files are identical and your program is working as expected.
NOTE: If you are using the Windows command line, you will need to use fc
instead of diff
.