This project implements Huffman Coding for text compression and decompression. Huffman Coding is a Greedy lossless data compression algorithm that uses variable-length codes to represent characters based on their frequencies. This project includes classes for managing binary trees, min-heaps, and custom queues, providing a comprehensive example of how Huffman Coding can be applied.
- Binary Tree
- Priority_queue
- Heap
- File handling
- Huffman Coding Algorithm
huffman/
|-- init.py
|-- binary_tree.py
|-- custom_queue.py
|-- heap.py
|-- huffman_code.py
|-- README.md
binary_tree.py
: Contains theBinaryTree
class for managing binary tree operations.custom_queue.py
: Contains theQueue
andQueueNode
classes for queue operations.heap.py
: Contains theminHeap
andNode
classes for heap operations.huffman_code.py
: Contains theHuffmanCode
class for compression and decompression.
To use this project, clone the repository and install the required dependencies.
git clone https://github.com/rohanmittal1163/Huffman-Compressor-Decompressor.git
cd huffman-coding
To compress a file, use the HuffmanCode class:
from huffman.huffman_code import HuffmanCode
path = "path/to/your/textfile.txt"
huffman = HuffmanCode(path)
compressed_path = huffman.compression()
print(f"File compressed to: {compressed_path}")
To decompress a file, use the HuffmanCode class:
decompressed_path = huffman.decompression(compressed_path)
print(f"File decompressed to: {decompressed_path}")
Here's a complete example of how to compress and decompress a text file:
from huffman.huffman_code import HuffmanCode
# Initialize with the path to the file to compress
path = "path/to/your/textfile.txt"
huffman = HuffmanCode(path)
# Compress the file
compressed_path = huffman.compression()
print(f"File compressed to: {compressed_path}")
# Decompress the file
decompressed_path = huffman.decompression(compressed_path)
print(f"File decompressed to: {decompressed_path}")
- Calculate the frequency of each character in the input text.
- Initialize a priority queue (min-heap) with nodes containing characters and their frequencies.
- Construct a binary tree using the priority queue:
- Extract two nodes with the lowest frequencies from the priority queue.
- Create a new node with these two nodes as children, and the sum of their frequencies as its frequency.
- Insert the new node back into the priority queue.
- Repeat until only one node remains in the priority queue, forming the Huffman tree.
- Traverse the Huffman tree to generate binary codes for each character:
- Start at the root of the tree.
- Traverse left for '0' and right for '1'.
- Record the binary code for each character encountered along the way.
- Replace each character in the input text with its corresponding Huffman code.
- Concatenate the binary codes to form the compressed text.
- Add padding to the compressed text to ensure the length is a multiple of 8 (if needed).
- Record the number of padding bits added.
- Output the compressed text along with any padding information.
- Read the compressed text and padding information.
- Remove the padding bits to restore the original length.
- Traverse the Huffman tree using the binary codes to decode the compressed text back to its original form.
- Compression: Compresses text files using Huffman Coding.
- Decompression: Decompresses files back to their original text.
- Binary Tree Management: Constructs and traverses binary trees.
- Min-Heap Implementation: Efficient priority queue operations.
- Custom Queue: Supports queue operations for Huffman tree construction.
This project is open source and we are happy to receive contributions. If you would like to contribute, please follow these steps:
- Make a fork of the repository.
- Create a branch for your feature or bugfix (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Added my new feature'
) - Push your branch (
git push origin my-new-feature
) - Create a pull request.