HDL-deflate

FPGA implementation of deflate compression and decompression (RFC 1950/1951, as used by (g)zip and zlib)

This design is implemented in MyHDL (www.myhdl.org) and can be translated to Verilog.

It has been verified in Icarus, Xilinx Vivado and on a physical Xilinx device (Digilent Arty).

In addition it has been tested on a Lattice iCE40 UltraPlus (Upduino board) using IceStorm, and on an ECP5 board using Lattice Diamond on Ubuntu 18.04.

Usage should be clear from the test bench in test_deflate.py.

Tunable parameters

OBSIZE = 8192   # Size of output buffer (BRAM)
                # You need 32768 to decompress ALL valid deflate streams!

IBSIZE = 2048   # Size of input buffer (LUT-RAM)

CWINDOW = 32    # Search window for compression
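
When the compressed data is produced on a host, the encoder can be told to stay within a smaller history window, so that a reduced OBSIZE remains sufficient. The sketch below uses Python's standard zlib module and assumes that OBSIZE serves as the history window for back-references. The helper name compress_for_obsize is made up for this example and is not part of deflate.py.

```python
import zlib


def compress_for_obsize(data: bytes, obsize: int = 8192) -> bytes:
    """Return a zlib stream whose history window is at most obsize bytes.

    Assumption: the decompressor's OBSIZE must cover the largest
    back-reference distance that occurs in the stream.
    """
    wbits = obsize.bit_length() - 1
    # zlib accepts window sizes of 2**9 .. 2**15 bytes for zlib streams.
    if obsize != 1 << wbits or not 9 <= wbits <= 15:
        raise ValueError("obsize must be a power of two between 512 and 32768")
    comp = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=wbits)
    return comp.compress(data) + comp.flush()


payload = bytes(range(256)) * 64
stream = compress_for_obsize(payload, 8192)

# The upper nibble of the first header byte (CINFO, RFC 1950) stores
# log2(window size) - 8, so the window size can be read back from the stream.
window_bytes = 1 << ((stream[0] >> 4) + 8)
print(window_bytes)
```

A stream produced by a third party without such a limit may still reference data up to 32768 bytes back, which is why the full buffer size is needed in the general case.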

Sliding input window

One can use a sliding window to reduce the size of the input buffer and the LUT usage.

The minimal input buffer size is 2 * CWINDOW (64 bytes). The unit test in test_deflate.py uses this strategy.

Compression efficiency

By default the compressor will reduce repeated 3/4/5 byte sequences in the search window to 15 bits. This results in a decent compression ratio for many real-life input data patterns.
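
The 15 bit figure can be related to the fixed Huffman code of RFC 1951. The sketch below is a host-side calculation, not code from deflate.py, and it assumes the compressor emits fixed-Huffman blocks with match lengths of 3 to 10 bytes. Under that assumption a match costs a 7-bit length code, a 5-bit distance code and up to 3 extra distance bits for distances within a 32-byte window, while a literal byte costs 8 or 9 bits.

```python
def literal_bits(byte: int) -> int:
    """Bits used by one literal byte in the fixed Huffman code (RFC 1951)."""
    # Literals 0..143 use 8-bit codes, literals 144..255 use 9-bit codes.
    return 8 if byte <= 143 else 9


def match_bits(length: int, distance: int) -> int:
    """Bits used by a (length, distance) pair in the fixed Huffman code.

    Restricted to lengths 3..10, which map to length codes 257..264:
    these are 7-bit codes without extra bits.
    """
    if not 3 <= length <= 10:
        raise ValueError("this sketch only covers match lengths 3..10")
    if not 1 <= distance <= 32768:
        raise ValueError("distance out of range for deflate")
    # Distances 1..4 need no extra bits; after that every doubling of the
    # distance range adds one extra bit.
    extra = 0 if distance <= 4 else (distance - 1).bit_length() - 2
    return 7 + 5 + extra


CWINDOW = 32
worst_match = max(match_bits(3, d) for d in range(1, CWINDOW + 1))
cheapest_literals = 3 * literal_bits(0x00)

print(worst_match, cheapest_literals)
```

Even the shortest match of 3 bytes at the far end of the window is cheaper than three literals, so every match found in the window shortens the output.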

At the expense of additional LUTs one can improve this by enlarging CWINDOW or by expanding the matching code to include 6/7/8/9/10 byte matches. Set MATCH10 to True at the top of deflate.py to activate the latter option.

Another strategy for data sets with just a small set of used byte values would be to use a dedicated pre-computed Huffman tree. I could add this if there is interest, but it is probably better to use a more dense coding in your FPGA application data in the first place.

Decompression speed

Method 0 (copy mode) takes 2 cycles for each output byte. The other methods take from 1 cycle (long repeated sequences) to 4 cycles for each output byte.

Compression speed

To reduce LUT usage the original implementation matched each slot in the search window in a dedicated clock cycle. Setting FAST to True generates logic that matches the whole window in a single cycle. The effective speed is then around 1 input byte every 3 cycles.
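
The cycle counts quoted for decompression and compression translate into throughput once a clock frequency is chosen. The sketch below does that arithmetic. The 100 MHz clock is purely an assumption for illustration, since the achievable frequency depends on the device and the enabled options.

```python
def throughput_mb_per_s(clock_hz: float, cycles_per_byte: float) -> float:
    """Convert a cycles-per-byte figure into megabytes (10**6 bytes) per second."""
    if clock_hz <= 0 or cycles_per_byte <= 0:
        raise ValueError("clock_hz and cycles_per_byte must be positive")
    return clock_hz / cycles_per_byte / 1e6


CLOCK_HZ = 100_000_000  # assumed clock, not a measured value

# Cycle counts taken from the text above.
figures = {
    "decompress, copy mode": 2,
    "decompress, best case": 1,
    "decompress, worst case": 4,
    "compress with FAST": 3,
}

for name, cycles in figures.items():
    print(f"{name}: {throughput_mb_per_s(CLOCK_HZ, cycles):.1f} MB/s")
```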

Disabling functionality to save LUTs

The compress mode can be disabled by setting COMPRESS to False.

The decompress mode can be disabled by setting DECOMPRESS to False.

As an option you can disable dynamic tree decompression by setting DYNAMIC to False. This saves a lot of BRAM and LUTs. The compressed output of HDL-Deflate itself always uses a static tree, but zlib normally generates dynamic trees. Set the zlib strategy Z_FIXED to generate streams with a static tree.
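
As a host-side illustration, the following sketch produces a static-tree stream with Python's standard zlib module. The helper name compress_static is hypothetical. The Z_FIXED constant is looked up defensively because older Python builds may not export it, and the fallback value 4 is the one defined in zlib.h.

```python
import zlib

# Use the module constant when available, else the value from zlib.h.
Z_FIXED = getattr(zlib, "Z_FIXED", 4)


def compress_static(data: bytes) -> bytes:
    """Compress data into a zlib stream that avoids dynamic Huffman trees."""
    comp = zlib.compressobj(
        level=9,
        method=zlib.DEFLATED,
        wbits=15,
        memLevel=8,
        strategy=Z_FIXED,
    )
    return comp.compress(data) + comp.flush()


sample = b"hello hello hello " * 100
stream = compress_static(sample)

# After the 2-byte zlib header, bit 0 of the first deflate byte is BFINAL
# and bits 1..2 hold BTYPE: 0 = stored, 1 = static tree, 2 = dynamic tree.
btype = (stream[2] >> 1) & 0b11
print(btype)
```

Note that zlib may still emit stored (method 0) blocks for incompressible data when Z_FIXED is set, so the decompressor's copy mode remains relevant.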

In general the size of leaves and d_leaves can be reduced a lot when the maximal length of the input stream is less than 32768. One can replace test_data() in test_deflate.py with a specific version which generates typical test data for the intended FPGA application, and repeatedly halve the sizes of the leaves arrays until the test fails.

A compress-only build with FAST and MATCH10 enabled has quite good resource usage.

LOWLUT disables some options (DYNAMIC and multi block handling) for minimal LUT usage.

Practical considerations

In general HDL-Deflate is interesting when speed is important. When speed is not a real issue, using a (soft) CPU with zlib and dynamic RAM is probably the better approach. Decompression in particular is also reasonably fast on a CPU, and HDL-Deflate needs a lot of BRAM when configured to decompress ANY deflate input stream.

Compression is another story because it is a LOT faster in hardware with the FAST option and uses a reasonable amount of LUTs on Xilinx. Resource usage for compression is higher on Lattice because it has no LUT-RAM.

Decompression-only mode with the LOWLUT option can be interesting because it also has a reasonable size. Its size is comparable with a soft CPU on Lattice (but it is a lot faster) and it is much smaller on Xilinx.

FPGA validation

Xilinx

Default (Decompress with IBUF = 16 * CWINDOW and Compress with FAST/MATCH10)

Resource	Estimation
LUT	9823
LUTRAM	1248
FF	2910
BRAM	18

Compress only and FAST and MATCH10

Resource	Estimation
LUT	2854
LUTRAM	156
FF	760
BRAM	8.5
