This TensorFlow 2.x Quantization toolkit quantizes TensorFlow 2.x Keras models (by inserting Q/DQ nodes) for Quantization-Aware Training (QAT). It follows NVIDIA's QAT recipe, which leads to optimal model acceleration with TensorRT on NVIDIA GPUs and hardware accelerators.
- Implements NVIDIA's quantization recipe.
- Supports fully automated or manual insertion of Quantization and DeQuantization (QDQ) nodes in the TensorFlow 2.x model with minimal code.
- Makes it easy to add support for new layers.
- Quantization behavior can be set programmatically.
- Provides automated tests for popular architecture blocks such as residual and inception blocks.
- Offers utilities for TensorFlow 2.x to TensorRT conversion via ONNX.
- Includes example workflows.
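A Q/DQ node simulates INT8 quantization during training: the tensor is quantized to the INT8 grid and immediately dequantized back to float, so the model learns to tolerate the rounding error. A minimal NumPy sketch of the symmetric, scale-only scheme that NVIDIA's recipe uses (the function name and use of NumPy are illustrative, not the toolkit's API):

```python
import numpy as np

def fake_quantize(x, amax, num_bits=8):
    """Simulate a Q/DQ node pair: quantize to signed INT8, then dequantize.

    Symmetric, scale-only quantization (zero-point = 0), where `amax` is the
    calibrated absolute maximum of the tensor.
    """
    bound = 2 ** (num_bits - 1) - 1                       # 127 for INT8
    scale = amax / bound                                  # single scale factor
    q = np.clip(np.round(x / scale), -bound - 1, bound)   # quantize to integer grid
    return q * scale                                      # dequantize back to float

x = np.array([0.1, -0.5, 1.0, 2.0])
print(fake_quantize(x, amax=1.0))  # values snap to the INT8 grid; 2.0 clips to 1.0
```

During QAT, gradients flow through this operation via the straight-through estimator, letting the weights adapt to the quantization grid.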
Python >= 3.8
TensorFlow >= 2.8
tf2onnx >= 1.10.0
onnx-graphsurgeon
pytest
pytest-html
TensorRT (optional) >= 8.4 GA
The latest TensorFlow 2.x Docker image from NGC is recommended.
$ cd ~/
$ git clone ssh://git@gitlab-master.nvidia.com:12051/TensorRT/Tools/tensorflow-quantization.git
$ docker pull nvcr.io/nvidia/tensorflow:22.03-tf2-py3
$ docker run -it --runtime=nvidia --gpus all -v ~/tensorflow-quantization:/home/tensorflow-quantization nvcr.io/nvidia/tensorflow:22.03-tf2-py3 /bin/bash
After the last command, you will be placed in the /workspace directory inside the running Docker container, while the tensorflow-quantization repository is mounted under the /home directory.
$ cd /home/tensorflow-quantization
$ ./install.sh
$ cd tests
$ python3 -m pytest quantize_test.py -rP
If all tests pass, installation is successful.
$ cd ~/
$ git clone ssh://git@gitlab-master.nvidia.com:12051/TensorRT/Tools/tensorflow-quantization.git
$ cd tensorflow-quantization
$ ./install.sh
$ cd tests
$ python3 -m pytest quantize_test.py -rP
If all tests pass, installation is successful.
TensorFlow 2.x Quantization toolkit user guide.
- Only Quantization Aware Training (QAT) is supported as a quantization method.
- Only Functional and Sequential Keras models are supported. Original Keras layers are wrapped into quantized layers using TensorFlow's clone_model method, which doesn't support subclassed models.
- Saving a quantized version of some layers may not be supported in TensorFlow < 2.8. TensorRT >= 8.2 is recommended since it supports engine visualization.
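The QAT model reaches TensorRT through ONNX, as noted in the feature list above. A hedged sketch of that deployment step (the model paths and opset value are placeholders, and the toolkit's own conversion utilities may wrap these tools differently):

```shell
# Export the quantized Keras SavedModel to ONNX with tf2onnx
# (saved_model_qat/ and model_qat.onnx are placeholder paths).
python3 -m tf2onnx.convert --saved-model saved_model_qat \
    --output model_qat.onnx --opset 13

# Build a TensorRT engine from the ONNX model; the Q/DQ nodes
# require INT8 mode to be enabled.
trtexec --onnx=model_qat.onnx --int8 --saveEngine=model_qat.engine
```

With explicit Q/DQ nodes in the ONNX graph, TensorRT uses the scales learned during QAT rather than running post-training calibration.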
- GTC 2022 talk
- Quantization Basics whitepaper