Skip to content

Latest commit

 

History

History
22 lines (13 loc) · 914 Bytes

README.en.md

File metadata and controls

22 lines (13 loc) · 914 Bytes

English | 简体中文

PTQ INT8 Quantization

This is a script for fast PTQ (Post Training Quantization) INT8 quantization using TensorRT, supporting both dynamic and static batching.

Usage

First, configure the model you want to quantize in calibration.yaml.

calibrator.data is the path to the data used for calibration, and calibrator.cache is the location to save the generated calibration files.

If you choose dynamic batching, ensure that the dimensions of batch_shape match shapes.opt. If you choose static batching, set dynamic to False, and ignore shapes.

After configuring calibration.yaml, run the following command to perform quantization:

cd tools
python ptq_calibration.py

The precision and latency after PTQ quantization vary depending on the model. For maximum precision, it is recommended to use QAT quantization.