
Releases: quic/aimet

version 1.27.0

28 Jul 19:14

What's New

Keras

  • Updated support for TFOpLambda layers in Batch Norm Folding to handle extra call args/kwargs.

PyTorch

  • Added support for PyTorch version 1.13.0. Only ONNX opset 14 is supported for export (see the export sketch after this list).
  • [experimental] Added debugging APIs for dumping intermediate tensor data. This data can be used with current QNN/SNPE tools for debugging accuracy problems. Known issue: the Layer Output Generation API returns incorrect tensor data for the layer just before a ReLU when used on the original FP32 model.
  • [experimental] Added support for embedding AIMET encodings within the graph using ONNX quantize/dequantize operators. This option is currently supported only for 8-bit per-tensor quantization.
  • Fixed a bug in AIMET QuantSim for PyTorch models to handle non-contiguous tensors.
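
The opset-14 requirement above shows up at export time. A minimal sketch of exporting a QuantSim model to ONNX at opset 14 (the toy model, calibration callback, and the exact OnnxExportApiArgs/export keyword arguments are assumptions; check the aimet_torch API docs for your release):

```python
# Hedged sketch: QuantSim export with ONNX opset 14 (keyword names for
# OnnxExportApiArgs and QuantizationSimModel.export are assumptions).
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel
from aimet_torch.onnx_utils import OnnxExportApiArgs

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 3, 32, 32)

sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           quant_scheme=QuantScheme.post_training_tf_enhanced,
                           default_param_bw=8, default_output_bw=8)

# Calibrate on representative data (here just the dummy input).
sim.compute_encodings(lambda m, _: m(dummy_input), None)

# PyTorch 1.13 support requires ONNX opset 14 at export time.
sim.export(path='/tmp', filename_prefix='model_int8', dummy_input=dummy_input,
           onnx_export_args=OnnxExportApiArgs(opset_version=14))
```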

ONNX

  • AIMET support for ONNX 1.11.0 has been added. However, op support in QNN/SNPE is currently limited; if the model fails to load, continue to use opset 11 for export (a minimal fallback sketch follows this list).
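
The opset-11 fallback only changes the export call; a minimal sketch using plain torch.onnx.export (the toy model and output path are placeholders):

```python
# Fallback: export the model at ONNX opset 11 so it loads in QNN/SNPE
# toolchains with limited opset-14 support. Toy model shown.
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 16)

torch.onnx.export(model, dummy_input, 'model_opset11.onnx', opset_version=11)
```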

TensorFlow

  • [experimental] Debugging APIs have been added for dumping intermediate tensor outputs. This data can be used with current QNN/SNPE tools for debugging accuracy problems.

Documentation

version 1.26.1

12 Jul 00:30

What's New

TensorFlow

  • Upgraded AIMET to support TensorFlow version 2.10.1 (AIMET remains compatible with TensorFlow 2.4).
  • Several bug fixes

Common

  • Upgraded to Ubuntu 20 base image for all variants.

Documentation

version 1.26.0

12 May 22:34
57ed1b5

What's New

Keras

  • Added a feature called BN Re-estimation that can improve model accuracy after QAT for INT4 quantization.
  • Updated the AutoQuant feature to automatically choose the optimal calibration scheme and create an HTML report detailing which optimizations were applied.
  • Updated Model Preparer to replace separable convolutions with depthwise and pointwise conv layers.
  • Fixed the BN fold implementation to account for a subsequent multi-input layer.
  • Fixed a bug where min/max encoding values were not aligned with scale/offset during QAT (the sketch after this list shows the relationship being enforced).
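
For reference, the alignment the fix above enforces between min/max encodings and scale/offset, shown as a small sketch of generic asymmetric-quantization arithmetic (illustrative only, not AIMET's internal implementation):

```python
# Generic asymmetric quantization: scale/offset are derived from the calibrated
# min/max, and the aligned min/max are re-derived from scale/offset so that both
# representations describe exactly the same quantization grid.
def min_max_to_scale_offset(enc_min: float, enc_max: float, bitwidth: int = 8):
    num_steps = 2 ** bitwidth - 1
    scale = (enc_max - enc_min) / num_steps
    offset = round(enc_min / scale)              # integer zero-point shift
    aligned_min = offset * scale
    aligned_max = aligned_min + num_steps * scale
    return scale, offset, aligned_min, aligned_max

print(min_max_to_scale_offset(-1.05, 2.3))
```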

PyTorch

  • Several bug fixes

TensorFlow

  • Added a feature called BN Re-estimation that can improve model accuracy after QAT for INT4 quantization.
  • Updated the AutoQuant feature to automatically choose the optimal calibration scheme and create an HTML report detailing which optimizations were applied.
  • Fixed a bug where min/max encoding values were not aligned with scale/offset during QAT.

Common

  • Documentation updates for taking AIMET models to target.
  • Converted standalone BatchNorm layers' parameters so that each layer behaves as a linear/dense layer (see the sketch below for the equivalent arithmetic).
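
In inference mode a standalone BatchNorm with frozen statistics is a per-channel affine transform, so its parameters can be re-expressed as a plain scale and bias; a minimal PyTorch sketch of that arithmetic (illustrative only, not the AIMET conversion utility itself):

```python
# A frozen BatchNorm computes y = gamma * (x - mean) / sqrt(var + eps) + beta,
# which equals y = w * x + b with w and b as below (illustrative sketch).
import torch

def bn_to_scale_bias(bn: torch.nn.BatchNorm2d):
    std = torch.sqrt(bn.running_var + bn.eps)
    weight = bn.weight / std                             # per-channel scale
    bias = bn.bias - bn.running_mean * bn.weight / std   # per-channel shift
    return weight, bias

bn = torch.nn.BatchNorm2d(4).eval()
x = torch.randn(1, 4, 8, 8)
w, b = bn_to_scale_bias(bn)
assert torch.allclose(bn(x), x * w[None, :, None, None] + b[None, :, None, None], atol=1e-6)
```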

Experimental

  • Added new Architecture Checker feature to identify and report model architecture constructs that are not ideal for quantized runtimes. Users can utilize this information to change their model architectures accordingly.

Documentation

version 1.25.0

09 Mar 23:14

What's New

Keras

  • Added QuantAnalyzer feature
  • Added Batch Normalization folding for Functional Keras models, which allows the default config files to work for supergrouping.
  • Resolved an issue with quantizer placement in Sequential blocks in subclassed models

PyTorch

  • Added AutoQuant V2, which includes advanced features such as out-of-the-box inference, a model preparer, quant scheme search, and an improved summary report (a hedged usage sketch follows this list)
  • Fixes to resolve minor accuracy diffs in the learnedGrid quantizer for per-channel quantization
  • Fixes to improve EfficientNetB4 accuracy with respect to target
  • Fixed a rare case where the quantizer could calculate an incorrect offset when generating QAT 2.0 learned encodings
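
A hedged sketch of driving AutoQuant V2 (the module path aimet_torch.auto_quant_v2, the constructor arguments, the eval-callback signature, and the optimize() return values are all assumptions; the toy model and random data stand in for a real model and calibration set):

```python
# Hedged AutoQuant V2 sketch -- names and signatures are assumptions; consult
# the aimet_torch AutoQuant API docs for your release before relying on them.
import torch
from aimet_torch.auto_quant_v2 import AutoQuant

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(),
                            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
                            torch.nn.Linear(8, 10)).eval()
dummy_input = torch.randn(1, 3, 32, 32)
unlabeled_loader = torch.utils.data.DataLoader(
    [torch.randn(3, 32, 32) for _ in range(32)], batch_size=8)

def eval_callback(model: torch.nn.Module, num_samples=None) -> float:
    # Placeholder: return validation accuracy of `model` here.
    return 0.0

auto_quant = AutoQuant(model, dummy_input=dummy_input,
                       data_loader=unlabeled_loader, eval_callback=eval_callback)
quantized_model, accuracy, encoding_path = auto_quant.optimize(allowed_accuracy_drop=0.01)
```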

TensorFlow

  • Added QuantAnalyzer feature
  • Fixed an accuracy issue due to rare cases where the incorrect BN epsilon was being used
  • Fixed an accuracy issue due to Quantsim export incorrectly recomputing QAT2.0 encodings

Common

  • Updated AIMET python package version format to support latest pip
  • Fixed an issue where not all inputs might be quantized properly

Documentation

version 1.24.0

20 Jan 00:18
eda99b2

What's New

  • Export QuantSim configuration for downstream target quantization

PyTorch

  • Fixes to resolve minor accuracy diffs in the learnedGrid quantizer for per-channel quantization
  • Added support for AMP 2.0 which enables faster automatic mixed precision
  • Added support for QAT for INT4 quantized models, including a feature for performing BN Re-estimation after QAT (a hedged sketch follows this list)
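
A hedged sketch of the BN Re-estimation step after INT4 QAT (the reestimate_bn_stats entry point and its arguments are assumptions based on the aimet_torch.bn_reestimation module; `sim` and `train_loader` are placeholders for an existing QuantizationSimModel and a representative training DataLoader):

```python
# Hedged sketch: re-estimate BatchNorm statistics on the QAT-finetuned QuantSim
# model before export (function name and arguments are assumptions; see the
# aimet_torch.bn_reestimation API docs for your release).
from aimet_torch.bn_reestimation import reestimate_bn_stats

# `sim` is a QuantizationSimModel after INT4 QAT fine-tuning; `train_loader`
# yields representative (unaugmented) training batches.
reestimate_bn_stats(sim.model, train_loader, num_batches=100)
```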

Keras

  • Added support for AMP 2.0 which enables faster automatic mixed precision
  • Support for basic transformer networks
  • Added support for subclassed models. The current subclassing feature includes support for only a single level of subclassing and does not support lambdas.
  • Added QAT per-channel gradient support
  • Minor updates to the quantization configuration
  • Fixed QuantSim bug where layers using dtypes other than float were incorrectly quantized

TensorFlow

  • Added an additional PReLU mapping pattern to ensure proper folding and QuantSim node placement
  • Fixed per-channel encoding representation to align with PyTorch and Keras

Documentation

version 1.23.0

14 Nov 19:00
a422782

What's New

  • TF-enhanced calibration scheme has been accelerated using a custom CUDA kernel and now runs significantly faster.
  • Installation instructions are now combined with the rest of the documentation (User Guide and API docs)

PyTorch

  • Fixed backward pass of the fake-quantize (QcQuantizeWrapper) nodes to handle symmetric mode correctly
  • Per-channel quantization can now be enabled on a per-op-type basis (see the config sketch after this list)
  • Support for recursively excluding modules from a root module in QuantSim
  • Support for excluding layers when running model validator and model preparer
  • Reduced memory usage in AdaRound
  • Fixed bugs in AdaRound for per-channel quantization
  • Made ConnectedGraph more robust when identifying custom layers
  • Added Jupyter notebook-based examples for the following features: AutoQuant
  • Added support for sparse conv layers in QuantSim (experimental)
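
Per-op-type per-channel quantization is controlled through the QuantSim configuration file; a hedged sketch that writes such a config from Python (the section and key names follow my reading of the packaged default_config.json layout and should be verified against your release):

```python
# Hedged sketch of a QuantSim config that enables per-channel quantization only
# for Conv ops (section/key names are assumptions based on the default config
# layout; verify against the config file shipped with your AIMET release).
import json

config = {
    "defaults": {
        "ops": {"is_output_quantized": "True"},
        "params": {"is_quantized": "True"},
        "per_channel_quantization": "False",
    },
    "params": {},
    "op_type": {
        "Conv": {"per_channel_quantization": "True"},
    },
    "supergroups": [],
    "model_input": {},
    "model_output": {},
}

with open("per_channel_conv_config.json", "w") as f:
    json.dump(config, f, indent=4)
# Then pass config_file="per_channel_conv_config.json" when constructing QuantizationSimModel.
```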

Keras

  • Added support for Keras per-channel quantization
  • Changed interface to CLE to accept a pre-compiled model
  • Added Jupyter notebook-based examples for the following features: Transformer quantization

TensorFlow

  • Fix to avoid unnecessary indexing in AdaRound

Documentation

version 1.22.2

15 Sep 19:07

What's New

TensorFlow

  • Added support for supergroups: MatMul + Add (see the config sketch after this list)
  • Added support for TF-Slim BN name with backslash
  • Added support for Depthwise + Conv in CLS
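
A supergroup tells QuantSim to treat a sequence of ops as one fused unit so no activation quantizer is inserted between them; a hedged sketch of the MatMul + Add entry in the config file's supergroups section (key names are assumptions based on the default config layout):

```python
# Hedged sketch: "supergroups" section of a QuantSim config file fusing
# MatMul followed by Add (key names are assumptions; verify against the
# default config shipped with your AIMET release).
matmul_add_supergroup = {
    "supergroups": [
        {"op_list": ["MatMul", "Add"]}
    ]
}
```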

Documentation

version 1.22.1

04 Aug 10:29
e09d587

What's Changed

  • Added support for QuantizableMultiHeadAttention for PyTorch nn.transformer layers by @quic-kyuykim
  • Support functional conv2d in model preparer by @quic-kyuykim (a hedged usage sketch follows this list)
  • Enable qat with multi gpu by @quic-mangal
  • Optimize forward pass logic of PyTorch QAT 2.0 by @quic-geunlee
  • Fix functional depthwise conv support on model preparer by @quic-kyuykim
  • Fix bug in model validator to correctly identify functional ops in leaf module by @quic-klhsieh
  • Support dynamic functional conv2d in model preparer by @quic-kyuykim
  • Added updated default runtime config, also a per-channel one. Fixed n… by @quic-akhobare
  • Include residing module info in model validator by @quic-klhsieh
  • Support for Keras MultiHeadAttention Layer by @quic-ashvkuma
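
A minimal sketch of the functional-conv2d case the model-preparer changes above address: a module that calls torch.nn.functional.conv2d in forward() is rewritten into proper nn modules that QuantSim can wrap (the prepare_model location and signature are assumptions; the toy module is a placeholder):

```python
# Hedged sketch: preparing a model that uses functional conv2d so QuantSim can
# insert quantizers (prepare_model location/signature is an assumption; see the
# aimet_torch.model_preparer docs for your release).
import torch
import torch.nn.functional as F
from aimet_torch.model_preparer import prepare_model

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(8, 3, 3, 3))

    def forward(self, x):
        # Functional call that QuantSim cannot wrap directly.
        return F.relu(F.conv2d(x, self.weight, padding=1))

prepared = prepare_model(TinyNet().eval())
print(prepared)
```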

Documentation

version 1.22.0

04 Jul 22:10

This release has the following changes

  • Support for simulation and QAT for PyTorch transformer models (including support for torch.nn mha and encoder layers)

Documentation

version 1.21.0

03 Jun 04:28
a433425

This release has the following changes

  • New feature: PyTorch QuantAnalyzer - Visualize per-layer sensitivity and per-quantizer PDF histograms
  • New feature: TensorFlow AutoQuant - Automatically apply various AIMET post-training quantization techniques
  • PyTorch QAT with Range Learning: Added support for Per Channel Quantization
  • PyTorch: Enabled exporting of encodings for multi-output leaf module
  • TensorFlow AdaRound
    • Added ability to use configuration file in API to adapt to a specific runtime target
    • Added Per-Channel Quantization support
  • TensorFlow QuantSim: Added support for FP16 inference and QAT
  • TensorFlow Per Channel Quantization
    • Fixed speed and accuracy issues
    • Fixed zero accuracy for 16-bit per-channel quantization
    • Added support for DepthWise Conv2d Op
  • Multiple other bug fixes

User guide: https://quic.github.io/aimet-pages/releases/1.21.0/user_guide/index.html
API documentation: https://quic.github.io/aimet-pages/releases/1.21.0/api_docs/index.html
Documentation main page: https://quic.github.io/aimet-pages/index.html