Quantization User Guide edits #3348

Open · wants to merge 5 commits into base: develop
2 changes: 1 addition & 1 deletion Docs/conf.py
@@ -112,7 +112,7 @@ def setup(app):
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = 'en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
91 changes: 44 additions & 47 deletions Docs/user_guide/adaround.rst
@@ -1,84 +1,81 @@
.. _ug-adaround:


=====================
##############
AIMET AdaRound
=====================
##############

AIMET quantization features, by default, use the "nearest rounding" technique for achieving quantization.
In the following figure, a single weight value in a weight tensor is shown as an illustrative example. When using the
"nearest rounding" technique, this weight value is quantized to the nearest integer value. The Adaptive Rounding
(AdaRound) feature, uses a smaller subset of the unlabelled training data to adaptively round the weights of modules
with weights. In the following figure, the weight value is quantized to the integer value far from it. AdaRound,
optimizes a loss function using the unlabelled training data to adaptively decide whether to quantize a specific
weight to the integer value near it or away from it. Using the AdaRound quantization, a model is able to achieve an
accuracy closer to the FP32 model, while using low bit-width integer quantization.

When creating a QuantizationSimModel using the AdaRounded model, use the QuantizationSimModel provided API for
setting and freezing parameter encodings before computing the encodings. Please refer the code example in the AdaRound
API section.
By default, AIMET uses *nearest rounding* for quantization. A single weight value in a weight tensor is illustrated in the following figure. In nearest rounding, this weight value is quantized to the nearest integer value.

The Adaptive Rounding (AdaRound) feature uses a subset of the unlabeled training data to adaptively round weights. In the following figure, the weight value is quantized to the farther integer value.

.. image:: ../images/adaround.png
:width: 900px

AdaRound Use Cases
=====================
AdaRound optimizes a loss function using the unlabeled training data to decide whether to quantize a weight to the closer or farther integer value. AdaRound quantization achieves accuracy closer to that of the FP32 model while using low bit-width integer quantization.
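
For intuition, the rounding choice can be pictured with a single weight and scale. The numbers below are purely illustrative and this is not AIMET code:

.. code-block:: python

    import math

    # Toy example: with a quantization scale of 0.1, the FP32 weight 0.63
    # corresponds to 6.3 quantization steps.
    w, scale = 0.63, 0.1

    nearest = round(w / scale)      # nearest rounding always yields 6
    down = math.floor(w / scale)    # 6
    up = math.ceil(w / scale)       # 7 -- AdaRound learns, per weight, whether to
                                    # round down or up, picking whichever choice
                                    # minimizes a loss measured on unlabeled data.
    print(nearest, down, up)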

When creating a QuantizationSimModel from the AdaRounded model, use the QuantizationSimModel API to set and freeze parameter encodings before computing the encodings. Refer to the code example in the AdaRound API section.
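
A minimal sketch of this flow for PyTorch follows. It assumes the aimet_torch AdaRound and QuantizationSimModel APIs referenced in the AdaRound API section; module paths and argument names can differ between AIMET releases, and ``model``, ``dummy_input``, ``data_loader``, and ``forward_pass_fn`` are placeholders you supply:

.. code-block:: python

    from aimet_common.defs import QuantScheme
    from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters
    from aimet_torch.quantsim import QuantizationSimModel

    params = AdaroundParameters(data_loader=data_loader, num_batches=16)

    # 1. AdaRound the FP32 model. This also writes the rounded parameter encodings
    #    to <path>/<filename_prefix>.encodings.
    adarounded_model = Adaround.apply_adaround(
        model, dummy_input, params,
        path='./adaround', filename_prefix='model', default_param_bw=8)

    # 2. Create the QuantizationSimModel from the AdaRounded model.
    sim = QuantizationSimModel(adarounded_model, dummy_input=dummy_input,
                               quant_scheme=QuantScheme.post_training_tf_enhanced,
                               default_param_bw=8, default_output_bw=8)

    # 3. Set and freeze the AdaRounded parameter encodings *before* computing encodings.
    sim.set_and_freeze_param_encodings(encoding_path='./adaround/model.encodings')

    # 4. Compute the remaining (activation) encodings and export.
    sim.compute_encodings(forward_pass_fn, forward_pass_callback_args=None)
    sim.export(path='./output', filename_prefix='adarounded_model', dummy_input=dummy_input)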

AdaRound use cases
==================

**Terminology**

Common terminology
=====================
* BC - Bias Correction
* BNF - Batch Norm Folding
* CLE - Cross Layer Equalization
* HBF - High Bias Folding
* QAT - Quantization Aware Training
* { } - An optional step in the use case
The following abbreviations are used in the use case descriptions:

BC
    Bias Correction
BNF
    Batch Norm Folding
CLE
    Cross Layer Equalization
HBF
    High Bias Folding
QAT
    Quantization Aware Training
{ }
    An optional step in the use case

Use Cases
=====================
**Recommended**

The following sequences are recommended:

#. {BNF} --> {CLE} --> AdaRound
Applying BNF and CLE are optional steps before applying AdaRound. Some models benefit from applying CLE
while some don't get any benefit.
Applying BNF and CLE before AdaRound is optional. Some models benefit from applying CLE while others don't. (A sketch of this sequence follows the list.)

#. AdaRound --> QAT
AdaRound is a post-training quantization feature. But, for some models applying BNF and CLE may not be beneficial.
For these models, QAT after AdaRound may be beneficial. AdaRound is considered as a better weights initialization
step which helps for faster QAT.
AdaRound is a post-training quantization feature, but for some models BNF and CLE provide no benefit. For these models, QAT after AdaRound might help. AdaRound provides a better weight initialization, which speeds up QAT.
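
Use case 1 can be sketched as follows for PyTorch. The high-level ``equalize_model`` call is assumed to perform batch norm folding as part of CLE; check the CLE API for your AIMET version before combining it with a separate folding step:

.. code-block:: python

    from aimet_torch.cross_layer_equalization import equalize_model

    input_shape = (1, 3, 224, 224)      # placeholder input shape for the model

    # Optional BNF + CLE: folds batch norms and equalizes layer weights in place.
    equalize_model(model, input_shape)

    # ... then apply AdaRound to `model` and build the QuantizationSimModel as in
    # the earlier sketch.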

**Not recommended**

Not recommended
=====================
Applying BC either before or after AdaRound is not recommended.
Applying bias correction (BC) either before or after AdaRound is *not* recommended.

#. AdaRound --> BC

#. BC --> AdaRound


AdaRound Hyper parameters guidelines
AdaRound hyperparameter guidelines
=====================================

There are couple of hyper parameters required during AdaRound optimization and are exposed to users. But some of them
are with their default values which lead to good and stable results over many models and not recommended to change often.

Following is guideline for Hyper parameters:

#. Hyper Parameters to be changed often: number of batches (approximately 500-1000 images, if batch size of data loader
is 64, then 16 number of batches leads to 1024 images), number of iterations(default 10000)
A number of hyperparameters used during AdaRound optimization are exposed to users. The default values of some of these parameters lead to stable, good results over many models; we recommend that you not change these.

#. Hyper Parameters to be changed moderately: regularization parameter (default 0.01)
Use the following guidelines for adjusting hyperparameters with AdaRound; a sketch mapping them to the API parameters follows the list.

#. Hyper Parameters to be changed least: beta range(default (20, 2)), warm start period (default 20%)
* Hyperparameters to change often

  * Number of batches (approximately 500-1000 images; if the data loader batch size is 64, then 16 batches yields 1024 images)
  * Number of iterations (default 10000)

* Hyperparameters to change with caution

  * Regularization parameter (default 0.01)

* Hyperparameters to avoid changing

  * Beta range (default (20, 2))
  * Warm start period (default 20%)
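
These hyperparameters surface as arguments to the AdaRound parameter object in the PyTorch API. The sketch below assumes the aimet_torch ``AdaroundParameters`` signature; argument names may vary slightly across AIMET releases, and ``data_loader`` is a placeholder:

.. code-block:: python

    from aimet_torch.adaround.adaround_weight import AdaroundParameters

    params = AdaroundParameters(
        data_loader=data_loader,        # unlabeled data loader
        num_batches=16,                 # change often: 16 batches x batch size 64 = 1024 images
        default_num_iterations=10000,   # change often: number of optimization iterations
        default_reg_param=0.01,         # change with caution: regularization parameter
        default_beta_range=(20, 2),     # avoid changing: beta annealing range
        default_warm_start=0.2)         # avoid changing: 20% warm start period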

AdaRound API
============

Please refer to the links below to view the AdaRound API for each AIMET variant:
See the AdaRound API for your AIMET variant:

- :ref:`AdaRound for PyTorch<api-torch-adaround>`
- :ref:`AdaRound for Keras<api-keras-adaround>`
43 changes: 21 additions & 22 deletions Docs/user_guide/auto_quant.rst
@@ -1,48 +1,47 @@
.. _ug-auto-quant:


===============
###############
AIMET AutoQuant
===============
###############

Overview
========
AIMET offers a suite of neural network post-training quantization techniques. Often, applying these techniques in a
specific sequence, results in better accuracy and performance. Without the AutoQuant feature, the AIMET
user needs to manually try out various combinations of AIMET quantization features. This manual process is
error-prone and often time-consuming.

The AutoQuant feature, analyzes the model, determines the sequence of AIMET quantization techniques and applies these
techniques. In addition, the user can specify the amount of accuracy drop that can be tolerated, in the AutoQuant API.
As soon as this threshold accuracy is reached, AutoQuant stops applying any additional quantization technique. In
summary, the AutoQuant feature saves time and automates the quantization of the neural networks.
AIMET offers a suite of neural network post-training quantization techniques. Often, applying these techniques in a specific sequence results in better accuracy and performance.

The AutoQuant feature analyzes the model, determines the best sequence of AIMET quantization techniques, and applies these techniques. In the AutoQuant API, you can specify the accuracy drop that can be tolerated.
As soon as this accuracy threshold is reached, AutoQuant stops applying further quantization techniques.

Without the AutoQuant feature, you must manually try combinations of AIMET quantization techniques. This manual process is error-prone and time-consuming.

Workflow
========

Before entering the optimization workflow, AutoQuant performs the following preparation steps:
The workflow looks like this:

1) Check the validity of the model and convert it into an AIMET quantization-friendly format (denoted as `Prepare Model` below).
2) Select the best-performing quantization scheme for the given model (denoted as `QuantScheme Selection` below)

After the prepration steps, AutoQuant mainly consists of the following three stages:
.. image:: ../images/auto_quant_v2_flowchart.png

1) BatchNorm folding
2) :ref:`Cross-Layer Equalization <ug-post-training-quantization>`
3) :ref:`AdaRound <ug-adaround>`

These techniques are applied in a best-effort manner until the model meets the allowed accuracy drop.
If applying AutoQuant fails to satisfy the evaluation goal, AutoQuant will return the model to which the best combination
of the above techniques is applied.
Before entering the optimization workflow, AutoQuant prepares by:

.. image:: ../images/auto_quant_v2_flowchart.png
1. Checking the validity of the model and converting it into an AIMET quantization-friendly format (`Prepare Model`).
2. Selecting the best-performing quantization scheme for the given model (`QuantScheme Selection`).

After the preparation steps, AutoQuant proceeds to try three techniques:

1. BatchNorm folding
2. :ref:`Cross-Layer Equalization (CLE) <ug-post-training-quantization>`
3. :ref:`AdaRound <ug-adaround>`

These techniques are applied in a best-effort manner until the model meets the allowed accuracy drop.
If AutoQuant fails to satisfy the evaluation goal, it returns the model that produced the best results.
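
A minimal sketch of invoking AutoQuant in PyTorch is shown below. The constructor and ``optimize()`` call follow one aimet_torch release and are an assumption; check the AutoQuant API link for your variant and version. ``model``, ``dummy_input``, ``unlabeled_data_loader``, and ``eval_fn`` are placeholders:

.. code-block:: python

    from aimet_torch.auto_quant import AutoQuant

    auto_quant = AutoQuant(model,
                           dummy_input=dummy_input,
                           data_loader=unlabeled_data_loader,
                           eval_callback=eval_fn)   # eval_fn returns an accuracy score

    # Stop applying techniques once accuracy is within 1% of the FP32 baseline.
    quantized_model, accuracy, encoding_path = auto_quant.optimize(allowed_accuracy_drop=0.01)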

AutoQuant API
=============

Please refer to the links below to view the AutoQuant API for each AIMET variant:
See the AutoQuant API for your AIMET variant:

- :ref:`AutoQuant for PyTorch<api-torch-auto-quant>`
- :ref:`AutoQuant for ONNX<api-onnx-auto-quant>`
31 changes: 13 additions & 18 deletions Docs/user_guide/bn_reestimation.rst
@@ -1,50 +1,45 @@
.. _ug-bn-reestimation:


======================
AIMET BN Re-estimation
======================
##############################
AIMET Batch Norm Re-estimation
##############################

Overview
========

The BN Re-estimation feature utilizes a small subset of training data to individually re-estimate the statistics of the
Batch Normalization (BN) layers in a model. These BN statistics are then used to adjust the quantization scale parameters
of the preceeding Convolution or Linear layers. Effectively, the BN layers are folded.
The Batch Norm (BN) re-estimation feature utilizes a small subset of training data to individually re-estimate the statistics of the BN layers in a model. These BN statistics are then used to adjust the quantization scale parameters of the preceding Convolution or Linear layers. Effectively, the BN layers are folded.

The BN Re-estimation feature is applied after performing Quantization Aware Training (QAT) with Range Learning, with
Per Channel Quantization (PCQ) enabled. It is very important NOT to fold the BN layers before performing QAT. The BN layers are
folded ONLY after QAT and the re-estimation of the BN statistics are completed. The Workflow section below, covers
the exact sequence of steps.
The BN re-estimation feature is applied after performing Quantization Aware Training (QAT) with Range Learning, with Per Channel Quantization (PCQ) enabled. It is important *not* to fold the BN layers before performing QAT. Fold the BN layers only after QAT and the re-estimation of the BN statistics are completed. See the Workflow section below for the exact sequence of steps.

The BN Re-estimation feature is specifically recommended for the following scenarios:
The BN re-estimation feature is specifically recommended for the following scenarios:

- Low-bitwidth weight quantization (e.g., 4-bits)
- Models for which Batch Norm Folding leads to decreased performance.
- Models for which Batch Norm Folding leads to decreased performance
- Models where the main issue is weight quantization (including higher bitwidth quantization)
- Low-bitwidth quantization of depthwise separable layers, since their batch norm statistics are affected by oscillations


Workflow
========

BN-Re-estimation requires that
BN re-estimation requires that:

1. BN layers are not folded before QAT.
2. Per Channel Quantization is enabled.

To use the BN-Re-estimation feature, the following sequence of steps must be followed in the correct order.
To use the BN re-estimation feature, the following sequence of steps must be followed in order:

1. Create the QuantizationSimModel object with Range Learning Quant Scheme
2. Perform QAT with Range Learning
3. Re-estimate the BN statistics
4. Fold the BN layers
5. Using the QuantizationSimModel, export the model and encodings.

Once the above steps are completed, the model can be run on the target for inference.
Once the steps are completed, the model can be run on the target for inference.
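
A minimal sketch of these five steps for PyTorch is shown below. It assumes the aimet_torch helpers named here; exact signatures may differ between AIMET releases, and ``model``, ``dummy_input``, ``train_loader``, ``train_fn``, and ``forward_pass_fn`` are placeholders you supply:

.. code-block:: python

    from aimet_common.defs import QuantScheme
    from aimet_torch.quantsim import QuantizationSimModel
    from aimet_torch.bn_reestimation import reestimate_bn_stats
    from aimet_torch.batch_norm_fold import fold_all_batch_norms_to_scale

    # 1. Create the QuantizationSimModel with a Range Learning quant scheme
    #    (enable per-channel quantization through the config file for your target).
    sim = QuantizationSimModel(model, dummy_input=dummy_input,
                               quant_scheme=QuantScheme.training_range_learning_with_tf_init)
    sim.compute_encodings(forward_pass_fn, forward_pass_callback_args=None)

    # 2. Perform QAT with Range Learning: ordinary training on sim.model.
    train_fn(sim.model)

    # 3. Re-estimate the BN statistics on a small subset of training data.
    reestimate_bn_stats(sim.model, train_loader, num_batches=100)

    # 4. Fold the BN layers into the quantized Conv/Linear layers.
    fold_all_batch_norms_to_scale(sim)

    # 5. Export the model and encodings.
    sim.export(path='./output', filename_prefix='model_bn_reestimated',
               dummy_input=dummy_input)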

The following high level call flow diagrams, enumerates the work flow for PyTorch.
The workflow is the same for TensorFlow and Keras.
The following sequence diagram shows the workflow for PyTorch.
The workflow is the same for TensorFlow and Keras.

.. image:: ../images/bn_reestimation.png
:width: 1200px
@@ -53,7 +48,7 @@ The workflow is the same for TensorFlow and Keras.
BN Re-estimation API
====================

Please refer to the links below to view the BN Re-estimation API for each AIMET variant:
See the links below to view the BN re-estimation API for each AIMET variant:

- :ref:`BN Re-estimation for PyTorch<api-torch-bn-reestimation>`
- :ref:`BN Re-estimation for Keras<api-keras-bn-reestimation>`