Quantization User Guide edits #3348

Open · wants to merge 5 commits into base: develop
Changes from 3 commits
2 changes: 1 addition & 1 deletion Docs/conf.py
@@ -112,7 +112,7 @@ def setup(app):
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = 'en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
91 changes: 44 additions & 47 deletions Docs/user_guide/adaround.rst
@@ -1,84 +1,81 @@
.. _ug-adaround:


=====================
##############
AIMET AdaRound
=====================
##############

AIMET quantization features, by default, use the "nearest rounding" technique for achieving quantization.
In the following figure, a single weight value in a weight tensor is shown as an illustrative example. When using the
"nearest rounding" technique, this weight value is quantized to the nearest integer value. The Adaptive Rounding
(AdaRound) feature, uses a smaller subset of the unlabelled training data to adaptively round the weights of modules
with weights. In the following figure, the weight value is quantized to the integer value far from it. AdaRound,
optimizes a loss function using the unlabelled training data to adaptively decide whether to quantize a specific
weight to the integer value near it or away from it. Using the AdaRound quantization, a model is able to achieve an
accuracy closer to the FP32 model, while using low bit-width integer quantization.

When creating a QuantizationSimModel using the AdaRounded model, use the QuantizationSimModel provided API for
setting and freezing parameter encodings before computing the encodings. Please refer the code example in the AdaRound
API section.
By default, AIMET uses *nearest rounding* for quantization. A single weight value in a weight tensor is illustrated in the following figure. In nearest rounding, this weight value is quantized to the nearest integer value.

The Adaptive Rounding (AdaRound) feature uses a subset of the unlabeled training data to adaptively round weights. In the following figure, the weight value is quantized to the integer value far from it.

.. image:: ../images/adaround.png
:width: 900px

AdaRound Use Cases
=====================
AdaRound optimizes a loss function using the unlabeled training data to decide whether to quantize a weight to the closer or further integer value. AdaRound quantization achieves accuracy closer to that of the FP32 model while using low bit-width integer quantization.

When creating a QuantizationSimModel using an AdaRounded model, use the QuantizationSimModel API to set and freeze parameter encodings before computing the encodings, as sketched below. Refer to the code example in the AdaRound API for details.
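
As an illustration, the flow might look like the following. This is a minimal sketch assuming the PyTorch variant; ``model``, ``dummy_input``, ``data_loader``, and ``forward_pass_callback`` are user-supplied, and exact signatures may vary across AIMET releases.

.. code-block:: python

    from aimet_torch.quantsim import QuantizationSimModel
    from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters

    # Optimize weight rounding; writes the AdaRounded parameter
    # encodings to ./adaround.encodings
    params = AdaroundParameters(data_loader=data_loader, num_batches=16)
    adarounded_model = Adaround.apply_adaround(model, dummy_input, params,
                                               path='./',
                                               filename_prefix='adaround',
                                               default_param_bw=8)

    # Create the sim from the AdaRounded model, then set and freeze the
    # AdaRounded parameter encodings *before* computing encodings
    sim = QuantizationSimModel(adarounded_model, dummy_input=dummy_input,
                               default_param_bw=8)
    sim.set_and_freeze_param_encodings(encoding_path='./adaround.encodings')
    sim.compute_encodings(forward_pass_callback, forward_pass_callback_args=None)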

AdaRound use cases
==================

**Terminology**

Common terminology
=====================
* BC - Bias Correction
* BNF - Batch Norm Folding
* CLE - Cross Layer Equalization
* HBF - High Bias Folding
* QAT - Quantization Aware Training
* { } - An optional step in the use case
The following abbreviations are used in the use case descriptions:

BC
    Bias Correction
BNF
    Batch Norm Folding
CLE
    Cross Layer Equalization
HBF
    High Bias Folding
QAT
    Quantization Aware Training
{ }
    An optional step in the use case

Use Cases
=====================
**Recommended**

The following sequences are recommended:

#. {BNF} --> {CLE} --> AdaRound
Applying BNF and CLE are optional steps before applying AdaRound. Some models benefit from applying CLE
while some don't get any benefit.
   Applying BNF and CLE before AdaRound is optional. Some models benefit from applying CLE while some don't.

#. AdaRound --> QAT
AdaRound is a post-training quantization feature. But, for some models applying BNF and CLE may not be beneficial.
For these models, QAT after AdaRound may be beneficial. AdaRound is considered as a better weights initialization
step which helps for faster QAT.
   AdaRound is a post-training quantization feature, but for some models applying BNF and CLE may not help. For these models, applying AdaRound before QAT might help. AdaRound is a better weight initialization step that speeds up QAT.

**Not recommended**

Not recommended
=====================
Applying BC either before or after AdaRound is not recommended.
Applying bias correction (BC) either before or after AdaRound is *not* recommended.

#. AdaRound --> BC

#. BC --> AdaRound


AdaRound Hyper parameters guidelines
AdaRound hyperparameter guidelines
=====================================

There are couple of hyper parameters required during AdaRound optimization and are exposed to users. But some of them
are with their default values which lead to good and stable results over many models and not recommended to change often.

Following is guideline for Hyper parameters:

#. Hyper Parameters to be changed often: number of batches (approximately 500-1000 images, if batch size of data loader
is 64, then 16 number of batches leads to 1024 images), number of iterations(default 10000)
A number of hyperparameters used during AdaRound optimization are exposed to users. The default values of some of these parameters lead to stable, good results over many models; we recommend that you not change these.

#. Hyper Parameters to be changed moderately: regularization parameter (default 0.01)
Use the following guidelines for adjusting hyperparameters with AdaRound.

#. Hyper Parameters to be changed least: beta range(default (20, 2)), warm start period (default 20%)
* Hyperparameters to be changed often

  * Number of batches (approximately 500-1000 images; if the data loader batch size is 64, then 16 batches yields 1024 images)
  * Number of iterations (default 10000)

* Hyperparameters to change with caution

  * Regularization parameter (default 0.01)

* Hyperparameters to avoid changing

  * Beta range (default (20, 2))
  * Warm start period (default 20%)
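
As an illustration, these settings map onto the AdaRound parameter object roughly as follows. This is a sketch assuming the PyTorch variant's ``AdaroundParameters``; keyword names may differ across releases.

.. code-block:: python

    from aimet_torch.adaround.adaround_weight import AdaroundParameters

    # 16 batches at a data loader batch size of 64 gives ~1024 images
    params = AdaroundParameters(data_loader=data_loader,
                                num_batches=16,                # change often
                                default_num_iterations=10000,  # change often
                                default_reg_param=0.01,        # change with caution
                                default_beta_range=(20, 2),    # avoid changing
                                default_warm_start=0.2)        # avoid changing (20%)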

AdaRound API
============

Please refer to the links below to view the AdaRound API for each AIMET variant:
See the AdaRound API variant for your platform:

- :ref:`AdaRound for PyTorch<api-torch-adaround>`
- :ref:`AdaRound for Keras<api-keras-adaround>`
43 changes: 21 additions & 22 deletions Docs/user_guide/auto_quant.rst
@@ -1,48 +1,47 @@
.. _ug-auto-quant:


===============
###############
AIMET AutoQuant
===============
###############

Overview
========
AIMET offers a suite of neural network post-training quantization techniques. Often, applying these techniques in a
specific sequence, results in better accuracy and performance. Without the AutoQuant feature, the AIMET
user needs to manually try out various combinations of AIMET quantization features. This manual process is
error-prone and often time-consuming.

The AutoQuant feature, analyzes the model, determines the sequence of AIMET quantization techniques and applies these
techniques. In addition, the user can specify the amount of accuracy drop that can be tolerated, in the AutoQuant API.
As soon as this threshold accuracy is reached, AutoQuant stops applying any additional quantization technique. In
summary, the AutoQuant feature saves time and automates the quantization of the neural networks.
AIMET offers a suite of neural network post-training quantization techniques. Often, applying these techniques in a specific sequence results in better accuracy and performance.

The AutoQuant feature analyzes the model, determines the best sequence of AIMET quantization techniques, and applies these techniques. You can specify the accuracy drop that can be tolerated in the AutoQuant API.
As soon as this threshold accuracy is reached, AutoQuant stops applying quantization techniques.

Without the AutoQuant feature, you must manually try combinations of AIMET quantization techniques. This manual process is error-prone and time-consuming.

Workflow
========

Before entering the optimization workflow, AutoQuant performs the following preparation steps:
The workflow looks like this:

1) Check the validity of the model and convert it into an AIMET quantization-friendly format (denoted as `Prepare Model` below).
2) Select the best-performing quantization scheme for the given model (denoted as `QuantScheme Selection` below)

After the prepration steps, AutoQuant mainly consists of the following three stages:
.. image:: ../images/auto_quant_v2_flowchart.png

1) BatchNorm folding
2) :ref:`Cross-Layer Equalization <ug-post-training-quantization>`
3) :ref:`AdaRound <ug-adaround>`

These techniques are applied in a best-effort manner until the model meets the allowed accuracy drop.
If applying AutoQuant fails to satisfy the evaluation goal, AutoQuant will return the model to which the best combination
of the above techniques is applied.
Before entering the optimization workflow, AutoQuant prepares the model by:

.. image:: ../images/auto_quant_v2_flowchart.png
1. Checking the validity of the model and converting the model into an AIMET quantization-friendly format (`Prepare Model`).
2. Selecting the best-performing quantization scheme for the given model (`QuantScheme Selection`).

After the preparation steps, AutoQuant proceeds to try three techniques:

1. BatchNorm folding
2. :ref:`Cross-Layer Equalization (CLE) <ug-post-training-quantization>`
3. :ref:`AdaRound <ug-adaround>`

These techniques are applied in a best-effort manner until the model meets the allowed accuracy drop.
If applying AutoQuant fails to satisfy the evaluation goal, AutoQuant returns the model that produced the best results.
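
As an illustration, a minimal AutoQuant invocation might look like the following. This sketch assumes the PyTorch variant; ``model``, ``dummy_input``, ``unlabeled_data_loader``, and ``eval_callback`` (a function that returns the model's accuracy as a float) are user-supplied.

.. code-block:: python

    from aimet_torch.auto_quant import AutoQuant

    auto_quant = AutoQuant(model,
                           dummy_input=dummy_input,
                           data_loader=unlabeled_data_loader,
                           eval_callback=eval_callback)

    # Stop as soon as quantized accuracy is within 1% of the FP32 baseline
    model, accuracy, encoding_path = auto_quant.optimize(allowed_accuracy_drop=0.01)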

AutoQuant API
=============

Please refer to the links below to view the AutoQuant API for each AIMET variant:
See the AutoQuant API for your AIMET variant:

- :ref:`AutoQuant for PyTorch<api-torch-auto-quant>`
- :ref:`AutoQuant for ONNX<api-onnx-auto-quant>`
27 changes: 11 additions & 16 deletions Docs/user_guide/bn_reestimation.rst
@@ -2,49 +2,44 @@


================================
AIMET BN Re-estimation
AIMET Batch Norm Re-estimation
================================

Overview
========

The BN Re-estimation feature utilizes a small subset of training data to individually re-estimate the statistics of the
Batch Normalization (BN) layers in a model. These BN statistics are then used to adjust the quantization scale parameters
of the preceeding Convolution or Linear layers. Effectively, the BN layers are folded.
The Batch Norm (BN) re-estimation feature utilizes a small subset of training data to individually re-estimate the statistics of the BN layers in a model. These BN statistics are then used to adjust the quantization scale parameters of the preceding Convolution or Linear layers. Effectively, the BN layers are folded.

The BN Re-estimation feature is applied after performing Quantization Aware Training (QAT) with Range Learning, with
Per Channel Quantization (PCQ) enabled. It is very important NOT to fold the BN layers before performing QAT. The BN layers are
folded ONLY after QAT and the re-estimation of the BN statistics are completed. The Workflow section below, covers
the exact sequence of steps.
The BN re-estimation feature is applied after performing Quantization Aware Training (QAT) with Range Learning, with Per Channel Quantization (PCQ) enabled. It is important *not* to fold the BN layers before performing QAT. Fold the BN layers only after QAT and the re-estimation of the BN statistics are completed. See the Workflow section below for the exact sequence of steps.

The BN Re-estimation feature is specifically recommended for the following scenarios:
The BN re-estimation feature is specifically recommended for the following scenarios:

- Low-bitwidth weight quantization (e.g., 4-bits)
- Models for which Batch Norm Folding leads to decreased performance.
- Models for which Batch Norm Folding leads to decreased performance
- Models where the main issue is weight quantization (including higher bitwidth quantization)
- Low bitwidth quantization of depthwise separable layers since their Batch Norm Statistics are affected by oscillations


Workflow
========

BN-Re-estimation requires that
BN re-estimation requires that:

1. BN layers not be folded before QAT.
2. Per Channel Quantization is enabled.

To use the BN-Re-estimation feature, the following sequence of steps must be followed in the correct order.
To use the BN re-estimation feature, the following sequence of steps must be followed in order:

1. Create the QuantizationSimModel object with Range Learning Quant Scheme
2. Perform QAT with Range Learning
3. Re-estimate the BN statistics
4. Fold the BN layers
5. Using the QuantizationSimModel, export the model and encodings.

Once the above steps are completed, the model can be run on the target for inference.
Once the steps are completed, the model can be run on the target for inference.
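
As an illustration, the sequence might look like the following for the PyTorch variant. This is a sketch; ``model``, ``dummy_input``, ``train_data_loader``, ``forward_pass_callback``, and the ``train`` loop are user-supplied, and exact signatures may vary across releases.

.. code-block:: python

    from aimet_common.defs import QuantScheme
    from aimet_torch.quantsim import QuantizationSimModel
    from aimet_torch.bn_reestimation import reestimate_bn_stats
    from aimet_torch.batch_norm_fold import fold_all_batch_norms_to_scale

    # 1. Create the QuantizationSimModel with a range-learning quant scheme
    #    (per-channel quantization is enabled via the config file, not shown)
    sim = QuantizationSimModel(model, dummy_input=dummy_input,
                               quant_scheme=QuantScheme.training_range_learning_with_tf_init)
    sim.compute_encodings(forward_pass_callback, forward_pass_callback_args=None)

    # 2. Perform QAT with range learning (an ordinary training loop on sim.model)
    train(sim.model)

    # 3. Re-estimate the BN statistics on a small subset of training data
    reestimate_bn_stats(sim.model, train_data_loader, num_batches=100)

    # 4. Fold the BN layers into the quantization scale parameters
    fold_all_batch_norms_to_scale(sim)

    # 5. Export the model and encodings
    sim.export(path='./', filename_prefix='bn_reestimated', dummy_input=dummy_input)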

The following high level call flow diagrams, enumerates the work flow for PyTorch.
The workflow is the same for TensorFlow and Keras.
The following sequence diagram shows the workflow for PyTorch.
The workflow is the same for TensorFlow and Keras.

.. image:: ../images/bn_reestimation.png
:width: 1200px
@@ -53,7 +48,7 @@ The workflow is the same for TensorFlow and Keras.
BN Re-estimation API
====================

Please refer to the links below to view the BN Re-estimation API for each AIMET variant:
See the links below to view the BN re-estimation API for each AIMET variant:

- :ref:`BN Re-estimation for PyTorch<api-torch-bn-reestimation>`
- :ref:`BN Re-estimation for Keras<api-keras-bn-reestimation>`
58 changes: 23 additions & 35 deletions Docs/user_guide/index.rst
@@ -2,76 +2,64 @@
:class: hideitem
.. _ug-index:

======================================
######################################
AI Model Efficiency Toolkit User Guide
======================================
######################################

Overview
========

AI Model Efficiency Toolkit (AIMET) is a software toolkit that enables users to quantize and compress models.
Quantization is a must for efficient edge inference using fixed-point AI accelerators.

AIMET optimizes pre-trained models (e.g., FP32 trained models) using post-training and fine-tuning techniques that
minimize accuracy loss incurred during quantization or compression.
AIMET optimizes pre-trained models (for example, FP32 trained models) using post-training and fine-tuning techniques that minimize accuracy loss incurred during quantization or compression.

AIMET currently supports PyTorch, TensorFlow, and Keras models.
AIMET supports PyTorch, TensorFlow, and Keras models.

The following diagram shows a high-level view of the AIMET workflow.

.. image:: ../images/AIMET_index_no_fine_tune.png

The above picture shows a high-level view of the workflow when using AIMET. The user will start with a trained
model in either the PyTorch, TensorFlow, or Keras training framework. This trained model is passed to AIMET using APIs
for compression and quantization. AIMET returns a compressed/quantized version of the model
that the users can fine-tune (or train further for a small number of epochs) to recover lost accuracy. Users can then
export via ONNX/meta/h5 to an on-target runtime like Qualcomm\ |reg| Neural Processing SDK.
You train a model in the PyTorch, TensorFlow, or Keras training framework, then pass the model to AIMET using its APIs for compression and quantization. AIMET returns a compressed and/or quantized version of the model that you can fine-tune (or train further for a small number of epochs) to recover lost accuracy. You can then export the model using ONNX, meta/checkpoint, or h5 to an on-target runtime like the Qualcomm\ |reg| Neural Processing SDK.

Features
========

AIMET supports two sets of model optimization techniques:

- Model Quantization: AIMET can simulate behavior of quantized HW for a given trained
model. This model can be optimized using Post-Training Quantization (PTQ) and fine-tuning (Quantization Aware Training
- QAT) techniques.

- Model Compression: AIMET supports multiple model compression techniques that allow the
user to take a trained model and remove redundancies, resulting in a smaller model that runs faster on target.
AIMET supports two model optimization techniques:

Release Information
===================
Model Quantization
    AIMET can simulate the behavior of quantized hardware for a trained model. This model can be optimized using Post-Training Quantization (PTQ) and Quantization Aware Training (QAT) fine-tuning techniques.

For information specific to this release, please see :ref:`Release Notes <ug-release-notes>` and :ref:`Known Issues <ug-known-issues>`.
Model Compression
    AIMET supports multiple model compression techniques that remove redundancies from a trained model, resulting in a smaller model that runs faster on target.

Installation Guide
==================
More Information
================

Please visit the :ref:`AIMET Installation <ug-installation>` for more details.

Getting Started
===============

Please refer to the following documentation:
For more information about AIMET, see the following documentation:

- :ref:`Installation <ug-installation>`
- :ref:`Quantization User Guide <ug-model-quantization>`
- :ref:`Compression User Guide <ug-model-compression>`
- :ref:`API Documentation <ug-apidocs>`
- :ref:`Examples Documentation <ug-examples>`
- :ref:`Installation <ug-installation>`
- :ref:`API Documentation <ug-apidocs>`

Release Information
===================

For information specific to this release, see :ref:`Release Notes <ug-release-notes>` and :ref:`Known Issues <ug-known-issues>`.

:hideitem:`toc tree`
------------------------------------
.. toctree::
:hidden:

Installation <../install/index>
Quantization User Guide <model_quantization>
Compression User Guide <model_compression>
API Documentation<../api_docs/index>
Examples Documentation <examples>
Installation <../install/index>

|

|

| |project| is a product of |author|
| Qualcomm\ |reg| Neural Processing SDK is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.