Intel® Neural Compressor provides validated examples for multiple compression techniques, including quantization, pruning, knowledge distillation, and orchestration. A subset of the validated cases is listed in the example tables below, and the release data is available here.
Note: Examples marked with * still use the 1.x API.
- Quick Get Started Notebook of Intel® Neural Compressor for ONNXRuntime
- Quick Get Started Notebook of Intel® Neural Compressor for TensorFlow
- tf_example1: quantize with the built-in dataloader and metric.
- tf_example2: quantize a Keras model with a customized metric and dataloader.
- tf_example3: convert a model with mixed precision.
- tf_example4: quantize a checkpoint with a dummy dataloader.
- tf_example5: configure performance and accuracy measurement.
- tf_example6: use the default user-facing APIs to quantize a pb model.
- tf_example7: quantize and benchmark with the pure Python API.

TensorFlow Quantization

Model | Domain | Approach | Examples |
---|---|---|---|
ResNet50 V1.0 | Image Recognition | Post-Training Static Quantization | pb |
ResNet50 V1.5 | Image Recognition | Post-Training Static Quantization | pb / keras |
ResNet101 | Image Recognition | Post-Training Static Quantization | pb / keras |
MobileNet V1 | Image Recognition | Post-Training Static Quantization | pb |
MobileNet V2 | Image Recognition | Post-Training Static Quantization | pb / keras |
MobileNet V3 | Image Recognition | Post-Training Static Quantization | pb |
Inception V1 | Image Recognition | Post-Training Static Quantization | pb |
Inception V2 | Image Recognition | Post-Training Static Quantization | pb |
Inception V3 | Image Recognition | Post-Training Static Quantization | pb / keras |
Inception V4 | Image Recognition | Post-Training Static Quantization | pb |
Inception ResNet V2 | Image Recognition | Post-Training Static Quantization | pb / keras |
VGG16 | Image Recognition | Post-Training Static Quantization | pb / keras |
VGG19 | Image Recognition | Post-Training Static Quantization | pb / keras |
ResNet V2 50 | Image Recognition | Post-Training Static Quantization | pb / keras |
ResNet V2 101 | Image Recognition | Post-Training Static Quantization | pb / keras |
ResNet V2 152 | Image Recognition | Post-Training Static Quantization | pb |
DenseNet121 | Image Recognition | Post-Training Static Quantization | pb |
DenseNet161 | Image Recognition | Post-Training Static Quantization | pb |
DenseNet169 | Image Recognition | Post-Training Static Quantization | pb |
EfficientNet B0 | Image Recognition | Post-Training Static Quantization | ckpt |
Xception | Image Recognition | Post-Training Static Quantization | keras |
ResNet V2 | Image Recognition | Quantization-Aware Training | keras |
EfficientNet V2 B0 | Image Recognition | Post-Training Static Quantization | SavedModel |
BERT base MRPC | Natural Language Processing | Post-Training Static Quantization | ckpt |
BERT large SQuAD (Model Zoo) | Natural Language Processing | Post-Training Static Quantization | pb |
BERT large SQuAD | Natural Language Processing | Post-Training Static Quantization | pb |
DistilBERT base | Natural Language Processing | Post-Training Static Quantization | pb |
Transformer LT | Natural Language Processing | Post-Training Static Quantization | pb |
Transformer LT MLPerf | Natural Language Processing | Post-Training Static Quantization | pb |
SSD ResNet50 V1 | Object Detection | Post-Training Static Quantization | pb / ckpt |
SSD MobileNet V1 | Object Detection | Post-Training Static Quantization | pb / ckpt |
Faster R-CNN Inception ResNet V2 | Object Detection | Post-Training Static Quantization | pb / SavedModel |
Faster R-CNN ResNet101 | Object Detection | Post-Training Static Quantization | pb / SavedModel |
Faster R-CNN ResNet50 | Object Detection | Post-Training Static Quantization | pb |
Mask R-CNN Inception V2 | Object Detection | Post-Training Static Quantization | pb / ckpt |
SSD ResNet34 | Object Detection | Post-Training Static Quantization | pb |
YOLOv3 | Object Detection | Post-Training Static Quantization | pb |
Wide & Deep | Recommendation | Post-Training Static Quantization | pb |
Arbitrary Style Transfer | Style Transfer | Post-Training Static Quantization | ckpt |
OPT | Natural Language Processing | Post-Training Static Quantization | pb (smooth quant) |
GPT2 | Natural Language Processing | Post-Training Static Quantization | pb (smooth quant) |
ViT | Image Recognition | Post-Training Static Quantization | pb |
GraphSage | Graph Networks | Post-Training Static Quantization | pb |
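
Post-Training Static Quantization, the approach behind most rows in the table above, fixes the activation quantization parameters offline by running a small calibration set through the model. The following is a minimal pure-Python sketch of that idea, assuming min/max calibration and asymmetric uint8 affine quantization. The function names are hypothetical; this is neither the Intel® Neural Compressor API nor its calibration algorithm.

```python
from typing import Iterable, List, Tuple

def calibrate_minmax(batches: Iterable[List[float]]) -> Tuple[float, float]:
    """Track the global min/max of activations over the calibration batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo, hi = min(lo, min(batch)), max(hi, max(batch))
    # Widen the range to include 0 so that zero is exactly representable.
    return min(lo, 0.0), max(hi, 0.0)

def affine_params(lo: float, hi: float, qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Scale and zero point for asymmetric uint8 quantization, fixed after calibration."""
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))

def quantize(xs: List[float], scale: float, zp: int, qmin: int = 0, qmax: int = 255) -> List[int]:
    # Round to the integer grid, shift by the zero point, then saturate.
    return [max(qmin, min(qmax, round(x / scale) + zp)) for x in xs]

def dequantize(qs: List[int], scale: float, zp: int) -> List[float]:
    return [(q - zp) * scale for q in qs]

# Calibrate once on representative data, then reuse (scale, zp) at inference time.
calibration_batches = [[-1.0, 0.5, 2.0], [0.0, 1.5, 3.0]]
lo, hi = calibrate_minmax(calibration_batches)
scale, zp = affine_params(lo, hi)
q = quantize([0.0, 1.0, 3.0], scale, zp)
x_hat = dequantize(q, scale, zp)
```

Because the scale and zero point are computed once from calibration data, any inference-time value outside the calibrated range saturates at 0 or 255. That is the trade-off against dynamic quantization, which recomputes the range for every input.
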

TensorFlow Distillation

Student Model | Teacher Model | Domain | Approach | Examples |
---|---|---|---|---|
MobileNet | DenseNet201 | Image Recognition | Knowledge Distillation | pb |

TensorFlow Pruning

Model | Domain | Approach | Examples |
---|---|---|---|
ResNet V2 | Image Recognition | Structured (4x1, 2in4) | keras |
ViT | Image Recognition | Structured (4x1, 2in4) | keras |
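
The "2in4" pattern is N:M semi-structured sparsity: within every group of 4 consecutive weights, at most 2 are kept. A minimal sketch, assuming weight magnitude as the importance score and groups formed along each row; `nm_mask` and `apply_mask` are hypothetical helpers, not Intel® Neural Compressor's pruner.

```python
from typing import List

def nm_mask(row: List[float], n: int = 2, m: int = 4) -> List[int]:
    """Keep the n largest-magnitude weights in each group of m consecutive weights."""
    if len(row) % m:
        raise ValueError("row length must be a multiple of m")
    mask = [0] * len(row)
    for start in range(0, len(row), m):
        group = range(start, start + m)
        # sorted() is stable, so ties keep the earlier index.
        keep = sorted(group, key=lambda i: abs(row[i]), reverse=True)[:n]
        for i in keep:
            mask[i] = 1
    return mask

def apply_mask(row: List[float], mask: List[int]) -> List[float]:
    return [w * k for w, k in zip(row, mask)]

weights = [0.1, -0.9, 0.4, 0.05, 0.3, 0.2, -0.7, 0.6]
mask = nm_mask(weights)
pruned = apply_mask(weights, mask)
```

Because every group keeps exactly the same number of weights, the resulting layout is regular enough for kernels and hardware that support N:M sparsity to exploit, unlike a fully unstructured mask.
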

TensorFlow Model Conversion

Model | Domain | Approach | Examples |
---|---|---|---|
ResNet50 V1 | Image Recognition | TF2ONNX | int8 / fp32 |
ResNet50 V1.5 | Image Recognition | TF2ONNX | int8 / fp32 |
MobileNet V2 | Image Recognition | TF2ONNX | int8 / fp32 |
VGG16 | Image Recognition | TF2ONNX | int8 / fp32 |
Faster R-CNN ResNet50 | Object Detection | TF2ONNX | int8 / fp32 |
SSD MobileNet V1 | Object Detection | TF2ONNX | int8 / fp32 |

PyTorch Quantization

Model | Domain | Approach | Examples |
---|---|---|---|
ResNet18 | Image Recognition | Post-Training Static Quantization | fx / ipex |
ResNet18 | Image Recognition | Quantization-Aware Training | fx |
ResNet50 | Image Recognition | Post-Training Static Quantization | fx / ipex |
ResNet50 | Image Recognition | Quantization-Aware Training | fx |
ResNeXt101_32x16d_wsl | Image Recognition | Post-Training Static Quantization | ipex |
ResNeXt101_32x8d | Image Recognition | Post-Training Static Quantization | fx |
Se_ResNeXt50_32x4d | Image Recognition | Post-Training Static Quantization | fx |
Inception V3 | Image Recognition | Post-Training Static Quantization | fx |
MobileNet V2 | Image Recognition | Post-Training Static Quantization | fx |
PeleeNet | Image Recognition | Post-Training Static Quantization | fx |
ResNeSt50 | Image Recognition | Post-Training Static Quantization | fx |
3D-UNet | Image Recognition | Post-Training Static Quantization | fx |
SSD ResNet34 | Object Detection | Post-Training Static Quantization | fx / ipex |
YOLOv3 | Object Detection | Post-Training Static Quantization | fx |
Mask R-CNN | Object Detection | Post-Training Static Quantization | fx |
DLRM | Recommendation | Post-Training Static Quantization | ipex / fx |
HuBERT | Speech Recognition | Post-Training Static Quantization | fx |
HuBERT | Speech Recognition | Post-Training Dynamic Quantization | fx |
RNNT | Speech Recognition | Post-Training Dynamic Quantization | fx |
BlendCNN | Natural Language Processing | Post-Training Static Quantization | ipex |
bert-large-uncased-whole-word-masking-finetuned-squad | Natural Language Processing | Post-Training Static Quantization | fx / ipex |
distilbert-base-uncased-distilled-squad | Natural Language Processing | Post-Training Static Quantization | ipex |
yoshitomo-matsubara/bert-large-uncased-rte | Natural Language Processing | Post-Training Dynamic Quantization | fx |
Intel/xlm-roberta-base-mrpc | Natural Language Processing | Post-Training Dynamic Quantization | fx |
textattack/distilbert-base-uncased-MRPC | Natural Language Processing | Post-Training Dynamic Quantization | fx |
textattack/albert-base-v2-MRPC | Natural Language Processing | Post-Training Dynamic Quantization | fx |
Intel/xlm-roberta-base-mrpc | Natural Language Processing | Post-Training Static Quantization | fx |
yoshitomo-matsubara/bert-large-uncased-rte | Natural Language Processing | Post-Training Static Quantization | fx |
Intel/bert-base-uncased-mrpc | Natural Language Processing | Post-Training Static Quantization | fx |
textattack/bert-base-uncased-CoLA | Natural Language Processing | Post-Training Static Quantization | fx |
textattack/bert-base-uncased-STS-B | Natural Language Processing | Post-Training Static Quantization | fx |
gchhablani/bert-base-cased-finetuned-sst2 | Natural Language Processing | Post-Training Static Quantization | fx |
ModelTC/bert-base-uncased-rte | Natural Language Processing | Post-Training Static Quantization | fx |
textattack/bert-base-uncased-QNLI | Natural Language Processing | Post-Training Static Quantization | fx |
yoshitomo-matsubara/bert-large-uncased-cola | Natural Language Processing | Post-Training Static Quantization | fx |
textattack/distilbert-base-uncased-MRPC | Natural Language Processing | Post-Training Static Quantization | fx |
Intel/xlnet-base-cased-mrpc | Natural Language Processing | Post-Training Static Quantization | fx |
textattack/roberta-base-MRPC | Natural Language Processing | Post-Training Static Quantization | fx |
Intel/camembert-base-mrpc | Natural Language Processing | Post-Training Static Quantization | fx |
t5-small | Natural Language Processing | Post-Training Dynamic Quantization | fx |
Helsinki-NLP/opus-mt-en-ro | Natural Language Processing | Post-Training Dynamic Quantization | fx |
lvwerra/pegasus-samsum | Natural Language Processing | Post-Training Dynamic Quantization | fx |
google/reformer-crime-and-punishment | Natural Language Processing | Post-Training Static Quantization | fx |
EleutherAI/gpt-j-6B | Natural Language Processing | Post-Training Static Quantization | fx / smooth quant |
EleutherAI/gpt-j-6B | Natural Language Processing | Post-Training Weight Only Quantization | weight_only |
abeja/gpt-neox-japanese-2.7b | Natural Language Processing | Post-Training Static Quantization | fx |
bigscience/bloom | Natural Language Processing | Post-Training Static Quantization | smooth quant |
facebook/opt | Natural Language Processing | Post-Training Static Quantization | smooth quant |
SD Diffusion | Text to Image | Post-Training Static Quantization | fx |
openai/whisper-large | Speech Recognition | Post-Training Dynamic Quantization | fx |
torchaudio/wav2vec2 | Speech Recognition | Post-Training Dynamic Quantization | fx |
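
Several rows above use smooth quant (SmoothQuant), which migrates activation outliers into the weights with a per-input-channel scale before quantization, leaving the product X·W mathematically unchanged. A minimal sketch, assuming the scale formula from the SmoothQuant paper, s_j = max|X_j|^α / max|W_j|^(1 - α), with α = 0.5 and no all-zero channels. The helper names are hypothetical and this is not Intel® Neural Compressor's implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def smooth_scales(x: Matrix, w: Matrix, alpha: float = 0.5) -> List[float]:
    """Per input channel j: s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha)."""
    scales = []
    for j in range(len(w)):
        x_max = max(abs(row[j]) for row in x)   # activation range of channel j
        w_max = max(abs(v) for v in w[j])       # weight range of input channel j
        scales.append(x_max ** alpha / w_max ** (1 - alpha))
    return scales

def smooth(x: Matrix, w: Matrix, scales: List[float]) -> Tuple[Matrix, Matrix]:
    """Divide activation column j by s_j and multiply weight row j by s_j."""
    x_s = [[v / s for v, s in zip(row, scales)] for row in x]
    w_s = [[v * s for v in w_row] for w_row, s in zip(w, scales)]
    return x_s, w_s

x = [[1.0, 40.0], [-2.0, 20.0]]   # tokens x in_channels; channel 1 has an outlier
w = [[0.5, -1.0], [0.1, 0.2]]     # in_channels x out_channels
s = smooth_scales(x, w)
x_s, w_s = smooth(x, w, s)
```

With α = 0.5 the per-channel maxima of the smoothed activations and weights become equal (about 2.83 for channel 1 here, down from an activation maximum of 40), which makes the activations easier to quantize while the layer output stays the same.
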

Quantization with Intel® Extension for Transformers based on Intel® Neural Compressor

Model | Domain | Approach | Examples |
---|---|---|---|
T5 Large | Natural Language Processing | Post-Training Dynamic Quantization | fx |
Flan T5 Large | Natural Language Processing | Post-Training Dynamic / Static Quantization | fx |
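
Post-Training Dynamic Quantization differs from the static approach in that weights are quantized ahead of time, while each activation's scale is computed on the fly from its own range at inference time, so no calibration dataset is needed. A minimal sketch, assuming symmetric per-tensor int8 quantization for both operands and integer accumulation of the dot product; the names are hypothetical.

```python
from typing import List, Tuple

def quantize_symmetric(xs: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: scale = max|x| / 127 for int8."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in xs)
    scale = amax / qmax if amax > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) for v in xs], scale

# Weights are quantized once, offline.
weight = [0.25, -0.5, 1.0, 0.75]
q_w, s_w = quantize_symmetric(weight)

def dynamic_dot(activation: List[float]) -> float:
    """Quantize the activation at call time, then take an integer dot product."""
    q_a, s_a = quantize_symmetric(activation)    # scale depends on this input only
    acc = sum(a * b for a, b in zip(q_a, q_w))   # integer accumulation
    return acc * s_a * s_w                       # rescale back to float

x = [2.0, 1.0, -3.0, 0.5]
approx = dynamic_dot(x)
exact = sum(a * b for a, b in zip(x, weight))
```

The price of skipping calibration is an extra range reduction over every activation at each call.
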

PyTorch Pruning

Model | Domain | Pruning Type | Approach | Examples |
---|---|---|---|---|
Distilbert-base-uncased | Natural Language Processing (text classification) | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
Bert-mini | Natural Language Processing (text classification) | Structured (4x1, 2in4, per channel), Unstructured | Snip-momentum | eager |
Distilbert-base-uncased | Natural Language Processing (question answering) | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
Bert-mini | Natural Language Processing (question answering) | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
Bert-base-uncased | Natural Language Processing (question answering) | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
Bert-large | Natural Language Processing (question answering) | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
Flan-T5-small | Natural Language Processing (translation) | Structured (4x1) | Snip-momentum | eager |
YOLOv5s6 | Object Detection | Structured (4x1, 2in4), Unstructured | Snip-momentum | eager |
ResNet50 | Image Recognition | Structured (2x1) | Snip-momentum | eager |
Bert-base | Question Answering | Structured (channel, multi-head attention) | Snip-momentum | eager |
Bert-large | Question Answering | Structured (channel, multi-head attention) | Snip-momentum | eager |
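
The rows above combine a pruning pattern (such as 4x1 blocks) with the Snip-momentum criterion. The sketch below illustrates both under stated assumptions: the importance score is taken to be a momentum-weighted accumulation of |w·g| (weight times gradient, the SNIP saliency), and a 4x1 block is taken to mean four consecutive rows in one column. Both are assumptions for illustration rather than Intel® Neural Compressor's exact definitions, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def update_scores(scores: Matrix, w: Matrix, g: Matrix, momentum: float = 0.9) -> Matrix:
    """Hypothetical momentum-accumulated SNIP saliency: momentum * old + |w * g|."""
    return [[momentum * s + abs(wv * gv) for s, wv, gv in zip(sr, wr, gr)]
            for sr, wr, gr in zip(scores, w, g)]

def block_mask_4x1(scores: Matrix, sparsity: float) -> List[List[int]]:
    """Zero whole 4x1 blocks (4 consecutive rows, 1 column) with the lowest summed score."""
    rows, cols = len(scores), len(scores[0])
    if rows % 4:
        raise ValueError("row count must be a multiple of 4")
    blocks = [(sum(scores[r + k][c] for k in range(4)), r, c)
              for r in range(0, rows, 4) for c in range(cols)]
    mask = [[1] * cols for _ in range(rows)]
    # Prune the lowest-scoring fraction of blocks.
    for _, r, c in sorted(blocks)[: int(len(blocks) * sparsity)]:
        for k in range(4):
            mask[r + k][c] = 0
    return mask

w = [[0.5, 0.1], [-0.4, 0.05], [0.3, -0.1], [0.2, 0.02]]
g = [[1.0, 1.0] for _ in range(4)]   # stand-in gradients from one training step
scores = update_scores([[0.0, 0.0] for _ in range(4)], w, g)
mask = block_mask_4x1(scores, sparsity=0.5)
pruned = [[wv * m for wv, m in zip(wr, mr)] for wr, mr in zip(w, mask)]
```

In a typical pruning schedule the scores would be updated at every training step and the target sparsity raised gradually; the sketch shows a single update for clarity.
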

PyTorch Distillation

Student Model | Teacher Model | Domain | Approach | Examples |
---|---|---|---|---|
CNN-2 | CNN-10 | Image Recognition | Knowledge Distillation | eager |
MobileNet V2-0.35 | WideResNet40-2 | Image Recognition | Knowledge Distillation | eager |
ResNet18 / ResNet34 / ResNet50 / ResNet101 | ResNet18 / ResNet34 / ResNet50 / ResNet101 | Image Recognition | Knowledge Distillation | eager |
ResNet18 / ResNet34 / ResNet50 / ResNet101 | ResNet18 / ResNet34 / ResNet50 / ResNet101 | Image Recognition | Self Distillation | eager |
VGG-8 | VGG-13 | Image Recognition | Knowledge Distillation | eager |
BlendCNN | BERT-Base | Natural Language Processing | Knowledge Distillation | eager |
DistilBERT | BERT-Base | Natural Language Processing | Knowledge Distillation | eager |
BiLSTM | RoBERTa-Base | Natural Language Processing | Knowledge Distillation | eager |
TinyBERT | BERT-Base | Natural Language Processing | Knowledge Distillation | eager |
BERT-3 | BERT-Base | Natural Language Processing | Knowledge Distillation | eager |
DistilRoBERTa | RoBERTa-Large | Natural Language Processing | Knowledge Distillation | eager |
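
The knowledge distillation rows above train a smaller student to match a larger teacher. A minimal sketch of the classic Hinton-style distillation loss, assuming a temperature T, a weight α between the soft (KL divergence) and hard (cross-entropy) terms, and the usual T² scaling of the soft term. The names are hypothetical, and the exact loss used by each example may differ.

```python
import math
from typing import List

def softmax(logits: List[float], t: float = 1.0) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / t) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student: List[float], teacher: List[float], label: int,
            t: float = 4.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(p_teacher || p_student) at temperature T + (1 - alpha) * CE(student, label)."""
    p_t = softmax(teacher, t)
    p_s = softmax(student, t)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    ce = -math.log(softmax(student)[label])  # hard-label loss at T = 1
    return alpha * t * t * kl + (1 - alpha) * ce

matched = kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], label=2)
mismatched = kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], label=2)
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative scores for the wrong classes rather than only from its top prediction.
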

PyTorch Orchestration

Model | Domain | Approach | Examples |
---|---|---|---|
ResNet50 | Image Recognition | Multi-shot: Pruning and PTQ | link |
ResNet50 | Image Recognition | One-shot: QAT during Pruning | link |
Intel/bert-base-uncased-sparse-90-unstructured-pruneofa | Natural Language Processing (question-answering) | One-shot: Pruning, Distillation and QAT | link |
Intel/bert-base-uncased-sparse-90-unstructured-pruneofa | Natural Language Processing (text-classification) | One-shot: Pruning, Distillation and QAT | link |

PyTorch Model Conversion

Model | Domain | Approach | Examples |
---|---|---|---|
ResNet18 | Image Recognition | PT2ONNX | int8 / fp32 |
ResNet50 | Image Recognition | PT2ONNX | int8 / fp32 |
BERT base MRPC | Natural Language Processing | PT2ONNX | int8 / fp32 |
BERT large MRPC | Natural Language Processing | PT2ONNX | int8 / fp32 |

ONNX Runtime Quantization

Model | Domain | Approach | Examples |
---|---|---|---|
ResNet50 V1.5 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
ResNet50 V1.5 MLPerf | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
VGG16 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
MobileNet V2 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
MobileNet V3 MLPerf | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
AlexNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
CaffeNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
DenseNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops |
EfficientNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
FCN (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
GoogleNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
Inception V1 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
MNIST (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops |
MobileNet V2 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
ResNet50 V1.5 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
ShuffleNet V2 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
SqueezeNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
VGG16 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
ZFNet (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
ArcFace (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops |
CodeBert | Natural Language Processing | Post-Training Static Quantization | qlinearops |
CodeBert | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
BERT base MRPC | Natural Language Processing | Post-Training Static Quantization | integerops / qdq |
BERT base MRPC | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
DistilBERT base MRPC | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
Mobile bert MRPC | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
Roberta base MRPC | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
BERT SQuAD (ONNX Model Zoo) | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
GPT2 lm head WikiText (ONNX Model Zoo) | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
MobileBERT SQuAD MLPerf (ONNX Model Zoo) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
BiDAF (ONNX Model Zoo) | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
BERT base uncased MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
Roberta base MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
XLM Roberta base MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
Camembert base MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
MiniLM L12 H384 uncased MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
DistilBERT base uncased SST-2 (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
Albert base v2 SST-2 (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
BERT base cased MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
Electra small discriminator MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
BERT mini MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
Xlnet base cased MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
BART large MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
DeBERTa v3 base MRPC (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
Spanbert SQuAD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
Bert base multilingual cased SQuAD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
DistilBert base uncased SQuAD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
BERT large uncased whole word masking SQuAD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
Roberta large SQuAD v2 (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
GPT2 WikiText (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
DistilGPT2 WikiText (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
LayoutLMv3 FUNSD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
LayoutLMv2 FUNSD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
LayoutLM FUNSD (HuggingFace) | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
SSD MobileNet V1 | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
SSD MobileNet V2 | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
Table Transformer | Object Detection | Post-Training Static Quantization | qlinearops |
SSD MobileNet V1 (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
DUC (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops |
Faster R-CNN (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
Mask R-CNN (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
SSD (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
Tiny YOLOv3 (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops |
YOLOv3 (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops |
YOLOv4 (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops |
Emotion FERPlus (ONNX Model Zoo) | Body Analysis | Post-Training Static Quantization | qlinearops |
Ultra Face (ONNX Model Zoo) | Body Analysis | Post-Training Static Quantization | qlinearops |
GPT-J-6B (HuggingFace) | Text Generation | Post-Training Dynamic / Static Quantization | integerops / qlinearops |
Llama-7B (HuggingFace) | Text Generation | Static / Weight Only Quantization | qlinearops / weight_only |
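
In the table above, the Examples column names the format of the quantized ONNX graph: qlinearops uses operator-oriented quantized ops such as QLinearConv, qdq inserts QuantizeLinear/DequantizeLinear pairs around tensors, and integerops uses integer ops such as MatMulInteger. The sketch below mirrors the ONNX QuantizeLinear and DequantizeLinear formulas for the per-tensor uint8 case (round half to even, add the zero point, saturate). ONNX also allows int8 and per-axis parameters, which are omitted here, and the helper names are hypothetical.

```python
from typing import List

def quantize_linear(x: List[float], scale: float, zero_point: int) -> List[int]:
    """Per-tensor uint8 QuantizeLinear: saturate(round_half_even(x / scale) + zero_point)."""
    # Python's round() on a float already rounds half to even.
    return [max(0, min(255, round(v / scale) + zero_point)) for v in x]

def dequantize_linear(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Per-tensor DequantizeLinear: (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]

def qdq(x: List[float], scale: float, zero_point: int) -> List[float]:
    """A Q -> DQ pair as seen by a float consumer: values snap to the uint8 grid."""
    return dequantize_linear(quantize_linear(x, scale, zero_point), scale, zero_point)

x = [-1.0, 0.25, 0.75, 100.0]
q = quantize_linear(x, scale=0.5, zero_point=128)
y = qdq(x, 0.5, 128)
```

Note how 0.25 and 0.75 (exactly half a step) round to the even neighbors 0 and 2, and how 100.0 saturates at 255. In a QDQ graph the runtime is free to fuse such pairs with the operator between them into an integer kernel.
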

Notebook Examples

- *BERT Mini SST2 performance boost with INC: train a BERT-Mini model on the SST-2 dataset through distillation, then use quantization with Intel® Neural Compressor to accelerate inference while maintaining accuracy.
- Performance of FP32 Vs. INT8 ResNet50 Model: directly compare existing FP32 and INT8 ResNet50 models.
- Intel® Neural Compressor Sample for PyTorch*: an end-to-end pipeline that builds a CNN model with PyTorch to recognize fashion images and accelerates it with Intel® Neural Compressor.
- Intel® Neural Compressor Sample for TensorFlow*: an end-to-end pipeline that builds a CNN model with TensorFlow to recognize handwritten digits and accelerates it with Intel® Neural Compressor.
- Accelerate VGG19 Inference on Intel® Gen4 Xeon® Sapphire Rapids: an end-to-end pipeline that trains a VGG19 model by transfer learning from a pre-trained TensorFlow Hub model, then quantizes it with Intel® Neural Compressor on Intel® Gen4 Xeon® Sapphire Rapids.