This document lists the steps to reproduce the Intel Optimized PyTorch RNNT model tuning results via Neural Compressor.
Our example comes from the MLPerf Inference Benchmark Suite.
Python 3.6 or a higher version is recommended.
cd examples/pytorch/speech_recognition/rnnt/quantization/ptq_dynamic/eager
pip install -r requirements.txt
Check your GCC version with the command: gcc -v
GCC 5 or above is required.
bash prepare_loadgen.sh
cd examples/pytorch/speech_recognition/rnnt/quantization/ptq_dynamic/eager
bash prepare_dataset.sh --download_dir=origin_dataset --convert_dir=convert_dataset
prepare_dataset.sh contains two stages:
- stage 1: download the LibriSpeech/dev-clean dataset and extract it.
- stage 2: convert the .flac files to .wav files (see the conversion sketch after this list).
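The .flac-to-.wav conversion in stage 2 can be done in a few lines of Python; the sketch below uses the soundfile package purely as an illustration (the package choice and the file names are assumptions, not necessarily what prepare_dataset.sh actually uses).

    import soundfile as sf  # pip install soundfile

    # Read the FLAC samples and write them back out as WAV,
    # preserving the original sampling rate.
    data, sample_rate = sf.read("sample.flac")  # placeholder input file
    sf.write("sample.wav", data, sample_rate)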
cd examples/pytorch/speech_recognition/rnnt/quantization/ptq_dynamic/eager
wget https://zenodo.org/record/3662521/files/DistributedDataParallel_1576581068.9962234-epoch-100.pt?download=1 -O rnnt.pt
The changes made are as follows:
- add conf.yaml: this file contains the quantization configuration.
- run.py -> run_tune.py: Neural Compressor support was added (see the integration sketch after this list).
- edit pytorch_SUT.py: remove the TorchScript (jit script) conversion.
- edit pytorch/decoders.py: remove the assertion that the model is a torch.jit.ScriptModule.
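For illustration only, an integration along the lines of run_tune.py might look like the sketch below. It assumes the Neural Compressor 1.x experimental API; the model construction and the eval_func body are placeholders, not the actual code in run_tune.py.

    from neural_compressor.experimental import Quantization, common

    def eval_func(model):
        # Placeholder: run the LibriSpeech evaluation and return a single
        # accuracy number (e.g., 100 - WER) for the tuner to compare.
        ...

    model = ...  # placeholder: build the RNNT model and load the rnnt.pt weights

    quantizer = Quantization("conf.yaml")  # quantization settings come from conf.yaml
    quantizer.model = common.Model(model)
    quantizer.eval_func = eval_func
    q_model = quantizer.fit()              # dynamic PTQ: no calibration dataloader needed
    q_model.save("saved_results")          # same directory run_tuning.sh writes below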
bash run_tuning.sh --dataset_location=convert_dataset --input_model=./rnnt.pt --output_model=saved_results
bash run_benchmark.sh --dataset_location=convert_dataset --input_model=./rnnt.pt --mode=benchmark/accuracy --int8=true/false
Here --mode selects either benchmark or accuracy, and --int8=true runs the tuned int8 model while --int8=false runs the FP32 model.
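To reuse the tuned model outside these scripts, Neural Compressor 1.x provides a load utility that restores the tuned int8 state onto the FP32 model definition. The sketch below assumes that API; the model object is a placeholder.

    from neural_compressor.utils.pytorch import load

    fp32_model = ...  # placeholder: the same RNNT model definition used during tuning
    int8_model = load("saved_results", fp32_model)  # restores the tuned int8 weights
    int8_model.eval()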
In each result pair below, the left value is accuracy (percent) and the right value is time usage (seconds).
- FP32 baseline is: [92.5477, 796.7552]
- Tune 1 result is: [91.5872, 1202.2529]
- Tune 2 result is: [91.5894, 1201.3231]
- Tune 3 result is: [91.5195, 1211.5965]
- Tune 4 result is: [91.6030, 1218.2211]
- Tune 5 result is: [91.4812, 1169.5080]
- ...
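As a quick sanity check, the relative accuracy change of each tuning round against the FP32 baseline can be computed directly from the pairs above:

    # [accuracy %, time s] pairs copied from the results listed above.
    baseline_acc = 92.5477
    tunes = [
        (91.5872, 1202.2529),
        (91.5894, 1201.3231),
        (91.5195, 1211.5965),
        (91.6030, 1218.2211),
        (91.4812, 1169.5080),
    ]
    for i, (acc, sec) in enumerate(tunes, 1):
        drop = (baseline_acc - acc) / baseline_acc * 100
        print(f"Tune {i}: relative accuracy drop {drop:.2f}%, time {sec:.1f}s")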