This document lists the steps to reproduce the Intel Optimized PyTorch RNNT model tuning results via Neural Compressor.
Our example comes from the MLPerf Inference Benchmark Suite.
Python 3.6 or a higher version is recommended.
cd examples/pytorch/speech_recognition/rnnt/quantization/ptq_dynamic/eager
pip install -r requirements.txt
Check your GCC version with the command: gcc -v
GCC 5 or above is required.
bash prepare_loadgen.sh
cd examples/pytorch/speech_recognition/rnnt/quantization/ptq_dynamic/eager
bash prepare_dataset.sh --download_dir=origin_dataset --convert_dir=convert_dataset
prepare_dataset.sh contains two stages:
- stage 1: download the LibriSpeech/dev-clean dataset and extract it.
- stage 2: convert the .flac files to .wav files (see the conversion sketch after this list).
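The .flac-to-.wav conversion in stage 2 can be done in a few lines of Python; the sketch below uses the soundfile package purely as an illustration (the package choice and the file names are assumptions, not necessarily what prepare_dataset.sh actually uses).

    import soundfile as sf  # pip install soundfile

    # Read the FLAC samples and write them back out as WAV,
    # preserving the original sampling rate.
    data, sample_rate = sf.read("sample.flac")  # placeholder input file
    sf.write("sample.wav", data, sample_rate)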
cd examples/pytorch/speech_recognition/rnnt/quantization/ptq_dynamic/eager
wget https://zenodo.org/record/3662521/files/DistributedDataParallel_1576581068.9962234-epoch-100.pt?download=1 -O rnnt.pt
The changes made are as follows:
- add conf.yaml: this file contains the quantization configuration.
- run.py -> run_tune.py: Neural Compressor support was added (see the integration sketch after this list).
- edit pytorch_SUT.py: remove the TorchScript (jit script) conversion.
- edit pytorch/decoders.py: remove the assertion that the model is a torch.jit.ScriptModule.
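For illustration only, an integration along the lines of run_tune.py might look like the sketch below. It assumes the Neural Compressor 1.x experimental API; the model construction and the eval_func body are placeholders, not the actual code in run_tune.py.

    from neural_compressor.experimental import Quantization, common

    def eval_func(model):
        # Placeholder: run the LibriSpeech evaluation and return a single
        # accuracy number (e.g., 100 - WER) for the tuner to compare.
        ...

    model = ...  # placeholder: build the RNNT model and load the rnnt.pt weights

    quantizer = Quantization("conf.yaml")  # quantization settings come from conf.yaml
    quantizer.model = common.Model(model)
    quantizer.eval_func = eval_func
    q_model = quantizer.fit()              # dynamic PTQ: no calibration dataloader needed
    q_model.save("saved_results")          # same directory run_tuning.sh writes below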
bash run_tuning.sh --dataset_location=convert_dataset --input_model=./rnnt.pt --output_model=saved_results
bash run_benchmark.sh --dataset_location=convert_dataset --input_model=./rnnt.pt --mode=benchmark/accuracy --int8=true/false
Here --mode selects either benchmark or accuracy, and --int8=true runs the tuned int8 model while --int8=false runs the FP32 model.
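To reuse the tuned model outside these scripts, Neural Compressor 1.x provides a load utility that restores the tuned int8 state onto the FP32 model definition. The sketch below assumes that API; the model object is a placeholder.

    from neural_compressor.utils.pytorch import load

    fp32_model = ...  # placeholder: the same RNNT model definition used during tuning
    int8_model = load("saved_results", fp32_model)  # restores the tuned int8 weights
    int8_model.eval()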
In each result pair below, the left value is accuracy (percent) and the right value is time usage (seconds).
- FP32 baseline is: [92.5477, 796.7552]
- Tune 1 result is: [91.5872, 1202.2529]
- Tune 2 result is: [91.5894, 1201.3231]
- Tune 3 result is: [91.5195, 1211.5965]
- Tune 4 result is: [91.6030, 1218.2211]
- Tune 5 result is: [91.4812, 1169.5080]
- ...
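As a quick sanity check, the relative accuracy change of each tuning round against the FP32 baseline can be computed directly from the pairs above:

    # [accuracy %, time s] pairs copied from the results listed above.
    baseline_acc = 92.5477
    tunes = [
        (91.5872, 1202.2529),
        (91.5894, 1201.3231),
        (91.5195, 1211.5965),
        (91.6030, 1218.2211),
        (91.4812, 1169.5080),
    ]
    for i, (acc, sec) in enumerate(tunes, 1):
        drop = (baseline_acc - acc) / baseline_acc * 100
        print(f"Tune {i}: relative accuracy drop {drop:.2f}%, time {sec:.1f}s")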