This document describes the steps to reproduce the tuning results of the Intel-optimized PyTorch ssd_resnet34 model with Neural Compressor.
Our example comes from the MLPerf Inference Benchmark Suite.
PyTorch 1.8 or a higher version with the pytorch_fx backend is required.
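A quick way to confirm the environment meets this requirement (illustrative only, not part of the example scripts):

```python
# Sanity check: verify that the installed PyTorch provides the FX backend.
import torch
import torch.fx  # FX is available starting from PyTorch 1.8

major, minor = (int(v) for v in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 8), f"PyTorch >= 1.8 required, got {torch.__version__}"
```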
cd examples/pytorch/object_detection/ssd_resnet34/quantization/ptq/fx
pip install -r requirements.txt
Check your GCC version with the command: gcc -v
GCC 5 or above is required.
bash prepare_loadgen.sh
- Step 1: Download the COCO2017 dataset and extract it.
- Step 2: Upscale the COCO2017 images to 1200x1200 with prepare_dataset.sh.
Dataset | Download link |
---|---|
coco (validation) | http://images.cocodataset.org/zips/val2017.zip |
coco (annotations) | http://images.cocodataset.org/annotations/annotations_trainval2017.zip |
cd examples/pytorch/object_detection/ssd_resnet34/quantization/ptq/fx
bash prepare_dataset.sh --origin_dir=origin_dataset --convert_dir=convert_dataset
Make sure origin_dataset (COCO2017) contains two folders: val2017 and annotations.
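prepare_dataset.sh performs the conversion end to end. Conceptually, each validation image is upscaled to 1200x1200 and stored as a CHW .npy array under the npy_dir referenced later in conf.yaml. The snippet below is only an assumption about that layout (it ignores any normalization the real script may apply):

```python
# Illustrative only: upscale one COCO image to 1200x1200 and save it as .npy,
# mirroring the preprocessed/coco-1200-pt/NCHW layout this example expects.
import numpy as np
from PIL import Image

def to_npy(src_path, dst_path, size=1200):
    img = Image.open(src_path).convert("RGB").resize((size, size), Image.BILINEAR)
    chw = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)  # HWC -> CHW
    np.save(dst_path, chw)
```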
cd examples/pytorch/object_detection/ssd_resnet34/quantization/ptq/fx
wget https://zenodo.org/record/3236545/files/resnet34-ssd1200.pytorch
The changes made are as follows:
- add conf.yaml: This file contains the quantization configuration. Users need to add the calibration dataset path in it, for example:
  COCONpy:
    root: ./
    npy_dir: preprocessed/coco-1200-pt/NCHW/val2017/
    anno_dir: convert_dataset/annotations/instances_val2017.json
Note: the npy files do not exist in the current folder yet; they are generated after the process starts. Please keep npy_dir pointing to preprocessed/coco-1200-pt/NCHW/val2017/ in the current folder. You can also use an absolute path by prepending your current path to preprocessed/coco-1200-pt/NCHW/val2017/,
such as: /home/xxx/neural_compressor/examples/pytorch/fx/object_detection/ssd_resnet34/ptq/preprocessed/coco-1200-pt/NCHW/val2017/
- edit python/main.py: we import neural_compressor in it (see the first sketch after this list).
- edit python/model/ssd_r34.py: we wrap functions with @torch.fx.wrap so that ops FX mode cannot trace are treated as leaf calls (see the second sketch after this list).
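The exact wiring inside python/main.py is not reproduced here; the following is a minimal sketch of the conf.yaml-driven flow exposed by Neural Compressor's experimental API, where `ssd_model` and `eval_func` stand in for the example's own model object and COCO-mAP evaluation logic:

```python
# Minimal sketch of a conf.yaml-driven quantization flow; the real
# python/main.py wires this into the MLPerf harness instead.
from neural_compressor.experimental import Quantization, common

quantizer = Quantization("conf.yaml")      # reads the calibration dataset paths
quantizer.model = common.Model(ssd_model)  # FP32 SSD-ResNet34 torch module (placeholder)
quantizer.eval_func = eval_func            # returns accuracy used for tuning (placeholder)
q_model = quantizer.fit()                  # tunes until the accuracy goal is met
q_model.save("./saved_results")            # matches --output_model below
```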
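For reference, @torch.fx.wrap marks a free function as a leaf so that symbolic tracing records a single call instead of descending into Python control flow it cannot handle. A generic illustration (not the actual ssd_r34.py code):

```python
import torch
import torch.fx

@torch.fx.wrap  # keep this call as a single leaf node during symbolic tracing
def topk_scores(scores, k):
    # Data-dependent Python control flow like this would otherwise fail
    # under FX tracing, since bool() on a Proxy is not allowed.
    if scores.numel() < k:
        return scores
    return scores.topk(k).values

class Head(torch.nn.Module):
    def forward(self, scores):
        return topk_scores(scores, 200)

traced = torch.fx.symbolic_trace(Head())
print(traced.graph)  # topk_scores appears as one call_function node
```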
bash run_tuning.sh --topology=ssd-resnet34 --dataset_location=./convert_dataset --input_model=./resnet34-ssd1200.pytorch --output_model=./saved_results
bash run_benchmark.sh --topology=ssd-resnet34 --dataset_location=./convert_dataset --input_model=./resnet34-ssd1200.pytorch --config=./saved_results --mode=benchmark --int8=true/false
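To reuse the tuned model outside run_benchmark.sh, the saved configuration can be restored with Neural Compressor's PyTorch loader. A brief sketch, assuming `fp32_model` is the original FP32 SSD-ResNet34 module:

```python
# Sketch: reload the tuned INT8 model from saved_results for standalone use.
# `fp32_model` is assumed to be the original FP32 SSD-ResNet34 module.
from neural_compressor.utils.pytorch import load

int8_model = load("./saved_results", fp32_model)
int8_model.eval()
```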
In each result pair below, the left value is accuracy (percentage) and the right value is time usage in seconds.
sampling_size: 50 (conf.yaml)
FP32 baseline is: [19.6298, 3103.3418]
Pass quantize model elapsed time: 76469.05 ms
Tune 1 result is: [19.1733, 763.7865]
Pass quantize model elapsed time: 22288.36 ms
Tune 2 result is: [19.4817, 861.9649]
Precision | Batch size | Latency | Throughput
---|---|---|---
fp32 | 1 | 878.225 ms | 1.139 samples/sec
int8 | 1 | 97.111 ms | 10.298 samples/sec
sampling_size: 500 (conf.yaml)
FP32 baseline is: [19.6298, 3103.3418]
Pass quantize model elapsed time: 480769.63 ms
Tune 1 result is: [19.0617, 649.5251]
Pass quantize model elapsed time: 215259.43 ms
Tune 2 result is: [19.5257, 636.5329]