- We will further improve performance with a better training recipe.
- Simplified the model code by removing unnecessary settings and renaming classes for easier understanding.
- Uploaded a benchmark script to simplify latency benchmarking.
torch>=1.7.0; torchvision>=0.8.0; pyyaml; timm==0.6.13;
Data preparation: ImageNet with the following folder structure; you can extract ImageNet with this script. A minimal loading sketch is shown after the tree.
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
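The layout above is the standard torchvision ImageFolder layout. As a minimal sketch (not the repo's actual data pipeline, which uses timm's loaders), the validation split can be loaded like this:

```python
# Minimal sketch: load the validation split from the folder layout above.
# The actual training/validation scripts use timm's data pipeline instead.
import torch
from torchvision import datasets, transforms

val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
val_set = datasets.ImageFolder("imagenet/val", transform=val_tf)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=256, num_workers=8)
```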
We upload the checkpoints (trained with distillation) and training logs to Google Drive; feel free to download them. A minimal checkpoint-loading sketch is shown after the table.
Model | #Params | Image resolution | Top-1 Acc (%) | Download |
---|---|---|---|---|
EfficientMod-xxs | 4.7M | 224 | 77.1 | [checkpoint & logs] |
EfficientMod-xs | 6.6M | 224 | 79.4 | [checkpoint & logs] |
EfficientMod-s | 12.9M | 224 | 81.9 | [checkpoint & logs] |
EfficientMod-s-Conv (No Distill.) | 12.9M | 224 | 80.5 | [checkpoint & logs] |
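As a minimal sketch of how a downloaded checkpoint can be loaded: the model name used below (EfficientMod_s) and the import that registers the models with timm are assumptions; check the models folder for the exact registered names and checkpoint layout.

```python
# Minimal sketch, not from the repo: load a downloaded checkpoint.
# Assumption: importing the repo's models package registers the EfficientMod
# variants with timm, and the checkpoint is a plain state dict (possibly
# wrapped under a "state_dict" key).
import torch
import timm

import models  # hypothetical: registers the EfficientMod variants with timm

model = timm.create_model("EfficientMod_s", num_classes=1000)  # assumed name
ckpt = torch.load("EfficientMod_s.pth", map_location="cpu")
model.load_state_dict(ckpt.get("state_dict", ckpt))
model.eval()
```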
To evaluate our EfficientMod models, run:
python3 validate.py /path/to/imagenet --model {model} -b 256 --checkpoint {/path/to/checkpoint}
We show how to train EfficientMod on 8 GPUs with the command below; a sketch of the soft distillation loss it refers to follows the command.
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --data {path-to-imagenet} --model {model} -b 256 --lr 4e-3 --amp --model-ema --distillation-type soft --distillation-tau 1 --auto-resume --exp_tag {experiment_tag}
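The --distillation-type soft and --distillation-tau flags refer to DeiT-style soft distillation. As a rough sketch (the repo's actual loss may differ in weighting and reduction), the loss looks like this:

```python
# Sketch of DeiT-style soft distillation (assumption: the training script
# follows this recipe; alpha and the reduction may differ in the actual code).
import torch
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, labels,
                           tau: float = 1.0, alpha: float = 0.5):
    # Standard cross-entropy on ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # KL divergence between softened teacher and student distributions,
    # scaled by tau^2 to keep gradient magnitudes comparable.
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.log_softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (tau * tau)
    return (1.0 - alpha) * ce + alpha * kd
```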
We also provide a script to help benchmark model latency on different platforms, which is important but often unavailable.
With this script, you can benchmark different models, input resolutions, and backends (ONNX Runtime on CPU, ONNX Runtime on GPU, and PyTorch on GPU).
The script also saves a detailed log file ({args.results_file}, e.g., debug.csv) that records model-related, data-related, benchmark, and system/hardware information for each run. A minimal sketch of the measurement loop is shown after the benchmark command below.
onnxruntime-gpu==1.13.1; onnx==1.13.0; tensorrt==8.5.2.2; torch>=1.7.0; torchvision>=0.8.0; timm==0.6.13; fvcore; thop; py-cpuinfo;
# Please feel free to add / modify configs if necessary
# Benchmark results will be printed and appended as a new row to {args.results_file}.
CUDA_VISIBLE_DEVICES=0 python3 benchmark_onnx.py --model {model-name} --input-size 3 224 224 --benchmark_cpu
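Under the hood, benchmarking ONNX Runtime latency boils down to exporting the model to ONNX and timing repeated inference calls. A minimal CPU sketch (not the repo script itself; the model and file names are placeholders) is:

```python
# Minimal sketch: export a model to ONNX and time it with ONNX Runtime on CPU.
# "resnet18" is only a placeholder; swap in an EfficientMod variant.
import time
import onnxruntime as ort
import timm
import torch

model = timm.create_model("resnet18").eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=13)

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
x = dummy.numpy()
for _ in range(10):  # warm-up runs
    sess.run(None, {"input": x})
runs = 100
start = time.perf_counter()
for _ in range(runs):
    sess.run(None, {"input": x})
print(f"avg latency: {(time.perf_counter() - start) / runs * 1e3:.2f} ms")
```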
See the detection folder for object detection and instance segmentation on COCO.
See the segmentation folder for semantic segmentation on ADE20K.
@inproceedings{
ma2024efficient,
title={Efficient Modulation for Vision Networks},
author={Xu Ma and Xiyang Dai and Jianwei Yang and Bin Xiao and Yinpeng Chen and Yun Fu and Lu Yuan},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=ip5LHJs6QX}
}