PyNetsPresso provides a complete process for training, compressing, converting, and benchmarking the YOLO-Fastest model so that it runs efficiently on Arm Cortex-M85 and Cortex-M55. This repository is particularly useful for those who want to deploy the YOLO-Fastest model to Arm processors with a single, streamlined workflow.
0. Sign up
1. Install
2. Prepare dataset
3. Training
4. Compress model, convert to tflite, and benchmark with PyNetsPresso
To get started with the NetsPresso Python package, you will need to sign up at NetsPresso.
Clone the repository and install the packages listed in requirements.txt in a Python >= 3.7.0 environment with PyTorch >= 1.11 and < 2.0.
git clone https://github.com/Nota-NetsPresso/ModelZoo-YOLOFastest-for-ARM-U55-M85.git # clone
cd ModelZoo-YOLOFastest-for-ARM-U55-M85
pip install -r requirements.txt # install
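Before installing, it can help to confirm that the environment matches the constraints stated above. The following is a minimal sketch and not part of the repository: `parse_major_minor`, `torch_version_supported`, and `python_version_supported` are hypothetical helpers that only encode the Python >= 3.7.0 and PyTorch >= 1.11, < 2.0 ranges.

```python
import re
import sys


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_version_supported(version: str) -> bool:
    """True when the version satisfies PyTorch >= 1.11 and < 2.0."""
    return (1, 11) <= parse_major_minor(version) < (2, 0)


def python_version_supported(info=sys.version_info) -> bool:
    """True when the interpreter is Python 3.7.0 or newer."""
    return tuple(info[:3]) >= (3, 7, 0)


if __name__ == "__main__":
    print("Python OK:", python_version_supported())
    try:
        import torch  # only available after requirements.txt is installed
        print("PyTorch OK:", torch_version_supported(torch.__version__))
    except ImportError:
        print("PyTorch is not installed yet")
```

Only the major and minor components are compared, so local build tags such as `+cu117` are ignored.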
Download the STREETS dataset and annotations from link, unzip them, and move the vehicleannotaitons folder to the ../datasets/ directory.
Your directory structure should look like this:
├── datasets
│ └── vehicleannotaitons
│ ├── images
│ └── annotations
│
└── ModelZoo-YOLOFastest-for-ARM-U55-M85
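The layout above can be checked before training starts. This is a small sketch under the assumption that `datasets` and the cloned repository sit side by side in one parent directory, as the tree shows; `missing_dataset_dirs` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path


def missing_dataset_dirs(workspace: Path) -> list:
    """Return the expected directories that are absent under `workspace`."""
    expected = [
        workspace / "datasets" / "vehicleannotaitons" / "images",
        workspace / "datasets" / "vehicleannotaitons" / "annotations",
        workspace / "ModelZoo-YOLOFastest-for-ARM-U55-M85",
    ]
    return [p.relative_to(workspace).as_posix() for p in expected if not p.is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from inside the cloned repository,
    # so the shared parent directory is one level up.
    missing = missing_dataset_dirs(Path.cwd().parent)
    print("missing directories:", missing or "none")
```

The folder name `vehicleannotaitons` is spelled exactly as in the tree above, since the check compares directory names literally.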
If you want to train from scratch, run 'train.py' to create a '.pt' checkpoint file.
python train.py --data ./data/STREETS.yaml --epochs 300 --weights '' --cfg ./models/yolo-fastest.yaml --batch-size 64
auto_process.py provides an integrated process that covers torch.fx conversion, model compression, retraining of the fx model, ONNX export, TFLite conversion, device benchmarking, and mAP validation. You can run auto_process.py with a minimal set of training hyperparameters and your NetsPresso account information.
You can choose either the Renesas-RA8D1 (Arm Cortex-M85) or the Ensemble-E7-DevKit-Gen2 (Arm Cortex-M55 + Ethos-U55) device, and boost inference speed by passing the --helium option.
python auto_process.py --data ./data/STREETS.yaml --name yolo_fastest --weight_path ./models/yolo_fastest_streets.pt --epochs 300 --batch-size 64 --np_email '' --np_password '' --target_device Renesas-RA8D1 --helium
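To keep account credentials out of shell history, the same command can be assembled from Python. The sketch below is an illustration only: `build_auto_process_cmd` is a hypothetical helper, the environment variable names `NP_EMAIL` and `NP_PASSWORD` are assumptions, and the second `--target_device` value is assumed to match the device name as written above. The flags themselves are copied from the command shown.

```python
import os
import subprocess
import sys

# The first value appears in the command above; the second is assumed
# to be accepted by --target_device exactly as the device is named.
SUPPORTED_DEVICES = ("Renesas-RA8D1", "Ensemble-E7-DevKit-Gen2")


def build_auto_process_cmd(email, password, target_device="Renesas-RA8D1", helium=False):
    """Assemble the auto_process.py argument list used in this README."""
    if target_device not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported target device: {target_device!r}")
    cmd = [
        sys.executable, "auto_process.py",
        "--data", "./data/STREETS.yaml",
        "--name", "yolo_fastest",
        "--weight_path", "./models/yolo_fastest_streets.pt",
        "--epochs", "300",
        "--batch-size", "64",
        "--np_email", email,
        "--np_password", password,
        "--target_device", target_device,
    ]
    if helium:
        cmd.append("--helium")  # optional flag, as in the command above
    return cmd


if __name__ == "__main__":
    email = os.environ.get("NP_EMAIL")        # hypothetical variable name
    password = os.environ.get("NP_PASSWORD")  # hypothetical variable name
    if email and password:
        subprocess.run(build_auto_process_cmd(email, password, helium=True), check=True)
    else:
        print("Set NP_EMAIL and NP_PASSWORD to run the pipeline")
```

Passing the arguments as a list avoids shell quoting problems when the password contains special characters.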
Model | Format | Precision | Size (pixels) | mAPval 50-95 | mAPval 50 | Speed Cortex-M85 (ms) | Speed Cortex-M85 with Helium (ms) | Speed Ethos-U55 (ms) | Params (M) |
---|---|---|---|---|---|---|---|---|---|
YOLO-Fastest | PyTorch | FP32 | 256 | 41.6 | 75.5 | - | - | - | 0.3 |
YOLO-Fastest | TFLite | Full INT8 | 256 | 39.7 | 73.7 | 594 | 269 | 6.8 | 0.3 |
Compressed YOLO-Fastest | TFLite | Full INT8 | 256 | 37.3 | 71.5 | 513 | 234 | 6.0 | 0.2 |
Table Notes
- The checkpoint is trained for 300 epochs with default settings. The model uses the hyp.scratch-low.yaml hyperparameters.
- mAPval values are for single-model single-scale on the STREETS dataset. Reproduce with
python val.py --weights './models/yolo_fastest_streets_256.pt' --data ./data/STREETS.yaml --img 256
for the PyTorch checkpoint file, and with
python val.py --weights './models/yolo_fastest_streets_full_int8_256.tflite' --data ./data/STREETS.yaml --img 256 --anchors-for-tflite-path ./models/yolo_fastest_streets_256_anchors.json
for the full INT8 TFLite file.
- Speed is the inference time for one STREETS val image on Cortex-M85 (with and without Helium) and Ethos-U55.
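The latency columns are easier to compare as ratios. The short sketch below only does arithmetic on the numbers from the table above; the dictionary layout and the `speedup` helper are illustrative names, not part of the repository.

```python
# Latencies in milliseconds, copied from the two TFLite rows of the table.
LATENCY_MS = {
    "YOLO-Fastest": {"m85": 594, "m85_helium": 269, "u55": 6.8},
    "Compressed YOLO-Fastest": {"m85": 513, "m85_helium": 234, "u55": 6.0},
}


def speedup(baseline_ms, optimized_ms):
    """Ratio of baseline latency to optimized latency (higher is faster)."""
    return baseline_ms / optimized_ms


if __name__ == "__main__":
    base = LATENCY_MS["YOLO-Fastest"]
    comp = LATENCY_MS["Compressed YOLO-Fastest"]

    # Effect of Helium on Cortex-M85 for each model.
    for name, row in LATENCY_MS.items():
        print(f"{name}: Helium speedup = {speedup(row['m85'], row['m85_helium']):.2f}x")

    # Effect of compression on each target.
    for target in ("m85", "m85_helium", "u55"):
        print(f"{target}: compression speedup = {speedup(base[target], comp[target]):.2f}x")
```

On these figures, Helium gives roughly a 2.2x speedup on Cortex-M85 for both models, while compression gives about 1.13x to 1.16x depending on the target, at a cost of 2.4 points of mAPval 50-95 (39.7 to 37.3).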
Join our Discussion Forum to provide feedback or share your use cases. If you want to talk more with Nota, please contact us here.
You can also reach us by email (contact@nota.ai) or phone (+82 2-555-8659).