Results P1: Inference speed GPU vs. CPU #61
It looks like exporting the weights to ONNX or OpenVINO format and running detections with those might give up to a 3x CPU speedup; similarly, exporting to TensorRT might give up to a 5x GPU speedup. Then there is the option of half-precision (FP16) inference, but I do not understand whether it only speeds up inference on a GPU or also helps on a CPU.

EDIT1: Forgot to add the question: should I invest time in trying these options, or just go with a simple detect script set to a 30% confidence threshold and a 10% IoU threshold, for both a GPU and a CPU, running on the test dataset?

EDIT2: Actually, I just realised that YOLOv5 ships a benchmarks.py (either in the root folder or in utils), but there is none for YOLOv7. Moreover, there might be problems with converting YOLOv7 to other formats (e.g. WongKinYiu/yolov7#1269). So, if we do not get the same export support for YOLOv7 as we get for YOLOv5, I will stay with the simpler approach.
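For reference, a minimal sketch of what such an export could look like, assuming the export.py interface of the YOLOv5 repo as I remember it (the flag names, format identifiers, and weight file are assumptions to verify against the version checked out on the cluster):

```python
import subprocess

# Hypothetical sketch: export YOLOv5 weights to ONNX and OpenVINO for faster CPU
# inference, and to a TensorRT engine with FP16 for GPU. The export.py flags below
# are recalled from the YOLOv5 repo and should be double-checked before use.
subprocess.run(
    ["python", "export.py", "--weights", "yolov5n.pt",
     "--include", "onnx", "openvino", "--imgsz", "640"],
    check=True,
)
subprocess.run(
    ["python", "export.py", "--weights", "yolov5n.pt",
     "--include", "engine", "--device", "0", "--half"],
    check=True,
)
```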
Hi @stark-t, I just realised that YOLOv7 and YOLOv5 differ in the maximum number of detections per image. I think that for a fair comparison I need to rerun the detect.py of YOLOv5 with
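A possible sketch of such a rerun, assuming YOLOv5's standard --max-det option; the cap of 300 is only my assumption about YOLOv7's hard-coded NMS limit and should be checked in the YOLOv7 code, and the source path is a placeholder:

```python
import subprocess

# Hypothetical sketch: rerun YOLOv5 detection with an explicit cap on detections per
# image so it matches YOLOv7. --max-det is a standard YOLOv5 detect.py flag as far as
# I know; the value 300 is an assumed YOLOv7 NMS limit, not a verified one.
subprocess.run(
    ["python", "detect.py", "--weights", "yolov5n.pt",
     "--source", "path/to/test_images",   # placeholder path
     "--img-size", "640",
     "--conf-thres", "0.3", "--iou-thres", "0.1",
     "--max-det", "300"],
    check=True,
)
```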
For YOLOv5, the GPU detect speed can be taken from the *.err files obtained from running the scripts. The results for GPU are:
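As a possible helper (not part of the original job scripts), a sketch for pulling the per-image speed summary out of the *.err files; the log line format "Speed: ...ms pre-process, ...ms inference, ...ms NMS per image" is recalled from memory, and the "jobs" directory is a placeholder:

```python
import re
from pathlib import Path

# Hypothetical helper: scan *.err job files for YOLOv5's per-image speed summary line.
# Verify the exact wording of that line against the actual files before relying on it.
pattern = re.compile(r"Speed: ([\d.]+)ms pre-process, ([\d.]+)ms inference, ([\d.]+)ms NMS")

for err_file in sorted(Path("jobs").glob("*.err")):  # "jobs" is a placeholder directory
    for line in err_file.read_text(errors="ignore").splitlines():
        match = pattern.search(line)
        if match:
            pre, inf, nms = map(float, match.groups())
            print(f"{err_file.name}: {pre} ms pre-process, {inf} ms inference, {nms} ms NMS per image")
```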
YOLOv7 outputs the information about inference speed in the *.log files. For YOLOv7 tiny, Job 191860, the file 191860.err contains (search for "conf_thres=0.3, iou_thres=0.1"):

The 191860.err file contains this info:

Unfortunately, due to cluster updates, I didn't get the total run time of cluster job 191860 as an email notification (this was fixed later by IT).
Note also the parameter counts for each model:
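In case it helps to reproduce those parameter counts, a minimal sketch using torch.hub; the hub entry points "yolov5n"/"yolov5s" are the standard ultralytics/yolov5 names, while YOLOv7 tiny would have to be loaded from its local repo or checkpoint instead (not shown here):

```python
import torch

# Hypothetical sketch: count parameters of the YOLOv5 models via torch.hub.
# YOLOv7 tiny is not covered; it would need to be loaded from the local repo/checkpoint.
for name in ("yolov5n", "yolov5s"):
    model = torch.hub.load("ultralytics/yolov5", name, pretrained=True)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.2f} M parameters")
```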
Here are some first results for YOLOv5 nano, CPU vs GPU:

YOLOv5 nano, CPU; 5 iterations with detect.py over the test dataset (210*8 = 1680 images); values in seconds:

Roughly, that means:

YOLOv5 nano, GPU; 5 iterations with detect.py over the test dataset (210*8 = 1680 images); values in seconds:

Roughly, that means:
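To turn raw per-iteration timings into the "roughly, that means" summaries, a small sketch like the following could be used; the timing values in the list are placeholders, not the measured numbers:

```python
from statistics import mean, stdev

# Placeholder timings only -- substitute the measured per-iteration run times (seconds).
iteration_times_s = [100.0, 98.0, 99.0, 101.0, 97.0]
n_images = 210 * 8  # size of the test dataset used here (1680 images)

avg, sd = mean(iteration_times_s), stdev(iteration_times_s)
print(f"mean: {avg:.1f} s +/- {sd:.1f} s over {len(iteration_times_s)} iterations")
print(f"per image: {avg / n_images * 1000:.1f} ms, throughput: {n_images / avg:.1f} img/s")
```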
New best thresholds for the three models for running the speed test (@valentinitnelav):
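A sketch of what a speed-test run with per-model thresholds could look like; the flags --conf-thres and --iou-thres are the standard detect.py options in both YOLOv5 and YOLOv7, while the (conf, iou) pair, weights file, and source path below are placeholders (the values actually used are the ones encoded in the result file names later in this thread):

```python
import subprocess

# Hypothetical sketch of a speed-test invocation with per-model thresholds.
# The threshold values are placeholders, not the tuned values from this project.
thresholds = {"yolov5n.pt": (0.25, 0.45)}  # placeholder weights -> (conf, iou)
for weights, (conf, iou) in thresholds.items():
    subprocess.run(
        ["python", "detect.py", "--weights", weights,
         "--source", "path/to/test_images",  # placeholder path
         "--img-size", "640",
         "--conf-thres", str(conf), "--iou-thres", str(iou)],
        check=True,
    )
```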
Include the threshold values for conf and IoU from #61 (comment)
I have the script now, but somehow I can only get access to CPUs, not GPUs. The CPU jobs are running, so I'll get results for those. I think there is an issue on the cluster side, because yesterday I could still get GPUs. The GPU runs will have to wait until the cluster is available again.
ok, no problem
CPU detection time results for 5 iterations for each model. These were run on the test dataset.

CPU time, YOLOv5 nano

Job ID 883890, which ran the script yolov5_detect_n_640_cpu_speed_test.sh. Time results extracted from the file

That means:

CPU time, YOLOv5 small

Job ID 883892, which ran the script yolov5_detect_s_640_cpu_speed_test.sh. Time results extracted from the file

That means:

CPU time, YOLOv7 tiny

Job ID 883893, which ran the script yolov7_detect_tiny_640_cpu_speed_test.sh. Time results extracted from the file

That means:
GPU results. Note that the first iteration can take up to twice as long as the other iterations. Perhaps some GPU "warm-up" is taking place? Possibly related to ultralytics/yolov5#5806. To deal with this, I ran 6 iterations and dropped the results of the first iteration. See commit b120ad9.

GPU time, YOLOv5 nano

Job ID 1268065, which ran the script yolov5_detect_n_640_gpu_rtx_speed_test.sh. Total run time for 6 iterations:

Time results extracted from the file job_1268065_yolov5_nano_gpurtx_results_at_0.2_iou_0.5.txt (in PAI/detectors/yolov5/runs/detect/detect_speed_jobs on the cluster):

That means:

It is a bit strange that this is similar to the results for the YOLOv5 small weights.

GPU time, YOLOv5 small

Job ID 1268064, which ran the script yolov5_detect_s_640_gpu_rtx_speed_test.sh. Total run time for 6 iterations:

Time results extracted from the file job_1268064_yolov5_small_gpurtx_results_at_0.3_iou_0.6.txt (in PAI/detectors/yolov5/runs/detect/detect_speed_jobs on the cluster):

That means:

GPU time, YOLOv7 tiny

Job ID 1268059, which ran the script yolov7_detect_tiny_640_gpu_rtx_speed_test.sh. Total run time for 6 iterations:

Time results extracted from the file job_1268059_yolov7_tiny_gpurtx_results_at_0.1_iou_0.3.txt (in PAI/detectors/yolov7/runs/detect/detect_speed_jobs on the cluster):

That means:
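For the GPU summaries, a variant of the earlier averaging sketch that drops the warm-up iteration before computing the mean; again, the timings are placeholders, not the measured values:

```python
from statistics import mean, stdev

# Placeholder timings only -- substitute the six measured GPU run times (seconds).
# The first value is treated as the warm-up iteration and dropped, matching the
# 6-iterations-minus-the-first procedure described above.
gpu_times_s = [60.0, 31.0, 30.5, 30.8, 31.2, 30.9]
warm_up, timed = gpu_times_s[0], gpu_times_s[1:]
n_images = 210 * 8

print(f"warm-up iteration: {warm_up:.1f} s (dropped)")
print(f"mean: {mean(timed):.1f} s +/- {stdev(timed):.1f} s over {len(timed)} iterations")
print(f"throughput: {n_images / mean(timed):.1f} img/s")
```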
I'll close this issue now. I have put the results in the Overleaf manuscript (Table 2).