AutoBatch: CUDA anomaly detected #9287
👋 Hello @alexk-ede, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution. If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you. If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available. For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
Hi and happy Monday to you. I use AutoBatch quite frequently and also just updated my local YOLOv5 today, so I took a look. GTX 1080 (8GB), PyTorch 1.12, Nvidia driver 515, CUDA 11.7, Fedora 36. When running a training similar to yours, with input size 416 on COCO128 (I assume that is what you mean by a slice of COCO), I get the warning too.
When I do not pass an input size and 640 is used, I do not get the CUDA environment warning.
I too have this random chunk of 2.53G in my GPU memory. I do not know what it is either, and it does not match my usage in nvitop before training starts (around 500-600 MB, with the GNOME desktop and Xorg on). Checking back against AutoBatch trainings on the initial release of v6.2, I do see:
Hi @Denizzje and a happy start of the week to you, too. I'll add another data point today.
But manually setting a batch size of 16 works fine. So yeah, it looks like the 2.52G reserved does interfere: it distorts the batch-size testing and then makes the interpolation invalid. Instead of just saying "anomaly detected", it would also be useful to hint below that the initial VRAM usage/reserved is suspiciously high. Update:
@alexk-ede AutoBatch may produce inaccurate results under certain circumstances, i.e. when previous trainings are in progress or have terminated early, or when not all CUDA memory has been released. If you find ways to improve it, please let us know; the relevant code is here:
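For readers following along, here is a minimal sketch of the AutoBatch idea as I understand it from that file: profile a few batch sizes, fit a line of memory use versus batch size, then solve for the batch size that hits the requested memory fraction. This is an illustration under those assumptions, not the actual utils/autobatch.py code, and it assumes a CUDA device and a callable model.

```python
import numpy as np
import torch

def estimate_batch_size(model, imgsz=640, fraction=0.8, device=torch.device("cuda:0")):
    gib = 1024 ** 3
    total = torch.cuda.get_device_properties(device).total_memory / gib   # card capacity (GiB)
    reserved = torch.cuda.memory_reserved(device) / gib
    allocated = torch.cuda.memory_allocated(device) / gib
    free = total - (reserved + allocated)                                 # memory still available to us

    batch_sizes, memory_used = [1, 2, 4, 8, 16], []
    for b in batch_sizes:                                                 # measure peak memory per batch size
        torch.cuda.reset_peak_memory_stats(device)
        img = torch.empty(b, 3, imgsz, imgsz, device=device)
        model(img)
        memory_used.append(torch.cuda.max_memory_allocated(device) / gib)
        del img

    slope, intercept = np.polyfit(batch_sizes, memory_used, deg=1)        # memory ≈ slope * batch + intercept
    return int((free * fraction - intercept) / slope)                     # batch size at the target fraction
```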
Yes, I saw that file while investigating where the warning message came from. That's where I learned about the interpolation, too.
I could understand that: if something had been using the GPU before, that would be plausible. But as I said, this is a completely fresh boot and a fresh start of the environment. Nothing was run before, and there are obviously no trainings in progress, as it says 0.16G allocated. This looks like it makes more sense. I'd obviously prefer a command that gives me the same output as nvtop. I'll try later to use memory_allocated here instead (Line 189 in 1aea74c),
as it's called by autobatch here (Line 51 in 1aea74c):
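For reference, a minimal sketch of the per-process counters being discussed (assuming a CUDA device; note that both values only cover the current process, which becomes relevant further down in this thread):

```python
import torch

device = torch.device("cuda:0")
gib = 1024 ** 3

total = torch.cuda.get_device_properties(device).total_memory / gib  # card capacity
reserved = torch.cuda.memory_reserved(device) / gib    # held by this process' caching allocator
allocated = torch.cuda.memory_allocated(device) / gib  # occupied by this process' live tensors

print(f"{total:.2f}G total, {reserved:.2f}G reserved, {allocated:.2f}G allocated")
```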
I decided to test what happens when I run this during a training session that is already using most of the GPU memory.
Turns out it fails on the first try and the results list never gets initialized at all.
But in this case 0.03G allocated is also completely wrong, because the real usage is far higher. So, as planned, I tried memory_allocated instead, but it also yields weird results.
This is quite weird. I just quickly tested this demo code while a training is running and using 6.6GB of VRAM,
but instead I now only get:
So I don't know how to fix that using only the interface that torch provides. The command
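One thing worth trying (my suggestion, not something confirmed in this thread as the fix): recent PyTorch releases expose torch.cuda.mem_get_info, which queries the driver and therefore also sees memory held by other processes, so its free value should line up much better with nvtop:

```python
import torch

gib = 1024 ** 3

# Per-process counters miss memory used by other processes (e.g. a training already running):
allocated = torch.cuda.memory_allocated() / gib   # only this process' live tensors
reserved = torch.cuda.memory_reserved() / gib     # only this process' caching allocator

# Driver-level query: whole-device free/total, across all processes.
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"this process: {allocated:.2f}G allocated, {reserved:.2f}G reserved")
print(f"whole device: {free_bytes / gib:.2f}G free of {total_bytes / gib:.2f}G")
```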
There does seem to be something very wrong with the auto batch size at the moment. I believe it started after this "CUDA anomaly detected" check was implemented, though I did not do many trainings after a big batch right after the release of 6.2. This time I tried with the latest YOLOv5 from master, PyTorch 1.12, Ubuntu 20.04, Python 3.8 with Nvidia driver 515 and CUDA 11.7, on an A100 80GB SXM GPU. The dataset is my regular dataset of ~40k training pictures this time, so not COCO128. It spits out the CUDA anomaly warning and then proceeds with a batch size of 16...
For reference, this was an earlier training on the same machine but with PyTorch 1.10 and CUDA 11.3 on YOLOv5 release 6.2 (not master from that time), with an earlier version of the same dataset (the size is roughly the same though).
After my fixed-size training run (batch size 128) is finished I will try to redo AutoBatch on YOLOv5 release 6.2. If this 80GB card is convinced there as well that it can only fit a batch size of 16 in its memory, then the cause is somewhere else. I am curious to see what happens if I retry with PyTorch 1.10 and the latest master code.
@Denizzje yes I'm able to reproduce in Colab. Something is not correct. I'll add a TODO to investigate.
May resolve #9287 Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@Denizzje good news 😃! Your original issue may now be fixed ✅ in PR #9448. This avoids setting `cudnn.benchmark=True`. To receive this update:
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
* AutoBatch `cudnn.benchmark=True` fix (may resolve #9287)
* Update autobatch.py
* Update autobatch.py
* Update general.py
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
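For context, a rough sketch of the general pattern behind a `cudnn.benchmark=True` fix (my reading of the PR title above, not the actual diff): cuDNN benchmark mode autotunes kernels for a fixed input shape, and that autotuning can allocate sizeable workspaces, which inflates reserved memory while AutoBatch is measuring.

```python
import torch

# Keep cuDNN autotuning off while AutoBatch profiles batch sizes, so the measurements
# are not inflated by autotuning workspaces (illustrative pattern, not the PR code).
torch.backends.cudnn.benchmark = False
# ... run AutoBatch / batch-size estimation here ...

# Re-enable autotuning afterwards for the actual fixed-shape training, if desired.
torch.backends.cudnn.benchmark = True
```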
Awesome @glenn-jocher, did not expect this on a Friday evening hehe. "Unfortunately" the A100 is still training and my GTX 1080 really can't handle my dataset properly anymore, so I will wait until it's finished, then give it another try after pulling and report back ASAP if it can find its memory this time ;).
Top of the morning, @glenn-jocher. Happy to confirm that the A100 is now convinced it actually has 80GB of VRAM, and AutoBatch now gives me a batch size of 192. The "CUDA anomaly detected" warning is also gone. This was even a "dirty" start: I didn't open a new terminal or reboot the system after my previous training.
Glad to see this very useful function back in action, and thanks again for your quick work last night 😄. It's not my issue, so I can't close it, but @alexk-ede is hopefully fine too after pulling the latest code from master.
@Denizzje great! BTW we used to target 90% memory utilization but had some issues with smaller cards going over during training, which is why we dropped back to an 80% target. You can modify this here (Lines 21 to 28 in 5e1a955):
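Besides editing the default in that file, a lower target can also be passed at call time. A minimal sketch, assuming the `fraction` keyword in the `autobatch()` signature linked above (verify against your checkout) and running from the yolov5 repo root:

```python
import torch
from models.yolo import Model            # YOLOv5 repo import
from utils.autobatch import autobatch

# Build a model and ask AutoBatch to target ~70% of GPU memory instead of the default 0.8.
model = Model("models/yolov5s.yaml").to("cuda")
batch_size = autobatch(model, imgsz=640, fraction=0.7)
print(f"AutoBatch suggests batch size {batch_size}")
```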
Hi everyone, looks like it's going to be a good Monday today ;) And indeed, it seems to work fine right now.
I'm just not sure where the (75%) ✅ is coming from if fraction=0.8 ... I'll have a few training runs to do soon, so I'll report back. And yes, having it <= 80% makes sense, because I also noticed that despite GPU_mem showing 6.42G during the epoch, the actual used GPU memory is what nvtop reports: 7.711G. @Denizzje what does your nvtop report when you have
@alexk-ede 80% is the requested utilization, 75% is the predicted utilization (actual utilization will vary and is sometimes substantially different). It's possible some of the difference comes from AutoBatch running only on the free memory vs the total memory displayed later.
@alexk-ede maybe I should re-add allocated and reserved amounts to the predicted amount for the final utilisation. This should be closer to 80%.
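To make that concrete, a tiny worked example with made-up numbers (a hypothetical 8GB card, not values from this thread):

```python
total = 8.0                    # GiB on a hypothetical card
reserved_plus_allocated = 0.4  # GiB already held before AutoBatch runs
predicted_batch_mem = 6.0      # GiB the fitted line predicts the chosen batch size will use

shown_before = predicted_batch_mem / total                             # prediction alone: 75%
shown_after = (reserved_plus_allocated + predicted_batch_mem) / total  # closer to the requested 80%
print(f"{shown_before:.0%} vs {shown_after:.0%}")
```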
May resolve #9287 (comment) Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
May resolve #9287 (comment) Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com> Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@alexk-ede good news 😃! Your original issue may now be fixed ✅ in PR #9491. This PR adds reserved and allocated memory to the final estimated utilization rate displayed, which should result in a value closer to the default requested 80%. To receive this update:
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
Hi, this one worked (but that was with PR #9448 and before PR #9491):
but these ones failed:
Then I tried setting it to 75% instead:
It ran a bit longer, but then failed. Still rather odd that it ran for over 10 seconds; usually it fails instantly when it runs out of VRAM. And a last try with 70%:
That seems to run. Anyway, these 8GB cards will just have to work with a lower fraction; there is no other way around that.
Update to the last ones: those results may be invalid. The dataset itself is around 24GB cached in RAM. Were there any other changes that could affect that?
@alexk-ede dataset caching is independent of CUDA usage; it either uses RAM or disk space.
Yes, I know, I'm using the --cache option to use RAM. Otherwise the CPU load is just insane and the CPU can't keep up with the GPU. |
Hello @alexk-ede, I cannot check right now because I am doing a training on release 6.1 at the moment (no ClearML, and it got deallocated overnight, so I am missing the original logs from the beginning). Have you tried, however, something other than yolov5n (yolov5m, for example) on that slice of COCO? Is it actually representative of your dataset / use case? Because I do remember that when mucking about with COCO128 I actually crashed my training with AutoBatch and a yolov5n, for instance.
@Denizzje yeah, I tried various yolov5 sizes, mostly n, s, m (sometimes l just for testing).
Search before asking
Question
So I'm testing the AutoBatch feature, which is pretty cool.
It seemed to work fine last week, but this week for whatever reason (maybe because it's Monday, who knows ...) I'm having issues with it.
I'm running yolov5s (latest git checkout, of course) and getting this (when using --batch -1):
Dataset is a slice from COCO
Meanwhile, this is the nvtop output before running train.py:
So there isn't really anything in the GPU memory.
I am unsure about this output from AutoBatch:
The 2.20G reserved is weird, because I stopped everything (including gdm3), so nothing is running on the GPU.
(besides the training process later).
And I can easily set batch to 80 and it works fine:
I obviously did the recommended environment restart and even restarted the machine. AutoBatch still complained about around 2.20G reserved.
Any ideas how I can investigate this?
My guess is that the 2.2GB messes up the interpolation for AutoBatch, because the GPU_mem (GB) column doesn't make much sense.
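To illustrate that guess with made-up numbers (a toy example, not values from this run): shifting the same measurements up by a constant 2.2G changes the solved batch size noticeably when the target is a fraction of total memory.

```python
import numpy as np

batch_sizes = [1, 2, 4, 8, 16]
clean = [0.6, 0.9, 1.5, 2.7, 5.1]            # hypothetical GPU_mem readings (GB)
inflated = [m + 2.2 for m in clean]          # same readings with 2.2G already reserved

total, fraction = 8.0, 0.8                   # 8GB card, 80% target
for name, mem in (("clean", clean), ("inflated", inflated)):
    slope, intercept = np.polyfit(batch_sizes, mem, deg=1)     # mem ≈ slope * batch + intercept
    print(name, round((total * fraction - intercept) / slope)) # solved batch size (smaller when inflated)
```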
Additional
Maybe the issue title should be changed to AutoBatch: CUDA anomaly detected
Some additional system info:
During training, it shows me usage around
So I'm not sure where the rest went (i.e. the difference from the 7.2GB in nvtop) ...