3d_lowres Inference RuntimeError: Some background workers are no longer alive #2182
Hello,
2024-06-11 01:55:16.900306: predicting FLARE22_046
Moreover, when training the model on another machine with a 4080 GPU in the same environment (the current machine uses a 3090), I sometimes run into what I believe is a deadlock: CPU and GPU memory are both occupied, but their utilization sits at around 1%, so training gets stuck at a certain epoch and cannot progress. Training on the 3090 machine itself works fine, but during validation I hit the error "Some background workers are no longer alive". I checked the nnUNet issues and concluded that the CPU's RAM is full, but there is no useful solution available. It's frustrating!
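In case it helps others hitting the same RAM exhaustion: one workaround I have seen suggested (an assumption on my side, not an official fix) is to reduce the number of background worker processes so that fewer preprocessed batches are held in RAM at once. A minimal sketch, assuming the nnUNet_n_proc_DA environment variable still controls the data-augmentation workers and that it is set before nnU-Net reads it:

# Lower the number of data-augmentation worker processes to shrink the
# resident-memory footprint during training. The value "4" is only an example;
# pick whatever fits your RAM. Set it before nnU-Net reads the variable.
import os
os.environ["nnUNet_n_proc_DA"] = "4"

from nnunetv2.run.run_training import run_training_entry

# Equivalent to invoking nnUNetv2_train from the shell; the usual CLI
# arguments are parsed from sys.argv.
run_training_entry()

The same variable can of course be exported in the shell instead of being set in Python; the point is simply to lower the worker count.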
Hello,
I am currently encountering the following error while generating the validation results after training 3d_lowres. No files have been generated in either the 'val' folder or the 'predict_from_next_stage' folder. I also tested 3d_lowres at inference time and hit the same error. According to my resource monitor, system memory runs out, while training and inference with 3d_fullres run smoothly without any issues.
Could you suggest how we might resolve this problem?
Thank you!
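For completeness, one thing I am considering trying (I am not certain these are the right knobs) is lowering the number of preprocessing and segmentation-export workers during inference, since those appear to be what keep full low-resolution volumes in RAM. A minimal sketch using the Python prediction API; the folder paths are placeholders and the keyword arguments reflect my reading of nnUNetPredictor:

import torch
from nnunetv2.inference.predict_from_raw_data import nnUNetPredictor

# Fewer worker processes -> fewer volumes resident in RAM at the same time.
predictor = nnUNetPredictor(device=torch.device('cuda'))
predictor.initialize_from_trained_model_folder(
    '/path/to/nnUNet_results/DatasetXXX/nnUNetTrainer__nnUNetPlans__3d_lowres',  # placeholder path
    use_folds=(0,),
)
predictor.predict_from_files(
    '/path/to/input_images',    # placeholder input folder
    '/path/to/output_folder',   # placeholder output folder
    num_processes_preprocessing=1,
    num_processes_segmentation_export=1,
)

If I read the help text correctly, nnUNetv2_predict exposes the same settings on the command line via -npp and -nps.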
2024-05-14 23:55:29.948744: predicting 0001
2024-05-14 23:55:33.011472: predicting 0002
2024-05-14 23:55:35.686156: predicting 0003
2024-05-14 23:55:37.476846: predicting 0004
Traceback (most recent call last):
File "/home/chenney/anaconda3/envs/nnUNet/bin/nnUNetv2_train", line 8, in
sys.exit(run_training_entry())
File "/home/chenney/nnUNet/nnunetv2/run/run_training.py", line 268, in run_training_entry
run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
File "/home/chenney/nnUNet/nnunetv2/run/run_training.py", line 208, in run_training
nnunet_trainer.perform_actual_validation(export_validation_probabilities)
File "/home/chenney/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1168, in perform_actual_validation
proceed = not check_workers_alive_and_busy(segmentation_export_pool, worker_list, results,
File "/home/chenney/nnUNet/nnunetv2/utilities/file_path_utilities.py", line 103, in check_workers_alive_and_busy
raise RuntimeError('Some background workers are no longer alive')
RuntimeError: Some background workers are no longer alive
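As far as I can tell, the error itself only means that one of the segmentation-export worker processes died; when RAM is exhausted, the Linux OOM killer typically terminates a worker, and the main process then notices the dead child. A rough sketch of that kind of liveness check (an illustration, not the exact nnU-Net source):

# Illustration of a liveness check over a pool's worker processes. If a worker
# was killed (e.g. by the kernel's OOM killer), is_alive() returns False and
# the main process aborts instead of waiting forever for results.
def check_workers_alive(worker_list):
    # worker_list: the multiprocessing.Process objects backing the export pool
    if not all(p.is_alive() for p in worker_list):
        raise RuntimeError('Some background workers are no longer alive')

If that is indeed what happens here, the kernel log (dmesg or journalctl -k) should show the OOM killer terminating a python process around the time of the crash, which would confirm the RAM hypothesis.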