RuntimeError: CUDA error: device-side assert triggered #5801
Comments
The error message indicates that a data structure (a list or Tensor) is being indexed out of bounds. You can find the more precise position of the error by adding CUDA_LAUNCH_BLOCKING=1 before your command.
Feel free to reopen the issue if there are any further questions.
I ran with CUDA_LAUNCH_BLOCKING=1 before the command; the traceback below is the error produced after adding it.
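As a follow-up to the comment above, here is a minimal, hypothetical sanity check (not part of MMDetection; the annotation path is an assumed placeholder) that scans a COCO annotation file for category ids outside the declared category set and for degenerate boxes, either of which can surface as this kind of device-side indexing assert during training:

# Hypothetical check, not from this repo: look for out-of-range category ids
# and zero/negative-size boxes in a COCO annotation file.
from pycocotools.coco import COCO

ann_file = 'data/coco/annotations/instances_train2017.json'  # assumed path
coco = COCO(ann_file)

valid_cat_ids = set(coco.getCatIds())
bad_labels, bad_boxes = [], []
for ann in coco.loadAnns(coco.getAnnIds()):
    if ann['category_id'] not in valid_cat_ids:
        bad_labels.append(ann['id'])
    w, h = ann['bbox'][2], ann['bbox'][3]
    if w <= 0 or h <= 0:
        bad_boxes.append(ann['id'])

print('annotations with unknown category_id:', len(bad_labels))
print('annotations with degenerate boxes:', len(bad_boxes))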
Thanks for your error report and we appreciate it a lot.
Checklist
Describe the bug
A clear and concise description of what the bug is.
RuntimeError: CUDA error: device-side assert triggered
Reproduction
What command or script did you run?
CUDA_VISIBLE_DEVICES=3 python tools/train.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
Did you make any modifications on the code or config? Did you understand what you have modified?
I did not make any modifications to the code or config.
What dataset did you use?
coco
Environment
Run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3: GeForce GTX 1080 Ti
CUDA_HOME: /disk1/huim/softwares/cuda-10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.8.1
PyTorch compiling details: PyTorch built with:
TorchVision: 0.9.1
OpenCV: 4.5.3
MMCV: 1.3.9
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.1
MMDetection: 2.15.0+62a1cd3
conda install pytorch cudatoolkit=10.1 torchvision -c pytorch
Error traceback
If applicable, paste the error traceback here.
2021-08-05 23:34:13,140 - mmdet - INFO - Epoch [1][50/58633] lr: 1.978e-03, eta: 3 days, 7:04:43, time: 0.405, data_time: 0.054, memory: 3786, loss_rpn_cls: 0.5552, loss_rpn_bbox: nan, loss_cls: 0.7884, acc: 88.8535, loss_bbox: 0.1492, loss: nan
/opt/conda/conda-bld/pytorch_1616554827596/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
(the same assertion failure is repeated for many other threads of block [0,0,0])
Traceback (most recent call last):
File "tools/train.py", line 188, in
main()
File "tools/train.py", line 184, in main
meta=meta)
File "/disk1/huim/projects/mmdetection/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/disk1/huim/projects/mmdetection/mmdet/models/detectors/base.py", line 237, in train_step
losses = self(**data)
File "/disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
return old_func(*args, **kwargs)
File "/disk1/huim/projects/mmdetection/mmdet/models/detectors/base.py", line 171, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/disk1/huim/projects/mmdetection/mmdet/models/detectors/two_stage.py", line 140, in forward_train
proposal_cfg=proposal_cfg)
File "/disk1/huim/projects/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 54, in forward_train
losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/disk1/huim/projects/mmdetection/mmdet/models/dense_heads/rpn_head.py", line 74, in loss
gt_bboxes_ignore=gt_bboxes_ignore)
File "/disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 186, in new_func
return old_func(*args, **kwargs)
File "/disk1/huim/projects/mmdetection/mmdet/models/dense_heads/anchor_head.py", line 463, in loss
label_channels=label_channels)
File "/disk1/huim/projects/mmdetection/mmdet/models/dense_heads/anchor_head.py", line 345, in get_targets
unmap_outputs=unmap_outputs)
File "/disk1/huim/projects/mmdetection/mmdet/core/utils/misc.py", line 29, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/disk1/huim/projects/mmdetection/mmdet/models/dense_heads/anchor_head.py", line 236, in _get_targets_single
sampling_result.pos_bboxes, sampling_result.pos_gt_bboxes)
File "/disk1/huim/projects/mmdetection/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py", line 59, in encode
encoded_bboxes = bbox2delta(bboxes, gt_bboxes, self.means, self.stds)
File "/disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/utils/parrots_jit.py", line 21, in wrapper_inner
return func(*args, **kargs)
File "/disk1/huim/projects/mmdetection/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py", line 136, in bbox2delta
means = deltas.new_tensor(means).unsqueeze(0)
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1616554827596/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fa0e48092f2 in /disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fa0e480667b in /disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7fa0e4a62219 in /disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fa0e47f13a4 in /disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: + 0x6e6aba (0x7fa12ca4baba in /disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x6e6b61 (0x7fa12ca4bb61 in /disk1/huim/softwares/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #24: __libc_start_main + 0xf0 (0x7fa152495840 in /lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)
Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
I have not found the reason for the error.
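One low-effort check that may be worth trying (a hypothetical sketch, not a confirmed fix; it assumes the stock faster_rcnn_r50_fpn_1x_coco.py config and MMDetection 2.x APIs) is to confirm that the number of classes declared in the bbox head matches the classes exposed by the dataset, since a mismatch lets label indices run past the head and triggers exactly this kind of device-side assert:

# Hypothetical consistency check, not part of the repo.
from mmcv import Config
from mmdet.datasets import build_dataset

cfg = Config.fromfile('configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py')
dataset = build_dataset(cfg.data.train)

num_classes_head = cfg.model.roi_head.bbox_head.num_classes
print('dataset classes:', len(dataset.CLASSES))
print('head num_classes:', num_classes_head)
assert len(dataset.CLASSES) == num_classes_head, 'class count mismatch'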