How to use RandomSampler in RetinaNet #6971
Labels
community discussion
community help wanted
Extra attention is needed
usage
About how to use/change the configs/codes etc.
When I use RandomSampler in RetinaNet, I get the following error:
/opt/conda/conda-bld/pytorch_1607370141920/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [127,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
File "tools/train.py", line 188, in <module>
main()
File "tools/train.py", line 184, in main
meta=meta)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/apis/train.py", line 175, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/detectors/base.py", line 233, in train_step
losses = self(**data)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 95, in new_func
return old_func(*args, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/detectors/base.py", line 167, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/detectors/single_stage.py", line 79, in forward_train
gt_labels, gt_bboxes_ignore)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 652, in forward_train
losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 182, in new_func
return old_func(*args, **kwargs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 510, in loss
label_channels=label_channels)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 417, in get_targets
unmap_outputs=unmap_outputs)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/core/utils/misc.py", line 29, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/models/dense_heads/tail_retina_head.py", line 318, in _get_targets_single
fill=self.num_classes)
File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/core/utils/misc.py", line 37, in unmap
ret[inds.type(torch.bool)] = data
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1607370141920/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f6be11248b2 in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f6be1376982 in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f6be110fb7d in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5fea0a (0x7f6c1e461a0a in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5feab6 (0x7f6c1e461ab6 in /home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #23: __libc_start_main + 0xf0 (0x7f6c48515840 in /lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)
Does anyone know why?