Why can't I run testing on GPU 1? #1775
Comments
Hi @johnnylecy, we will also try to reproduce this bug.
Hi @ZwwWayne, how should I check or print the device of the features and rois?
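One way to check this, as a minimal sketch: PyTorch tensors expose a `.device` attribute, so printing it for the feature maps and the rois just before the RoI extractor call shows whether they sit on the same GPU. The helper name `describe_devices` and its labels are hypothetical and not part of mmdetection.

```python
def describe_devices(**named_tensors):
    """Return {label: device string} for tensors or lists/tuples of tensors.

    Hypothetical debugging helper (not part of mmdetection). It relies only on
    the `.device` attribute that PyTorch tensors expose.
    """
    report = {}
    for label, value in named_tensors.items():
        if isinstance(value, (list, tuple)):
            # Multi-level features arrive as a tuple of tensors, one per level.
            for i, item in enumerate(value):
                report[f"{label}[{i}]"] = str(getattr(item, "device", "n/a"))
        else:
            report[label] = str(getattr(value, "device", "n/a"))
    return report
```

For example, `print(describe_devices(feats=x, rois=rois))` placed right before the `bbox_roi_extractor` call in `simple_test_bboxes` (the frame shown in the traceback) should report the device of every feature level and of the rois, so a mismatch is easy to spot.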
I have the same problem.
I don't know why this happens, but I know how to work around it. You can set the CUDA_VISIBLE_DEVICES environment variable to select the GPU, instead of setting device_id = 'cuda:' + str(gpu_id), so that the model runs testing on the GPU you want.
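A minimal sketch of that workaround, assuming the variable meant here is `CUDA_VISIBLE_DEVICES`. It has to be set before CUDA is initialized in the process (safest: before importing torch). The selected GPU is then the only one visible and is addressed as `cuda:0`. The helper name `select_gpu` is hypothetical.

```python
import os


def select_gpu(gpu_id):
    """Expose only one physical GPU to this process and return its device string.

    Must run before CUDA is initialized (safest: before importing torch).
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # With a single visible GPU, it is always index 0 inside this process.
    return "cuda:0"


device_id = select_gpu(1)  # physical GPU 1 becomes cuda:0 in this process
# net = init_detector(config_file, checkpoint_file, device=device_id)
```

The same effect can be had from the shell by prefixing the command, e.g. `CUDA_VISIBLE_DEVICES=1 python your_script.py`, where `your_script.py` stands in for the evaluation script.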
Same problem!!!
Should have been fixed.
There are 2 GPUs in my computer. When I run testing on GPU 0, everything is normal. But when I run testing on GPU 1, I get the error shown below.
I use the high-level APIs for testing images like this:
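from mmdet.apis import init_detector, inference_detector

gpu_id = 1  # index of the second GPU
device_id = 'cuda:' + str(gpu_id)
# Build the detector on the chosen device
net = init_detector(config_file, checkpoint_file, device=device_id)
... ...
predict_result = inference_detector(net, im_file)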
Can someone help me, please? Thank you.
error:
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=371 error=77 : an illegal memory access was encountered
Process Process-1:
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "evaluate_rcnn_base_mulscale_with_clsfy_mulprocess-4_stride_half.py", line 263, in eval_net
evaluate(net, sub_dir, img_name, img_result_dir, det_result_txt)
File "evaluate_rcnn_base_mulscale_with_clsfy_mulprocess-4_stride_half.py", line 202, in evaluate
predict_result = inference_detector(net, im_file)
File "/data/nas/workspace/jupyter/mmdetection-master/mmdet/apis/inference.py", line 86, in inference_detector
result = model(return_loss=False, rescale=True, **data)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/data/nas/workspace/jupyter/mmdetection-master/mmdet/core/fp16/decorators.py", line 49, in new_func
return old_func(*args, **kwargs)
File "/data/nas/workspace/jupyter/mmdetection-master/mmdet/models/detectors/base.py", line 119, in forward
return self.forward_test(img, img_meta, **kwargs)
File "/data/nas/workspace/jupyter/mmdetection-master/mmdet/models/detectors/base.py", line 102, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/data/nas/workspace/jupyter/mmdetection-master/mmdet/models/detectors/two_stage.py", line 273, in simple_test
x, img_meta, proposal_list, self.test_cfg.rcnn, rescale=rescale)
File "/data/nas/workspace/jupyter/mmdetection-master/mmdet/models/detectors/test_mixins.py", line 49, in simple_test_bboxes
x[:len(self.bbox_roi_extractor.featmap_strides)], rois)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/data/nas/workspace/jupyter/mmdetection-master/mmdet/core/fp16/decorators.py", line 127, in new_func
return old_func(*args, **kwargs)
File "/data/nas/workspace/jupyter/mmdetection-master/mmdet/models/roi_extractors/single_level.py", line 106, in forward
roi_feats[inds] = roi_feats_t
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/THCGeneral.cpp:371
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:569)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fb69c895813 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x16126 (0x7fb69eaeb126 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x16b11 (0x7fb69eaebb11 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x4d (0x7fb69c885f0d in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x4af752 (0x7fb68a3f6752 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x4af796 (0x7fb68a3f6796 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #50: __libc_start_main + 0xf5 (0x7fb6b1428b15 in /lib64/libc.so.6)
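CUDA kernels run asynchronously, so the Python line reported in a traceback like the one above is not necessarily the operation that actually failed. As a debugging sketch (not a fix), setting `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialized makes kernel launches synchronous, so the error surfaces at the call that caused it. The helper name `enable_sync_cuda_errors` is hypothetical.

```python
import os


def enable_sync_cuda_errors():
    """Request synchronous CUDA kernel launches for debugging.

    Call before CUDA is initialized (safest: before importing torch).
    This slows execution, so it is meant for debugging only.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


enable_sync_cuda_errors()
# import torch  # import only after the variable is set
```

Rerunning the failing script with this in place should give a traceback whose last Python frame matches the kernel that triggered the illegal memory access.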