BUG no#1 RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED #1

Closed
ashuezy opened this issue Mar 8, 2021 · 5 comments

ashuezy commented Mar 8, 2021

root@1192704b450d:/opt/github/LFD-A-Light-and-Fast-Detector/WIDERFACE_train# python3 predict.py 
<class 'lfd.model.lfd.LFD'>
Traceback (most recent call last):
  File "predict.py", line 26, in <module>
    results = config_dict['model'].predict_for_single_image(image, aug_pipeline=simple_widerface_val_pipeline, classification_threshold=0.5, nms_threshold=0.3)
  File "../lfd/model/lfd.py", line 553, in predict_for_single_image
    predicted_classification, predicted_regression = self.forward(data_batch)
  File "../lfd/model/lfd.py", line 493, in forward
    backbone_outputs = self._backbone(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../lfd/model/backbone/lfd_resnet.py", line 479, in forward
    x = self._stem(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

YonghaoHe (Owner) commented

@ashuezy You have to check whether PyTorch is installed correctly with the corresponding cuDNN version.
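
A quick way to verify this (a minimal check script, not part of the repo) is to run the following in the same environment; the small convolution at the end forces cuDNN to initialize, so it reproduces CUDNN_STATUS_NOT_INITIALIZED if the install is broken:

import torch

print(torch.__version__)                   # installed PyTorch version
print(torch.version.cuda)                  # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())      # cuDNN version bundled with PyTorch
print(torch.cuda.is_available())           # True only if a usable GPU is visible

# a tiny convolution on the GPU triggers cuDNN initialization
x = torch.randn(1, 3, 32, 32, device='cuda')
conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
print(conv(x).shape)                       # expected: torch.Size([1, 8, 30, 30])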


ashuezy commented Mar 9, 2021

As far as I can see, there is no compatible Docker image for the following combination:
CUDA 10.2, cuDNN 8.0.4, TensorRT 7.2.2.3

https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_21-02.html#rel_21-02
Can you specify which NVIDIA Docker image is compatible with your build?


ashuezy commented Mar 9, 2021

Here are the Docker setup commands.

-> pull the docker image
docker pull nvcr.io/nvidia/tensorrt:20.09-py3

-> start a bash shell inside the docker container
docker run -it --gpus all nvcr.io/nvidia/tensorrt:20.09-py3 /bin/bash

-> install pytorch
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html

-> clone the repo
mkdir /opt/github/
cd /opt/github/
git clone --recursive https://github.com/YonghaoHe/LFD-A-Light-and-Fast-Detector

-> see the container id in terminal2
docker container ps

-> copy the OneDrive-2021-03-08.zip and libjpeg-turbo-2.0.5.tar.gz inside the container in terminal2
docker cp OneDrive-2021-03-08.zip <container_id>:/opt/github/LFD-A-Light-and-Fast-Detector
docker cp libjpeg-turbo-2.0.5.tar.gz <container_id>:/opt/

-> extract the zip in terminal1
unzip OneDrive-2021-03-08.zip

-> extract and compile libjpeg-turbo-2.0.5.tar.gz
tar -xvf libjpeg-turbo-2.0.5.tar.gz
cd libjpeg-turbo-2.0.5
mkdir build
cd build
cmake ..
make
cp libturbojpeg.so.0.2.0 /opt/github/LFD-A-Light-and-Fast-Detector/lfd/data_pipeline/dataset/utils/libs/

-> install the repo
cd /opt/github/LFD-A-Light-and-Fast-Detector/
python setup.py build_ext

pip install opencv-python
apt-get install -y libgl1-mesa-dev
pip install albumentations
pip install pycocotools

cd /opt/github/LFD-A-Light-and-Fast-Detector/WIDERFACE_train

Changes in predict.py (a consolidated sketch of the modified file follows after this list):

  1. Add this to the top
    import sys
    sys.path.append('..')

  2. Change the config import to this
    from WIDERFACE_LFD_XS import config_dict, prepare_model

  3. Change the parameter file path to this
    param_file_path = './../epoch_1000.pth'

  4. Change the last 3 lines to this
    cv2.imwrite('output.jpg', image)
    #cv2.imshow('im', image)
    #cv2.waitKey()

  5. python predict.py
    [output image attached]
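
For reference, here is a rough sketch of how the modified predict.py fits together after the five changes above (reconstructed from this list and the call visible in the traceback; the input image path and the exact way prepare_model and simple_widerface_val_pipeline are invoked are assumptions, not taken from the repo):

import sys
sys.path.append('..')                                      # change 1: make the lfd package importable

import cv2
from WIDERFACE_LFD_XS import config_dict, prepare_model    # change 2: use the XS config

param_file_path = './../epoch_1000.pth'                    # change 3: downloaded checkpoint
prepare_model()                                            # assumption: exact arguments (e.g. how param_file_path
                                                           # is passed) follow the repo's original predict.py

image = cv2.imread('test_image.jpg')                       # assumption: any test image path
results = config_dict['model'].predict_for_single_image(   # call as shown in the traceback above
    image,
    aug_pipeline=simple_widerface_val_pipeline,            # assumed to be imported/defined earlier in predict.py
    classification_threshold=0.5,
    nms_threshold=0.3)

cv2.imwrite('output.jpg', image)                           # change 4: save to disk instead of imshow/waitKey
# cv2.imshow('im', image)
# cv2.waitKey()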

@ashuezy ashuezy closed this as completed Mar 9, 2021
YonghaoHe (Owner) commented

@ashuezy That's great!

iqraJilani commented

@ashuezy Where can I find the OneDrive zip file?
