system has unsupported display driver / cuda driver combination #57

Closed
Dadiao-shuai opened this issue Sep 27, 2023 · 14 comments

Comments

@Dadiao-shuai

Dadiao-shuai commented Sep 27, 2023

python3 train.py -p first_ -d /root/data --dynamic

  0           test first_test0.types
  0          train first_train0.types
  1           test first_test1.types
  1          train first_train1.types
  2           test first_test2.types
  2          train first_train2.types
WRITING solver.36822.prototxt

Traceback (most recent call last):
  File "/root/data/gnina_train/train.py", line 932, in <module>
    results = train_and_test_model(args, train_test_files[i], outname, cont)
  File "/root/data/gnina_train/train.py", line 441, in train_and_test_model
    solver = caffe.get_solver(solverf)
RuntimeError: system has unsupported display driver / cuda driver combination

SYSTEM INFORMATION

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

ls /usr/local
bin cuda cuda-11 cuda-11.7 etc games include lib man python sbin share src
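
Note that `nvcc --version` reports the CUDA toolkit release, not the version of the NVIDIA driver that the error message refers to. A minimal sketch for pulling the toolkit release out of that output, so it can be compared against what the driver reports separately; the function name `parse_nvcc_release` is hypothetical and the sample line is copied from the output above:

```python
import re

def parse_nvcc_release(nvcc_output):
    """Return (major, minor) of the CUDA toolkit from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Line taken from the nvcc output quoted in this issue.
sample = "Cuda compilation tools, release 11.7, V11.7.64"
print(parse_nvcc_release(sample))  # (11, 7)
```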

@Kerro-junior

A suggestion: when running train.py, you may need to specify --gpu 0 to use the GPU in your machine.

@Dadiao-shuai
Author

I tried --gpu 0, but I still get: RuntimeError: system has unsupported display driver / cuda driver combination

For your information, I created this container with: docker run -itd --gpus '"device=0"' ...

@dkoes
Contributor

dkoes commented Sep 27, 2023

Sounds like a driver/cuda mismatch. Perhaps you have updated your drivers recently and need to reboot.

@Dadiao-shuai
Author

I uninstalled CUDA 11.7 and installed 11.6, and nvcc is in my PATH.
It still reports a RuntimeError when I run:

python3 train.py -m default2018.model -p first_ -d /root/data -i 1000 --weights crossdock_default2018.caffemodel --gpu 0 --dynamic

 0           test first_test0.types
 0          train first_train0.types
 1           test first_test1.types
 1          train first_train1.types
 2           test first_test2.types
 2          train first_train2.types
WRITING solver.2471.prototxt
Traceback (most recent call last):
 File "/root/data/gnina_train/train.py", line 932, in <module>
   results = train_and_test_model(args, train_test_files[i], outname, cont)
 File "/root/data/gnina_train/train.py", line 439, in train_and_test_model
   caffe.set_device(args.gpu)
RuntimeError: system has unsupported display driver / cuda driver combination

@Dadiao-shuai
Author

Also, could this have anything to do with the model file? Do I need to modify the following:

        stratify_receptor: true
        stratify_affinity_min: 0
        stratify_affinity_max: 0
        stratify_affinity_step: 1.000000

I also deleted the cachefile lines for the receptor and ligand from the model, because I only use the default cross-validation (0, 1, 2) types files.

My nvidia-smi output looks like this:
image

I believe this driver is recent enough for CUDA 11.6.

@Dadiao-shuai
Author

I installed CUDA 11.6 from the local *.deb package following the official website, then ran apt-get -y install cuda. Do I need to use pip/pip3 to install a CUDA library for python3?
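
The tracebacks above show the error surfacing at caffe's first CUDA calls (caffe.set_device and caffe.get_solver), so one way to narrow this down is to initialize the CUDA driver without caffe and without any pip package. A minimal sketch, assuming libcuda.so.1 is on the loader path: it calls the driver API function cuInit through ctypes. The helper names are hypothetical, and the table lists only three CUresult values from cuda.h; code 803 is the one whose message is "system has unsupported display driver / cuda driver combination".

```python
import ctypes

# A few CUDA driver API result codes (CUresult) from cuda.h.
CU_RESULTS = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
    803: "CUDA_ERROR_SYSTEM_DRIVER_MISMATCH",
}

def describe_cu_result(code):
    """Map a CUresult integer to its name, if it is one of the codes listed above."""
    return CU_RESULTS.get(code, "unlisted CUresult %d" % code)

def probe_cuinit(library="libcuda.so.1"):
    """Call cuInit(0) directly; return the CUresult, or None if the library cannot be loaded."""
    try:
        libcuda = ctypes.CDLL(library)
    except OSError:
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return int(libcuda.cuInit(0))

result = probe_cuinit()
if result is None:
    print("libcuda.so.1 could not be loaded")
else:
    print("cuInit returned", result, describe_cu_result(result))
```

If this already returns 803, the problem sits below caffe and Python, in the driver libraries visible to the process.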

@JonasLi-19

Could it be that the caffe you built when installing gnina is not linked properly to your CUDA directory (/usr/local/cuda)?

I am not an expert on caffe and CUDA, but you can look at caffe/CMakeLists.txt:
image

@Dadiao-shuai
Author

> Could it be that the caffe you built when installing gnina is not linked properly to your CUDA directory (/usr/local/cuda)?
>
> I am not an expert on caffe and CUDA, but you can look at caffe/CMakeLists.txt: image

I already added /usr/local/python to PYTHONPATH, which is where caffe can be found.
image

I don't remember whether there were any warnings about caffe and CUDA when installing gnina,
but cmake, make, and make install all finished, and I have successfully run gnina to dock ligands.

I guess the problem is with caffe, not the NVIDIA driver, because 520.61.05 is recent enough for CUDA 11.6/11.7.

@dkoes
Contributor

dkoes commented Sep 28, 2023

Are you running inside docker? Does the host system driver match the docker driver?
NVIDIA/nvidia-docker#1256
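
One way to check this from inside the container is to compare the version of the kernel module loaded on the host with the version of the user-space libcuda that the container actually sees. A minimal sketch under stated assumptions: /proc/driver/nvidia/version is readable in the container, the library lives under /usr/lib/x86_64-linux-gnu (this path varies by distribution), and the helper names are hypothetical. More than one libcuda version, or a version that differs from the kernel module, would point to a mismatch.

```python
import glob
import os
import re

def kernel_module_version(text):
    """Extract the driver version from the contents of /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module\s+(?:for \S+\s+)?(\d+\.\d+(?:\.\d+)?)", text)
    return match.group(1) if match else None

def libcuda_versions(paths):
    """Collect version suffixes from file names such as libcuda.so.520.61.05."""
    versions = set()
    for path in paths:
        name = os.path.basename(path)
        match = re.fullmatch(r"libcuda\.so\.(\d+\.\d+(?:\.\d+)?)", name)
        if match:
            versions.add(match.group(1))
    return sorted(versions)

PROC_FILE = "/proc/driver/nvidia/version"
LIB_GLOB = "/usr/lib/x86_64-linux-gnu/libcuda.so.*"  # assumed location

if os.path.exists(PROC_FILE):
    with open(PROC_FILE) as handle:
        print("kernel module:", kernel_module_version(handle.read()))
else:
    print("kernel module: not visible at", PROC_FILE)
print("libcuda files:", libcuda_versions(glob.glob(LIB_GLOB)))
```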

@Dadiao-shuai
Author

Yes, I run gnina's train.py inside Docker. The NVIDIA driver is 520.61.05 on both the host and in the container; the host uses CUDA 11.8 and the container uses CUDA 11.6.

I don't think anything is wrong there, because I was told it is fine to use an older CUDA toolkit inside a Docker container, right?

@Dadiao-shuai
Author

Is there anything wrong with my Docker? It is version 19.03.13, build 4484c46d9d.

I previously hit the runtime error boost::thread_resource_error in the gnina/gnina:latest image when running gnina, and now I hit "system has unsupported display driver / cuda driver combination" in another Docker container when running train.py.

I am fed up with GPU errors! Should I just run train.py on the CPU instead?

@Dadiao-shuai
Author

Dadiao-shuai commented Sep 30, 2023

Judging by many GitHub issues, "system has unsupported display driver / cuda driver combination" is a fairly common error when using GPUs from Docker.

As you mentioned, this is my ~/.bashrc:
image

This is what I have in the directory where I run train.py:
image

And this is the error report:

WRITING solver.2695.prototxt
Traceback (most recent call last):
  File "/root/data/gnina_iter_train/train.py", line 932, in <module>
    results = train_and_test_model(args, train_test_files[i], outname, cont)
  File "/root/data/gnina_iter_train/train.py", line 441, in train_and_test_model
    solver = caffe.get_solver(solverf)
RuntimeError: system has unsupported display driver / cuda driver combination

You might have noticed the line: WRITING solver.2695.prototxt

@SanFran-Me

@Dadiao-shuai
Author

Dadiao-shuai commented Sep 30, 2023

I just found the solver file in question:
image

And this is part of traintrain2695.txt:

layer {
  name: "data"
  type: "MolGridData"
  top: "data"
  top: "label"
  top: "affinity"
  top: "rmsd_true"
  include {
    phase: TEST
  }
  molgrid_data_param {
    source: "first_iter_train0.types"
    batch_size: 50
    dimension: 23.5
    resolution: 0.5
    shuffle: false
    balanced: false
    root_folder: "/root/data"
    recmap: "completerec"
    ligmap: "completelig"
    has_affinity: true
    has_rmsd: true
  }
}
layer {
  name: "data"
  type: "MolGridData"
  top: "data"
  top: "label"
  top: "affinity"
  top: "rmsd_true"
  include {
    phase: TRAIN
  }
  molgrid_data_param {
    source: "first_iter_train0.types"
    batch_size: 50
    dimension: 23.5
    resolution: 0.5
    shuffle: true
    balanced: true
    root_folder: "/root/data"
    random_rotation: true
    random_translate: 6.0
    recmap: "completerec"
    ligmap: "completelig"
    has_affinity: true
    has_rmsd: true
    stratify_receptor: true
    stratify_affinity_min: 0.0
    stratify_affinity_max: 0.0
    stratify_affinity_step: 1.0
    jitter: 0.0
  }
}

Additionally, here is an example from my types file:

1 6.22 0 pdb2019_refi_train_gninatypes/3acl/3acl_rec.gninatypes pdb2019_refi_train_gninatypes/3acl/3acl_ligand.gninatypes
1 6.0096 0.5408 pdb2019_refi_train_gninatypes/4djw/4djw_rec.gninatypes first_pdbbind_v2019_docked_gninatypes/4djw_docked_0.gninatypes
0 0.6729999999999999 7.694 pdb2019_refi_train_gninatypes/4gzw/4gzw_rec.gninatypes first_pdbbind_v2019_docked_gninatypes/4gzw_docked_6.gninatypes
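
For reference, the layer above sets has_affinity: true and has_rmsd: true, so each line here reads as label, affinity, RMSD, receptor file, ligand file. A minimal sketch of a parser that checks a line has that shape; the column order is inferred from the layer parameters and this example, and `TypesEntry` and `parse_types_line` are hypothetical names, not part of gnina:

```python
from typing import NamedTuple

class TypesEntry(NamedTuple):
    label: int        # 1 or 0 in the example lines
    affinity: float
    rmsd: float
    receptor: str     # path relative to root_folder
    ligand: str       # path relative to root_folder

def parse_types_line(line):
    """Split one whitespace-separated types line into its five assumed columns."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError("expected 5 columns, got %d" % len(fields))
    label, affinity, rmsd, receptor, ligand = fields
    return TypesEntry(int(label), float(affinity), float(rmsd), receptor, ligand)

# Second example line from this comment.
example = ("1 6.0096 0.5408 pdb2019_refi_train_gninatypes/4djw/4djw_rec.gninatypes "
           "first_pdbbind_v2019_docked_gninatypes/4djw_docked_0.gninatypes")
entry = parse_types_line(example)
print(entry.label, entry.affinity, entry.rmsd)
```

Running this over every line of a types file would flag malformed rows, although a data problem of that kind would not be expected to produce a driver error.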

@dkoes
Contributor

dkoes commented Oct 1, 2023

Update your docker image to match your host.
