This repository has been archived by the owner on Dec 19, 2024. It is now read-only.

CUDA driver compatibility #7

Open
samcmill opened this issue Oct 8, 2019 · 4 comments

Comments

@samcmill

samcmill commented Oct 8, 2019

Loading the container on a system with the 384 driver returns this error:

Precompiling project...
Precompiling CUDAnative
ERROR: LoadError: CUDA 10.0 is not supported by
your driver (which supports up to 9.0)

The error is coming from https://github.com/JuliaGPU/CUDAnative.jl/blob/master/src/CUDAnative.jl#L49.
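For illustration only, the kind of check that produces a message like this can be sketched as below. This is not the CUDAnative.jl source (which is Julia); the function name `check_toolkit_supported` and the tuple representation of versions are assumptions made for the example.

```python
def check_toolkit_supported(toolkit, driver_max):
    """Raise if the toolkit is newer than what the driver reports.

    Both arguments are (major, minor) tuples, e.g. (10, 0).
    Hypothetical helper, not part of any real package.
    """
    # Tuples compare element by element, so (10, 0) > (9, 0) is True.
    if toolkit > driver_max:
        raise RuntimeError(
            "CUDA {}.{} is not supported by your driver "
            "(which supports up to {}.{})".format(*toolkit, *driver_max)
        )


if __name__ == "__main__":
    try:
        # The situation from this report: CUDA 10.0 toolkit, 384 driver.
        check_toolkit_supported((10, 0), (9, 0))
    except RuntimeError as err:
        print("ERROR:", err)
```

A strict comparison like this rejects any toolkit newer than the reported driver capability, which is why a forward-compatible setup would still fail unless the reported capability itself changes.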

Starting with CUDA 10.0, forward compatibility was introduced that allows newer CUDA toolkits to be used with older drivers: https://docs.nvidia.com/deploy/cuda-compatibility/index.html.

Can the CUDAnative logic be modified to recognize the new CUDA compatibility?

An alternative solution would be to downgrade the base container image from CUDA 10.0 to 9.x (e.g., nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04 -> nvidia/cuda:9.2-cudnn7-devel-ubuntu18.04 or nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04).

@maleadt
Member

maleadt commented Oct 15, 2019

Starting with CUDA 10.0, forward compatibility was introduced that allows newer CUDA toolkits to be used with older drivers: https://docs.nvidia.com/deploy/cuda-compatibility/index.html.

OK, but is there an API to query the driver version and figure out the actual compatibility?

@maleadt
Member

maleadt commented Oct 22, 2019

I also just read the section that says:

3.2. Forward-Compatible Upgrade Path

The new upgrade path for the CUDA driver is meant to ease the management of large production systems for enterprise customers. As such, the supported HW (hardware) for this new upgrade path is limited to Tesla GPU products. It’s important to note that HW support is defined by the kernel mode driver and as such, newer CUDA drivers on their own will not enable new HW support. Refer to Hardware Support for which hardware is supported by your system.

So forwards compatibility only holds for Tesla hardware, and for other hardware the failing version check you mention above is still authoritative?

@samcmill
Author

^^^ Correct

@maleadt
Member

maleadt commented Feb 12, 2020

I just thought of this issue again, but I don't think we can fix it yet:

  • I don't know of an API to query the driver version and ensure forwards compatibility holds
  • Doesn't forwards compatibility imply decoupling libcuda.so from the driver and upgrading it together with the toolkit instead? If so, why doesn't it simply return the capability of that toolkit instead of the driver's?
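For reference, a minimal sketch of reading the CUDA version that the loaded libcuda reports, using the driver API call `cuDriverGetVersion` through Python's `ctypes`. The library name `libcuda.so.1` and both helper names are assumptions for the example. The call only returns the version supported by whichever libcuda gets loaded, encoded as 1000 * major + 10 * minor, so on its own it does not tell whether forward compatibility is in effect.

```python
import ctypes


def decode_cuda_version(raw):
    """Split the integer from cuDriverGetVersion into (major, minor).

    The driver API encodes the version as 1000 * major + 10 * minor,
    so 9000 means CUDA 9.0 and 10010 means CUDA 10.1.
    """
    return raw // 1000, (raw % 1000) // 10


def query_driver_cuda_version(libname="libcuda.so.1"):
    """Return (major, minor) reported by libcuda, or None if unavailable.

    The default library name is an assumption for Linux systems.
    """
    try:
        libcuda = ctypes.CDLL(libname)
    except OSError:
        return None  # no driver library found on this machine
    version = ctypes.c_int(0)
    status = libcuda.cuDriverGetVersion(ctypes.byref(version))
    if status != 0:  # 0 is CUDA_SUCCESS
        return None
    return decode_cuda_version(version.value)


if __name__ == "__main__":
    print("libcuda reports support up to:", query_driver_cuda_version())
```

Returning `None` instead of raising keeps the sketch usable on machines without a driver, where the caller can then decide how to fall back.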
