[CUDA_Runtime] only add include_dependency if version not specified #7523
Conversation
Thanks. The approach is not entirely correct though, as there's a reason I put this logic before loading CUDA_Driver_jll. On the other hand, maybe we should have CUDA_Driver_jll invalidate itself separately if the actual system driver got updated, which should then automatically invalidate CUDA_Runtime_jll. For that to work, we should validate that invalidating CUDA_Driver_jll results in CUDA_Runtime_jll (and thus CUDA.jl) getting precompiled again.
What determines whether the system driver or the JLL-provided driver is used?
I have spent a few hours thinking about this, and I don't think there is a coherent way we can make cache invalidation work based on system files, while also hoping to support shared file systems. There are two issues with the current approach:
The only way to fix 2 is to explicitly set the version in the preferences: i.e., if you want to (or need to) use an older toolkit than the current one, you are required to set the CUDA_Runtime_jll preferences. This would then avoid the need to
Platform and hardware compatibility. There's a host driver / forward-compatibility chart in the NVIDIA docs, and forward compatibility generally only supports enterprise/datacenter hardware.
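For concreteness, a minimal sketch of what pinning the runtime version could look like from the Julia side. This assumes CUDA_Runtime_jll consults a "version" preference; the 12.2 value is just an example.

```julia
# Minimal sketch, assuming CUDA_Runtime_jll reads a "version" preference;
# the "12.2" value is only an example.
using Preferences, CUDA_Runtime_jll

set_preferences!(CUDA_Runtime_jll, "version" => "12.2"; force = true)
# A Julia restart (and re-precompilation of CUDA.jl) picks up the pin.
```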
This however only solves the runtime selection problem, where the selected runtime depends on the driver's version. In the case where no driver is available at runtime, we essentially require either use of the local toolkit, or that the user provide a runtime version to use. In both those cases, we don't need to invalidate the precompilation image when the version changes, because we won't be selecting a different toolkit anyway, right? The other problem is that we also use the driver's version number to determine which CUDA driver APIs to use. I was hoping we could make such decisions at top level, but it looks like that won't be possible if we want to support precompiling on a system without the CUDA driver...
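To illustrate the top-level vs. runtime distinction, here's a hypothetical sketch (none of these names are actual CUDA.jl internals): a driver-version-dependent choice made at top level gets baked into the precompilation image, whereas deferring it to `__init__` keeps the image portable across machines with different (or no) drivers.

```julia
# Hypothetical sketch contrasting a top-level (precompile-time) decision
# with a runtime decision made in `__init__`. `fake_driver_version` stands
# in for whatever query CUDA_Driver_jll exposes; it is not real API.
module DriverApiChoice

fake_driver_version() = v"12.2"   # placeholder for the real driver query

# Precompile-time decision: baked into the cache image, so it can be wrong
# when the image is reused on a machine with a different (or no) driver.
# const USE_NEW_API = fake_driver_version() >= v"12.0"

# Runtime decision: evaluated when the package is loaded, after the actual
# driver is known, so the precompilation image stays valid.
const use_new_api = Ref(false)

function __init__()
    use_new_api[] = fake_driver_version() >= v"12.0"
end

end # module
```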
What I meant is that you could have a case where you precompile on a node without a GPU, and so it assumes that it will use the latest CUDA toolkit (12.2): as no driver is available, the driver won't be part of the
In that case, I think it would be reasonable to require the user to specify some sort of minimum CUDA driver version: perhaps we could also make this a preference (this would also help in the case where I'm using a cluster with a mixture of CUDA driver versions).
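Purely as an illustration of what such a preference could look like; the `"min_driver_version"` key is made up for this sketch and does not exist today.

```julia
# Hypothetical sketch: a user-settable minimum driver version, expressed as
# a preference on CUDA_Driver_jll. The key name and value are illustrative.
using Preferences, CUDA_Driver_jll

set_preferences!(CUDA_Driver_jll, "min_driver_version" => "525.60"; force = true)
```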
@maleadt In the meantime, how about this change? It doesn't include the driver library as a
thank you!
We were hitting an issue where precompiling on a node with a GPU driver installed would then retrigger precompilation when used on a non-GPU node.
This should avoid the problem when the CUDA runtime version is concretely specified.
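Roughly, the idea is something like the sketch below (not the actual CUDA_Runtime_jll source; the preference lookup and library path are placeholders): the driver library only becomes a precompilation dependency when no version preference has been set.

```julia
# Rough sketch of the idea behind this PR: only register the system driver
# library as a precompilation dependency when no explicit runtime version
# preference has been set.

# In the real package this would come from Preferences.jl, e.g.
# `@load_preference("version", nothing)`; here it is a placeholder.
version_preference = nothing

if version_preference === nothing
    # No version pinned: the selected runtime depends on the installed driver,
    # so changes to the driver library should invalidate precompilation caches.
    libcuda = "/usr/lib/x86_64-linux-gnu/libcuda.so"  # example path
    isfile(libcuda) && Base.include_dependency(libcuda)
end
# With a pinned version the driver no longer influences runtime selection,
# so the cache stays valid across GPU and non-GPU nodes.
```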
cc @maleadt @vchuravy