How to query whether CUDA awareness is actually turned on or not at runtime? #7963
I didn't expect I could follow up here so quickly 😁 @yosefe and @bureddy kindly pointed out in openucx/ucx#5471 that
In this case CUDA awareness is partial. UCX still depends on OMPI CUDA support for collectives and for CUDA datatype pack/unpack.
@bureddy I don't know if you want to address this here or over at the UCX issue, but I think the customer ask is that
Both of those can be narrowed down further, I'm sure. But the point is that the current
Keep in mind that when we added that function, all CUDA support went through the OB1 PML, so the answer was always correct. Now we will need to interrogate the selected PML to check whether the capability is supported.
Hi, I am aware of the build-time and run-time checks outlined in https://www.open-mpi.org/faq/?category=runcuda#mpi-cuda-aware-support, so please bear with me until the end.
We are looking for a runtime way to check whether CUDA awareness is actually turned on in Open MPI; see the original discussion in the mpi4py repo. It turns out that the existing API `MPIX_Query_cuda_support()` is useless. I quote @jsquyres:

> The reason is it only tells us if `--with-cuda` is set at build time, but the smcuda btl could still be ejected (as defaulted in conda-forge's `openmpi` package: conda-forge/openmpi-feedstock#56), and thus there is no CUDA awareness.

The situation becomes even more complicated when UCX is in use. As pointed out by @jsquyres, Open MPI could be built without CUDA while UCX is built with it, and in this situation we still get CUDA awareness (but `MPIX_Query_cuda_support()` would return `false`)! For this, I am requesting that the UCX side support a runtime query (openucx/ucx#5471), which should be incorporated here if UCX is in use by Open MPI.

In short, it would be great if Open MPI could set up a mechanism (hopefully another public API, leaving the existing `MPIX_Query_cuda_support()` intact to avoid confusion) for us to query, at runtime:

I hope I do not misunderstand the situation or miss any critical pieces of information. Thanks.
cc: @dalcinl