How to query whether CUDA awareness is actually turned on or not at runtime? #7963

leofang · 2020-07-24T17:55:12Z

Hi, I am aware of the build-time and runtime checks outlined in https://www.open-mpi.org/faq/?category=runcuda#mpi-cuda-aware-support, so please bear with me until the end.

We are looking for a runtime way to check whether CUDA awareness is actually turned on in Open MPI; see the original discussion in the mpi4py repo. It turns out that the existing API MPIX_Query_cuda_support() is useless for this purpose. I quote @jsquyres:

Honestly, I don’t think we thought anyone was using it. 🙂

The reason is that it only tells us whether --with-cuda was set at build time. The smcuda BTL could still be excluded (as is the default in conda-forge's openmpi package: conda-forge/openmpi-feedstock#56), in which case there is no CUDA awareness.
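To make the exclusion case concrete, here is a minimal sketch of how a caller might inspect an MCA selection string for the BTL framework. It assumes the usual Open MPI convention that a leading `^` turns the comma-separated list into an exclusion list, and that an unset or empty value places no restriction. The function names are hypothetical and not part of Open MPI. This is only a heuristic: a component that is permitted by the string can still fail to load or be deselected for other reasons.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `name` appears as a whole entry in a comma-separated list. */
static bool list_contains(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current entry */
        if (len == n && strncmp(p, name, n) == 0) {
            return true;
        }
        p += len;
        if (*p == ',') {
            p++;                        /* skip the separator */
        }
    }
    return false;
}

/* Hypothetical helper: does an MCA selection string permit component `name`?
 * NULL or "" means no restriction; "^a,b" excludes a and b; "a,b" allows
 * only a and b. */
bool mca_spec_permits(const char *spec, const char *name)
{
    assert(name != NULL);

    if (spec == NULL || spec[0] == '\0') {
        return true;
    }
    if (spec[0] == '^') {
        return !list_contains(spec + 1, name);
    }
    return list_contains(spec, name);
}
```

A caller could pass the value of the `OMPI_MCA_btl` environment variable, for example `mca_spec_permits(getenv("OMPI_MCA_btl"), "smcuda")`. MCA parameters can also come from the command line and from parameter files, so the environment variable alone does not settle the question.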

The situation becomes even more complicated when UCX is in use. As pointed out by @jsquyres, Open MPI could be built without CUDA support while UCX is built with it, and in this situation we still get CUDA awareness (but MPIX_Query_cuda_support() would return false)! For this, I am asking the UCX side to support a runtime query (openucx/ucx#5471), which should be incorporated here if UCX is in use by Open MPI.

In short, it would be great if Open MPI could set up a mechanism (hopefully another public API, but leave the existing MPIX_Query_cuda_support() intact to avoid confusion) for us to query, at runtime:

Whether Open MPI's CUDA support is actually in effect. This needs to take into account:
1. When Open MPI is not built against UCX: whether the smcuda BTL is excluded or not
2. When UCX is used: whether UCX has CUDA support (How to query CUDA support at runtime? openucx/ucx#5471)

I hope I do not misunderstand the situation or miss any critical pieces of information. Thanks.

cc: @dalcinl


leofang · 2020-07-24T19:54:10Z

2. When UCX is used: whether UCX has CUDA support

I didn't expect I could follow up here so quickly 😁 @yosefe and @bureddy kindly pointed out in openucx/ucx#5471 that ucp_context_query() can be used to query the CUDA support from UCX, and that Open MPI already uses this API during initialization (#7898). I think we can just take the information recorded there when UCX is in use, so half of the problem is resolved!
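As a rough illustration of what "taking the information recorded there" could look like, the sketch below tests one bit in a mask of supported memory types. It assumes that the attributes filled in by ucp_context_query() include a 64-bit mask with one bit per memory type, indexed by UCX's memory type enum; the exact field and constant names should be checked against the UCX headers. The helper itself is hypothetical and deliberately does not include any UCX header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is bit `type_index` set in a memory-type mask?
 * In real code the mask would come from the UCX context attributes and
 * the index from UCX's memory type enum (both assumptions here). */
bool mask_has_memory_type(uint64_t memory_types, unsigned type_index)
{
    assert(type_index < 64);
    return (memory_types & (UINT64_C(1) << type_index)) != 0;
}
```

Open MPI would only need to keep the mask it obtained during initialization and answer later queries from it, so no additional UCX call is required at query time.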

bureddy · 2020-07-24T20:02:08Z

The situation becomes even more complicated when UCX is in use. As pointed out by @jsquyres, Open MPI could be built without CUDA while UCX is, and in this situation we still get CUDA awareness (but MPIX_Query_cuda_support() would return false)!

In this case CUDA awareness is only partial. UCX still depends on OMPI's CUDA support for collectives and for CUDA datatype pack/unpack.
Ideally, both OMPI and UCX should be built with CUDA support.
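The distinction drawn here suggests that a yes/no answer may be too coarse when UCX is in use. A minimal sketch of a three-level result follows. The enum and function names are hypothetical, and treating "UCX without CUDA" as no awareness at all is an assumption of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-level answer for the "UCX in use" case. */
enum cuda_awareness {
    CUDA_AWARE_NONE,
    CUDA_AWARE_PARTIAL,  /* transfers work; collectives and datatype
                            pack/unpack may not */
    CUDA_AWARE_FULL
};

enum cuda_awareness classify_with_ucx(bool ompi_has_cuda, bool ucx_has_cuda)
{
    if (!ucx_has_cuda) {
        return CUDA_AWARE_NONE;  /* assumption: nothing UCX carries is CUDA-aware */
    }
    return ompi_has_cuda ? CUDA_AWARE_FULL : CUDA_AWARE_PARTIAL;
}

/* Human-readable label, e.g. for diagnostics. */
const char *cuda_awareness_label(enum cuda_awareness level)
{
    switch (level) {
    case CUDA_AWARE_NONE:    return "none";
    case CUDA_AWARE_PARTIAL: return "partial";
    case CUDA_AWARE_FULL:    return "full";
    }
    assert(0 && "unknown cuda_awareness value");
    return "unknown";
}
```

A caller that only needs point-to-point transfers of contiguous device buffers might accept the partial level, while one that relies on collectives or derived datatypes would require the full level.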

jsquyres · 2020-07-24T21:42:56Z

@bureddy I don't know if you want to address this here or over at the UCX issue, but I think the customer ask is that MPIX_Query_cuda_support() be improved (or replaced with something better?). In the original discussion, two needs were identified:

Is the library compiled with CUDA support?
Is CUDA support enabled "right now" (for some definition of "right now")?

Both of those can be narrowed down further, I'm sure. But the point is that the current MPIX_Query_cuda_support() really isn't very useful, because all it does is return a configure-time constant that indicates whether Open MPI -- not even UCX -- is compiled with CUDA support or not.

bosilca · 2020-07-24T21:48:16Z

You have to keep in mind that when we added that function, all CUDA support was through the OB1 PML, so the answer was always correct. Now we will need to interrogate the selected PML to check if the capability is supported.

leofang mentioned this issue Jul 24, 2020

How to query CUDA support at runtime? openucx/ucx#5471

Open

bureddy mentioned this issue Jul 28, 2020

CUDA: Enhance MPIX_Query_cuda_support() with runtime support check #7970

Merged

leofang mentioned this issue Jul 28, 2020

How to query if CUDA awareness is actually turned on or not at runtime? pmodels/mpich#4716

Closed

awlauria closed this as completed in #7970 Jun 30, 2021

Micket mentioned this issue Dec 8, 2021

Improving detection of CUDA enabled MPI in EasyBuild easybuilders/easybuild-easyconfigs#14517

Closed

Micket mentioned this issue Jan 26, 2022

Various benchmarks from OSU-Micro-Benchmarks/5.7.1-gompi-2021a-CUDA-11.3.1 segfault when using CUDA buffers easybuilders/easybuild-easyconfigs#14801

Closed
