UCX environment parameters
The `UCX_TLS` variable controls the transports to use.
More than one transport can be specified, for example: `UCX_TLS=rc,self,sm`.
NOTE: In addition to the built-in transports, it's possible to use aliases which specify multiple transports.
Using a `\` prefix before a transport name treats it as an explicit transport name rather than an alias.
| Transport/alias | Description |
|---|---|
| all | Use all the available transports. |
| sm | All shared memory transports. |
| shm | Same as "sm". |
| ugni | ugni_rdma and ugni_udt. |
| rc | RC (reliable connection), plus UD (unreliable datagram) for connection bootstrap. "Accelerated" transports are used if possible. |
| ud | UD transport; "accelerated" is used if possible. |
| dc | DC: Mellanox scalable offloaded dynamic connection transport. |
| rc_x | Same as "rc", but using accelerated transports only. |
| rc_v | Same as "rc", but using Verbs-based transports only. |
| ud_x | Same as "ud", but using accelerated transports only. |
| ud_v | Same as "ud", but using Verbs-based transports only. |
| tcp | TCP over SOCK_STREAM sockets. |
| cuda_copy | Use `cu*Memcpy` for host<->CUDA device self transfers; also used to detect CUDA memory. |
| gdr_copy | Use the GDRcopy library for host<->CUDA device self transfers. |
| cuda_ipc | Use CUDA-IPC for CUDA device<->device transfers over PCIe/NVLINK. |
| rocm_copy | Use for host<->ROCm device transfers. |
| rocm_ipc | Use IPC for ROCm device<->device transfers. |
| self | Loopback transport to communicate within the same process. |
For example:
- `UCX_TLS=rc` will select rc and ud
- `UCX_TLS=rc,cm` will select rc, ud, and cm
- `UCX_TLS=\rc,cm` will select rc and cm
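To make the alias rule concrete, here is a toy model in bash of how the examples above resolve. It is a sketch under stated assumptions: the hypothetical `expand_tls` function knows only the `rc` alias (mapped to `rc` and `ud`, exactly as in the examples), treats a leading `\` as an explicit transport name, and passes every other name through unchanged. UCX's real parser and alias table are more detailed; for example, the actual aliases resolve to concrete accelerated or Verbs-based transports.

```bash
#!/usr/bin/env bash
# Toy model of UCX_TLS alias expansion, based only on the examples on
# this page. expand_tls is a hypothetical helper, NOT part of UCX.
expand_tls() {
    local -a items out=()
    local item
    # Split on commas; -r keeps the backslash prefix intact.
    IFS=',' read -r -a items <<< "$1"
    for item in "${items[@]}"; do
        case "$item" in
            \\*) out+=("${item#\\}") ;;  # "\rc": explicit transport, no alias lookup
            rc)  out+=(rc ud) ;;         # alias: RC plus UD for connection bootstrap
            *)   out+=("$item") ;;       # any other name is taken as-is
        esac
    done
    # Print the resulting selection as a comma-separated list.
    (IFS=','; echo "${out[*]}")
}

expand_tls 'rc'       # prints: rc,ud
expand_tls 'rc,cm'    # prints: rc,ud,cm
expand_tls '\rc,cm'   # prints: rc,cm
```

The order of the `case` branches matters: the backslash pattern is tested first, so `\rc` never reaches the `rc)` alias branch.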
To specify the devices to use for the run, use the following environment parameters:
- `UCX_NET_DEVICES` for specifying the network devices. For example: `mlx5_1:1`, `GEMINI`.
- `UCX_SHM_DEVICES` for specifying the shared memory devices. The only available device is `memory`.
- `UCX_ACC_DEVICES` for specifying the acceleration devices. For example: `gpu0`.
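Because the three variables are independent, they can also be set once in the shell before launching. A minimal sketch, assuming the example device names used on this page (`mlx5_0:1`, `memory`, `gpu0`) exist on the machine; substitute the names of your own devices. Exported variables are inherited by processes started from this shell, while multi-node runs with Open MPI's `mpirun` still need them forwarded with `-x`, as in the command line that follows.

```bash
#!/usr/bin/env bash
# Restrict each device class separately. The device names are the
# examples from this page and are assumptions about the target machine.
export UCX_NET_DEVICES=mlx5_0:1   # network device: HCA mlx5_0, port 1
export UCX_SHM_DEVICES=memory     # the only shared memory device
export UCX_ACC_DEVICES=gpu0       # acceleration device (example name)

# List the device variables a UCX process started from this shell inherits.
env | grep '^UCX_.*_DEVICES=' | sort
```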
The following command line will use the `rc_x` and `sysv` transports, and their corresponding devices will be `mlx5_0:1` and `memory`.
```
mpirun -mca pml ucx -x UCX_TLS=rc_x,sysv -x UCX_NET_DEVICES=mlx5_0:1 ...
```
This way, for instance, the choice of which HCA to use does not affect the devices used for the shared memory UCTs.
If one or more of these environment variables are not set, their default values are used.
The current default for each of them is `all`, which means using all available devices and all available transports.
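The defaulting rule can be sketched with bash's `${var:-default}` expansion. The hypothetical `show_ucx_selection` function below only reports what each variable resolves to under the rule stated above (unset means `all`); it does not query UCX, and it treats an empty value like an unset one, which is an assumption rather than documented UCX behavior. `ucx_info` remains the way to see the values UCX itself reports.

```bash
#!/usr/bin/env bash
# Report the effective value of each variable, treating unset (or empty,
# an assumption) as 'all'. show_ucx_selection is hypothetical, not a UCX tool.
show_ucx_selection() {
    local var
    for var in UCX_TLS UCX_NET_DEVICES UCX_SHM_DEVICES UCX_ACC_DEVICES; do
        # ${!var:-all}: value of the variable named by $var, else 'all'
        echo "$var=${!var:-all}"
    done
}

# Mirror the mpirun example: set transports and network device only.
unset UCX_SHM_DEVICES UCX_ACC_DEVICES
UCX_TLS=rc_x,sysv
UCX_NET_DEVICES=mlx5_0:1
show_ucx_selection
```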
The following command shows the default values of these (as well as all other) environment parameters:
```
$ ./bin/ucx_info -f
```
For these specific ones:
```
$ ./bin/ucx_info -f | grep DEVICES
UCX_NET_DEVICES=all
UCX_SHM_DEVICES=all
UCX_ACC_DEVICES=all
```