
UCX environment parameters

Jerry Liu edited this page Jun 18, 2022 · 13 revisions

Setting the transports to use

The UCX_TLS variable controls which transports are used.
More than one transport can be specified as a comma-separated list, for example: UCX_TLS=rc,self,sm.

NOTE: In addition to the built-in transports, it's possible to use aliases, each of which specifies multiple transports.
Using a \ prefix before a transport name treats it as an explicit transport name rather than an alias.

List of main transports and aliases

all         use all the available transports.
sm          all shared memory transports.
shm         same as "sm".
ugni        ugni_rdma and ugni_udt.
rc          RC (=reliable connection), and UD (=unreliable datagram) for connection bootstrap. "accelerated" transports are used if possible.
ud          UD transport, "accelerated" is used if possible.
dc          DC - Mellanox scalable offloaded dynamic connection transport
rc_x        Same as "rc", but using accelerated transports only
rc_v        Same as "rc", but using Verbs-based transports only
ud_x        Same as "ud", but using accelerated transports only
ud_v        Same as "ud", but using Verbs-based transports only
tcp         TCP over SOCK_STREAM sockets
cuda_copy   Use cu*Memcpy for host<->cuda device self transfers but also to detect cuda memory
gdr_copy    Use GDRcopy library for host<->cuda device self transfers
cuda_ipc    Use CUDA-IPC for cuda device<->device transfers over PCIe/NVLINK
rocm_copy   Use for host-rocm device transfers
rocm_ipc    Use IPC for rocm device-device transfers
self        Loopback transport to communicate within the same process

For example:

  • UCX_TLS=rc will select rc and ud
  • UCX_TLS=rc,cm will select rc, ud, and cm
  • UCX_TLS=\rc,cm will select rc and cm
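
One practical detail when setting the last form from a shell: an unquoted backslash is consumed by the shell itself, so the prefix never reaches UCX. The sketch below only sets and prints the variable (it does not invoke UCX) to show the difference; the quoting behaviour shown is that of POSIX shells such as sh and bash.

```shell
# Unquoted: the shell removes the backslash, so the value is the alias form.
UCX_TLS=\rc,cm
printf '%s\n' "$UCX_TLS"    # prints: rc,cm

# Single-quoted: the backslash is preserved, so the value names the explicit transport.
UCX_TLS='\rc,cm'
printf '%s\n' "$UCX_TLS"    # prints: \rc,cm

# Export so that programs started from this shell inherit the value.
export UCX_TLS
```

The same quoting is needed when the value is passed on a launcher command line.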

Setting the devices to use

To specify the devices used for a run, set the following environment parameters:

  • UCX_NET_DEVICES for specifying the network devices. For example: mlx5_1:1 , mlx5_1:1 GEMINI.
  • UCX_SHM_DEVICES for specifying the shared memory devices. The only available device is memory.
  • UCX_ACC_DEVICES for specifying the acceleration devices. For example: gpu0.

The following command line uses the rc_x and sysv transports with the devices mlx5_0:1 and memory, respectively.
mpirun -mca pml ucx -x UCX_TLS=rc_x,sysv -x UCX_NET_DEVICES=mlx5_0:1 ...

Because each device type has its own variable, choosing the HCA, for instance, does not affect the devices used by the shared memory UCTs.
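
For a program that is started directly rather than through mpirun, the same selection can presumably be made by exporting the variables in the launching shell. This is a minimal sketch reusing the values from the command above; the program name ./my_app is hypothetical and is left commented out.

```shell
# Select the transports and the network device, as in the mpirun example.
export UCX_TLS=rc_x,sysv
export UCX_NET_DEVICES=mlx5_0:1

# UCX_SHM_DEVICES is deliberately left unset, so its default applies.

# Show what a child process would inherit.
env | grep -E '^UCX_(TLS|NET_DEVICES)='

# ./my_app    # hypothetical UCX-based program, not part of this page
```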

If one or more of these environment variables are not set, their default values are used.
The current default for each of them is 'all', which means that all available devices and all available transports are used.

The following command shows the default values of these (as well as all other) environment parameters:
$ ./bin/ucx_info -f

For these specific ones:
$ ./bin/ucx_info -f | grep DEVICES
UCX_NET_DEVICES=all
UCX_SHM_DEVICES=all
UCX_ACC_DEVICES=all
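
The same check can be wrapped in a small helper that also reports the transport setting. This is a sketch under assumptions: the function name show_ucx_defaults and the UCX_INFO override variable are hypothetical, and it assumes that ucx_info -f prints UCX_TLS in the same NAME=value form as the DEVICES lines above.

```shell
# Print the transport and device settings reported by ucx_info.
# UCX_INFO (hypothetical) may point at a specific binary, e.g. ./bin/ucx_info.
show_ucx_defaults() {
    ucx_info_bin=${UCX_INFO:-ucx_info}
    if command -v "$ucx_info_bin" >/dev/null 2>&1; then
        # Keep only the UCX_TLS line and the *_DEVICES lines.
        "$ucx_info_bin" -f | grep -E '^UCX_(TLS|[A-Z]+_DEVICES)='
    else
        echo "ucx_info not found: $ucx_info_bin" >&2
        return 1
    fi
}
```

For example, `UCX_INFO=./bin/ucx_info show_ucx_defaults` would use the binary from the build tree.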
