In my HPC environment, srun pins MPI ranks to specific cores and GPUs (by setting ROCR_VISIBLE_DEVICES). However, this conflicts with rccl-tests, which tries to select GPUs manually based on the MPI rank.
I have fixed this in my own build (frobnitzem@5b347ee) by always running the step `gpuid = gpuid % args->localNumDevices`, regardless of whether `args->enable_multiranks` is true.
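To illustrate why the modulo step helps, here is a minimal sketch, not the actual rccl-tests code. The helper name `selectVisibleDevice` and its parameters are hypothetical. It assumes that when srun restricts visibility through ROCR_VISIBLE_DEVICES, each rank sees fewer devices than the node has, so an index derived from the MPI rank can fall outside the visible range unless it is wrapped.

```cpp
#include <cassert>

// Hypothetical helper (not part of rccl-tests): wraps a rank-derived GPU
// index into the range of devices this process can actually see.
// candidateGpuId:  index computed from the MPI rank, assumed non-negative.
// localNumDevices: number of devices visible to this process, e.g. 1 when
//                  srun exposes a single GPU via ROCR_VISIBLE_DEVICES.
int selectVisibleDevice(int candidateGpuId, int localNumDevices) {
  assert(localNumDevices > 0 && "at least one device must be visible");
  // Applied unconditionally, mirroring the proposed change of running the
  // modulo regardless of the multi-rank setting.
  return candidateGpuId % localNumDevices;
}
```

With one visible device per rank, every rank-derived index maps to device 0, which is the GPU srun already assigned to that rank. With all devices visible, indices below `localNumDevices` are unchanged.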
I suggest adopting this change and reverting the update d16d1fb, which instead throws an error in this case.