Multi-GPU Support with External Pinning #42

Open
frobnitzem opened this issue Jul 24, 2023 · 0 comments
In my HPC environment, srun pins MPI ranks to specific cores and GPUs (by setting ROCR_VISIBLE_DEVICES). However, this conflicts with rccl-tests, which tries to select GPUs manually based on the MPI rank.

I have fixed this in my own build (frobnitzem@5b347ee) by always running the step gpuid = gpuid % args->localNumDevices, regardless of whether args->enable_multiranks is true or not.
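
For illustration, here is a minimal sketch in C of the wrap-around step described above. The helper name wrap_gpu_id and its parameters are hypothetical and are not taken from rccl-tests. The sketch assumes only that, after external pinning, each process sees local_num_devices GPUs and that the rank-derived index is non-negative.

```c
#include <assert.h>

/* Hypothetical helper (not part of rccl-tests): map a rank-derived GPU index
 * onto the devices that are actually visible to this process.
 * Assumption: gpuid is non-negative. */
static int wrap_gpu_id(int gpuid, int local_num_devices)
{
    assert(gpuid >= 0);
    if (local_num_devices <= 0) {
        return -1; /* no visible device; the caller has to handle this */
    }
    /* With one visible device per rank, every index wraps to device 0. */
    return gpuid % local_num_devices;
}
```

With srun exposing a single GPU per rank, a rank-derived index such as 5 would wrap to device 0 rather than pointing past the visible devices.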

I suggest adopting this change and reverting d16d1fb, which instead throws an error in this case.
