Enable P2P over PCIe on an NVLink machine #250

Open
cll24 opened this issue Sep 9, 2024 · 1 comment
Comments

cll24 commented Sep 9, 2024

Hi, I want to test all_reduce_perf with P2P over PCIe on an H20. However, since the H20 is equipped with NVLink, NCCL's all_reduce_perf always transfers data over NVLink. How can I get P2P over PCIe and disable NVLink in the test?

I tried to disable NVLink with RMNvLinkEnable=0x0, but then all_reduce_perf always falls back to SHM for communication.
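
For reference, this is roughly how I check which transport NCCL actually picks (a minimal sketch; the all_reduce_perf path and the GPU count are placeholders for my setup):

```bash
# Show the GPU interconnect topology (NV# links vs. PIX/PXB/PHB/SYS PCIe distances).
nvidia-smi topo -m

# Run the allreduce benchmark with NCCL debug output enabled, so the transport
# chosen for each peer pair (NVLink/P2P, PCIe P2P, SHM, ...) is printed at init.
# Binary path and -g 8 are placeholders for this machine.
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,GRAPH \
  ./build/all_reduce_perf -b 8 -e 256M -f 2 -g 8
```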

kiskra-nvidia (Member) commented
To the best of my knowledge, there's no way for NCCL to disable just NVLink. The granularity of control is "P2P" or "no P2P".

What does nvidia-smi topo -m print after you use RMNvLinkEnable? Perhaps the GPUs are simply too far from each other on the PCIe bus? NCCL will typically not attempt P2P if devices are any further from each other than PXB.
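
To illustrate that granularity, the documented NCCL environment variables operate on P2P as a whole rather than on NVLink specifically; a sketch, assuming the standard nccl-tests build path and an 8-GPU node:

```bash
# Inspect the PCIe distance between GPUs once NVLink is disabled. If the matrix
# shows PHB/NODE/SYS rather than PIX/PXB, NCCL will not use P2P by default.
nvidia-smi topo -m

# NCCL_P2P_LEVEL raises the maximum topology distance at which P2P is still
# attempted (LOC < NVL < PIX < PXB < PHB < SYS). It governs P2P as a whole;
# it cannot single out NVLink.
NCCL_P2P_LEVEL=SYS NCCL_DEBUG=INFO \
  ./build/all_reduce_perf -b 8 -e 256M -f 2 -g 8

# Conversely, NCCL_P2P_DISABLE=1 turns P2P off entirely, after which NCCL
# falls back to SHM or the network.
```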
