Enable P2P over PCIe on an NVLink machine #250

Open
cll24 opened this issue Sep 9, 2024 · 1 comment
Comments

cll24 commented Sep 9, 2024

Hi, I want to test all_reduce_perf with P2P over PCIe on an H20. However, since the H20 is equipped with NVLink, NCCL's all_reduce_perf always transfers data over NVLink. How can I get P2P over PCIe and disable NVLink in the test?

I tried to disable NVLink with RMNvLinkEnable=0x0, but then all_reduce_perf always falls back to SHM for communication.
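
For reference, this is roughly how I check which transport NCCL actually picks (a minimal sketch; the all_reduce_perf path and the GPU count are placeholders for my setup):

```bash
# Show the GPU interconnect topology (NV# links vs. PIX/PXB/PHB/SYS PCIe distances).
nvidia-smi topo -m

# Run the allreduce benchmark with NCCL debug output enabled, so the transport
# chosen for each peer pair (NVLink/P2P, PCIe P2P, SHM, ...) is printed at init.
# Binary path and -g 8 are placeholders for this machine.
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,GRAPH \
  ./build/all_reduce_perf -b 8 -e 256M -f 2 -g 8
```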

kiskra-nvidia (Member) commented
To the best of my knowledge, there's no way for NCCL to disable just NVLink. The granularity of control is "P2P" or "no P2P".

What does nvidia-smi topo -m print after you use RMNvLinkEnable? Perhaps the GPUs are simply too far from each other on the PCIe bus? NCCL will typically not attempt P2P if devices are any further from each other than PXB.
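
To illustrate that granularity, the documented NCCL environment variables operate on P2P as a whole rather than on NVLink specifically; a sketch, assuming the standard nccl-tests build path and an 8-GPU node:

```bash
# Inspect the PCIe distance between GPUs once NVLink is disabled. If the matrix
# shows PHB/NODE/SYS rather than PIX/PXB, NCCL will not use P2P by default.
nvidia-smi topo -m

# NCCL_P2P_LEVEL raises the maximum topology distance at which P2P is still
# attempted (LOC < NVL < PIX < PXB < PHB < SYS). It governs P2P as a whole;
# it cannot single out NVLink.
NCCL_P2P_LEVEL=SYS NCCL_DEBUG=INFO \
  ./build/all_reduce_perf -b 8 -e 256M -f 2 -g 8

# Conversely, NCCL_P2P_DISABLE=1 turns P2P off entirely, after which NCCL
# falls back to SHM or the network.
```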
