How to calculate overall throughput for all2all #267
Comments
All numbers look good. Alltoall cannot aggregate the bandwidth of multiple NICs, so the performance you should see is the performance of a single NIC.
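For reference, here is a minimal sketch of how the bus bandwidth figure relates to the measured time, assuming the correction factors described in the nccl-tests performance notes: (n-1)/n for alltoall and 2(n-1)/n for allreduce, where n is the number of ranks. The function names are hypothetical and not part of nccl-tests.

```python
def busbw_factor(collective: str, nranks: int) -> float:
    """Factor that converts algorithm bandwidth to bus bandwidth.

    Assumption: factors follow the nccl-tests performance notes.
    """
    if nranks < 2:
        raise ValueError("need at least 2 ranks")
    if collective == "alltoall":
        return (nranks - 1) / nranks
    if collective == "allreduce":
        return 2 * (nranks - 1) / nranks
    raise ValueError(f"unsupported collective: {collective}")


def busbw_gbs(collective: str, nranks: int, size_bytes: float, time_s: float) -> float:
    """Bus bandwidth in GB/s (1 GB = 1e9 bytes) from message size and time."""
    algbw = size_bytes / time_s / 1e9  # algorithm bandwidth: size / time
    return algbw * busbw_factor(collective, nranks)


if __name__ == "__main__":
    # Six nodes with 8 GPUs each gives 48 ranks.
    print(busbw_factor("alltoall", 48))
    print(busbw_factor("allreduce", 48))
```

With 48 ranks both factors are close to their limits (1 and 2), so for alltoall the busbw is nearly equal to the algbw.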
So how should I calculate the single-NIC performance? Or could you recommend a link that explains the calculation formula?
I have a question here: why can the busbw exceed 50GB/s? Is it related to SHARP or something else?
CX-7 NICs at NDR rate (400Gbps) should achieve up to 48-49GB/s per NIC. NVLink SHARP is only relevant where we offload Reduction/Multicast operations to the NVSwitches, such as for NCCL AllReduce.
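As a quick sanity check on those numbers, a small sketch of the unit conversion: 400Gbps divided by 8 bits per byte is 50GB/s of raw line rate, so 48-49GB/s corresponds to roughly 96-98% of it. The helper names are hypothetical, and the calculation ignores protocol overhead details.

```python
def line_rate_gbs(link_gbps: float) -> float:
    """Raw line rate in GB/s for a link speed given in Gbps (8 bits per byte)."""
    return link_gbps / 8.0


def link_efficiency(measured_gbs: float, link_gbps: float) -> float:
    """Fraction of the raw line rate that a measured bandwidth represents."""
    return measured_gbs / line_rate_gbs(link_gbps)


if __name__ == "__main__":
    print(line_rate_gbs(400))            # raw rate of one NDR 400Gbps NIC
    print(link_efficiency(48.0, 400))    # lower end of the quoted range
    print(link_efficiency(49.0, 400))    # upper end of the quoted range
```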
OK, got it. Thanks for the detailed explanation!
Hi,
I have six H100 nodes, each with 8 x 400Gb CX-7 NICs, and I use RoCE for RDMA. I want to measure the overall throughput.
For allreduce, the parameters seem to have little effect. Is the busbw the overall throughput?
For all2all, the parameters have a large effect, as follows:
For all2all, is the busbw per node or something else? How can I calculate the overall throughput? I don't fully understand what busbw means for all2all. Which parameters are best for testing alltoall? Also, with the same config, performance drops when I add more nodes.
Thanks!