Question about nvswitch nodes showing fwd compute behavior #77

Open
battlar opened this issue Jan 16, 2025 · 2 comments

battlar commented Jan 16, 2025

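The default example has 128 GPU nodes, but during simulation I found that 144 nodes show compute behavior. That is, in addition to the GPU nodes, the 16 nvswitch nodes also call get_fwd_pass_compute(). What is the reason for this?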

Collaborator

HUNMrsen commented Jan 17, 2025

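In SimAI, nvswitch nodes are also treated as compute-capable nodes, so they take part in the overall scheduling logic and get_fwd_pass_compute() is scheduled on them as well. For example, when the NVLS TREE algorithm is used, the nvswitch has to take part in the computation for collective communication.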
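
To make the node count concrete, here is a toy sketch of the behavior described above. It is not SimAI's code. It only assumes that GPU and nvswitch nodes sit in one node list and that the scheduler walks that whole list. The `Node` class, `build_nodes`, and `run_forward_pass` are hypothetical names made up for illustration, and `get_fwd_pass_compute()` here is a stand-in that only records that it was called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    node_id: int
    kind: str  # "gpu" or "nvswitch"
    fwd_compute_calls: int = 0

    def get_fwd_pass_compute(self) -> None:
        # Hypothetical stand-in: only records that the scheduler reached this node.
        self.fwd_compute_calls += 1


def build_nodes(num_gpus: int, num_nvswitches: int) -> List[Node]:
    # Assumption: nvswitch nodes are numbered after the GPU nodes.
    gpus = [Node(i, "gpu") for i in range(num_gpus)]
    switches = [Node(num_gpus + i, "nvswitch") for i in range(num_nvswitches)]
    return gpus + switches


def run_forward_pass(nodes: List[Node]) -> int:
    # The scheduler does not filter by node kind, so every node is visited.
    for node in nodes:
        node.get_fwd_pass_compute()
    return sum(1 for node in nodes if node.fwd_compute_calls > 0)


nodes = build_nodes(num_gpus=128, num_nvswitches=16)
active = run_forward_pass(nodes)
print(active)  # 144 = 128 GPU nodes + 16 nvswitch nodes
```

Under these assumptions, the 16 extra nodes in the count are simply the nvswitch nodes going through the same scheduling path as the GPUs.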

Author

battlar commented Jan 17, 2025

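> In SimAI, nvswitch nodes are also treated as compute-capable nodes, so they take part in the overall scheduling logic and get_fwd_pass_compute() is scheduled on them as well. For example, when the NVLS TREE algorithm is used, the nvswitch has to take part in the computation for collective communication.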

Understood, thanks for the explanation.

2 participants