[MoE][PoC] Expert Parallel: tp and tp2ep #731

Draft

wants to merge 2 commits into base: gh/tianyu-l/25/base

Conversation

tianyu-l (Contributor) commented on Dec 12, 2024

Stack from ghstack (oldest at bottom):

Issues (12/11/2024)

  • the forward collectives look correct ("tp2ep": all-gather -> compute -> reduce-scatter; see the sketch after this list), but the backward pass still needs to be understood better
  • torch.compile generates a full graph (applied per TransformerBlock), but inserts an additional all-to-all at the end of every two blocks
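
For reference only, here is a minimal, hedged sketch of the "tp2ep" forward pattern named in the first bullet (all-gather the token shards, run the locally owned expert(s), reduce-scatter the partial outputs). Names such as `tp2ep_forward`, `local_experts`, and `tp_group` are illustrative placeholders, not this PR's actual code.

```python
import torch
import torch.distributed as dist

def tp2ep_forward(x: torch.Tensor, local_experts: torch.nn.Module,
                  tp_group: dist.ProcessGroup) -> torch.Tensor:
    """Sketch of AG -> compute -> RS; not the PR's implementation."""
    world_size = dist.get_world_size(tp_group)

    # AG: gather every rank's token shard so all tokens are visible locally.
    gathered = torch.empty(world_size * x.shape[0], *x.shape[1:],
                           dtype=x.dtype, device=x.device)
    dist.all_gather_into_tensor(gathered, x.contiguous(), group=tp_group)

    # compute: apply the expert(s) this rank owns to the gathered tokens.
    partial = local_experts(gathered)

    # RS: sum the partial expert outputs across ranks and scatter each rank's
    # original token shard back to it.
    out = torch.empty_like(x)
    dist.reduce_scatter_tensor(out, partial, op=dist.ReduceOp.SUM,
                               group=tp_group)
    return out
```

For the second bullet, the per-block compilation presumably looks something like the hypothetical helper below (assuming the blocks live in an `nn.ModuleList`); the extra all-to-all was observed at the boundary between compiled regions.

```python
import torch
from torch import nn

def compile_per_block(layers: nn.ModuleList) -> None:
    # Hypothetical: compile each TransformerBlock separately rather than the
    # whole model, as the bullet above describes.
    for i, block in enumerate(layers):
        layers[i] = torch.compile(block)
```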

Not including

  • softmax scoring when Router Parallel is used (currently only sigmoid scoring is supported; see the sketch after this list)
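
For context on that bullet, a hedged sketch of the two scoring variants (sigmoid, which is what is currently supported, versus the softmax variant not yet wired up with Router Parallel). `topk_router_scores` is an illustrative helper, not code from this PR.

```python
import torch

def topk_router_scores(logits: torch.Tensor, top_k: int,
                       use_softmax: bool = False):
    """Return (scores, expert_indices) for the top-k experts per token.

    logits: (num_tokens, num_experts) raw router outputs.
    """
    if use_softmax:
        # Softmax scoring: weights are normalized across all experts.
        probs = torch.softmax(logits, dim=-1)
    else:
        # Sigmoid scoring (the only variant supported here): each expert's
        # score is independent, so the top-k weights need not sum to 1.
        probs = torch.sigmoid(logits)
    scores, indices = probs.topk(top_k, dim=-1)
    return scores, indices
```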

tianyu-l mentioned this pull request on Dec 12, 2024
tianyu-l added a commit that referenced this pull request on Dec 12, 2024
tianyu-l marked this pull request as draft on December 12, 2024