[MoE][PoC] Expert Parallel: tp and tp2ep #731

Draft

wants to merge 2 commits into base: gh/tianyu-l/25/base

Conversation

tianyu-l (Contributor) commented on Dec 12, 2024

Stack from ghstack (oldest at bottom):

Issues (12/11/2024)

  • the forward collectives look correct ("tp2ep": all-gather -> compute -> reduce-scatter; see the sketch after this list), but the backward pass still needs to be understood better
  • torch.compile generates a full graph (applied per TransformerBlock), but inserts an additional all-to-all at the end of every two blocks
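
For reference only, here is a minimal, hedged sketch of the "tp2ep" forward pattern named in the first bullet (all-gather the token shards, run the locally owned expert(s), reduce-scatter the partial outputs). Names such as `tp2ep_forward`, `local_experts`, and `tp_group` are illustrative placeholders, not this PR's actual code.

```python
import torch
import torch.distributed as dist

def tp2ep_forward(x: torch.Tensor, local_experts: torch.nn.Module,
                  tp_group: dist.ProcessGroup) -> torch.Tensor:
    """Sketch of AG -> compute -> RS; not the PR's implementation."""
    world_size = dist.get_world_size(tp_group)

    # AG: gather every rank's token shard so all tokens are visible locally.
    gathered = torch.empty(world_size * x.shape[0], *x.shape[1:],
                           dtype=x.dtype, device=x.device)
    dist.all_gather_into_tensor(gathered, x.contiguous(), group=tp_group)

    # compute: apply the expert(s) this rank owns to the gathered tokens.
    partial = local_experts(gathered)

    # RS: sum the partial expert outputs across ranks and scatter each rank's
    # original token shard back to it.
    out = torch.empty_like(x)
    dist.reduce_scatter_tensor(out, partial, op=dist.ReduceOp.SUM,
                               group=tp_group)
    return out
```

For the second bullet, the per-block compilation presumably looks something like the hypothetical helper below (assuming the blocks live in an `nn.ModuleList`); the extra all-to-all was observed at the boundary between compiled regions.

```python
import torch
from torch import nn

def compile_per_block(layers: nn.ModuleList) -> None:
    # Hypothetical: compile each TransformerBlock separately rather than the
    # whole model, as the bullet above describes.
    for i, block in enumerate(layers):
        layers[i] = torch.compile(block)
```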

Not including

  • softmax scoring when Router Parallel is used (currently only sigmoid scoring is supported; see the sketch after this list)
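
For context on that bullet, a hedged sketch of the two scoring variants (sigmoid, which is what is currently supported, versus the softmax variant not yet wired up with Router Parallel). `topk_router_scores` is an illustrative helper, not code from this PR.

```python
import torch

def topk_router_scores(logits: torch.Tensor, top_k: int,
                       use_softmax: bool = False):
    """Return (scores, expert_indices) for the top-k experts per token.

    logits: (num_tokens, num_experts) raw router outputs.
    """
    if use_softmax:
        # Softmax scoring: weights are normalized across all experts.
        probs = torch.softmax(logits, dim=-1)
    else:
        # Sigmoid scoring (the only variant supported here): each expert's
        # score is independent, so the top-k weights need not sum to 1.
        probs = torch.sigmoid(logits)
    scores, indices = probs.topk(top_k, dim=-1)
    return scores, indices
```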

tianyu-l mentioned this pull request on Dec 12, 2024
tianyu-l added a commit that referenced this pull request on Dec 12, 2024
tianyu-l marked this pull request as draft on December 12, 2024