Are there any plans to support context parallel? #2141
Comments
Take a look at @felipemello1's awesome RFC on how we plan to support even longer-context models: #1244. We started looking into it but de-prioritized the work in favor of onboarding new modalities, because we found that with our memory optimizations we could easily get to a 64K context length. What use case are you trying to work on? We could revisit our prioritization.
In my opinion, everyone is chasing O1. In that process, long-context training in both SFT (supervised fine-tuning) and RL (reinforcement learning) is necessary.
@joecummings Does it support full training of a 70B model at 128K context length? I mean full-parameter training, not LoRA.
Thanks for the feedback! It seems that multiple users want this. We have plans to support tensor parallelism, which should help with that. Once we incorporate more types of parallelism via device mesh, it shouldn't be too hard to extend to context parallelism. TL;DR: it is a feature we may have by H1, but unfortunately not in the next couple of weeks.
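For readers unfamiliar with the term: context parallelism shards the *sequence* (context) dimension of each batch across devices, so per-device activation memory shrinks with the parallel degree. A minimal plain-Python sketch of the sharding idea (no distributed code; `shard_sequence` and `cp_degree` are illustrative names, not torchtune or PyTorch API):

```python
def shard_sequence(tokens, cp_degree):
    """Split a token sequence into contiguous chunks, one per rank.

    With context parallelism, each of the `cp_degree` ranks holds only
    its own chunk, so activation memory for attention inputs scales
    down roughly by the CP degree (attention itself then needs
    cross-rank communication, e.g. ring attention, omitted here).
    """
    assert len(tokens) % cp_degree == 0, "sequence must divide evenly"
    chunk = len(tokens) // cp_degree
    return [tokens[i * chunk:(i + 1) * chunk] for i in range(cp_degree)]


seq = list(range(128_000))            # a 128K-token sequence
shards = shard_sequence(seq, cp_degree=8)
print(len(shards), len(shards[0]))    # 8 ranks, 16K tokens each
```

So a 128K sequence that would not fit on one GPU becomes eight 16K chunks, which is why CP composes naturally with the existing memory optimizations mentioned above.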
Long-text scenarios are quite common, and it would be of great help if they could be supported.