
Are there any plans to support context parallel? #2141

Open · dz1iang opened this issue Dec 10, 2024 · 5 comments

Labels: enhancement (New feature or request)

Comments

dz1iang commented Dec 10, 2024

Long-context scenarios are quite common, and it would be a great help if context parallelism could be supported.

joecummings added the enhancement (New feature or request) label on Dec 10, 2024
joecummings (Contributor) commented:

Take a look at @felipemello1's awesome RFC looking at how we would plan to support even longer context models: #1244.

We started taking a look at it, but then de-prioritized the work in favor of onboarding new modalities b/c we found that with our memory optimizations we could easily get to 64K context length.

What use case are you trying to work on? We could revisit our prioritization.

dz1iang (Author) commented Dec 11, 2024

> Take a look at @felipemello1's awesome RFC looking at how we would plan to support even longer context models: #1244.
>
> We started taking a look at it, but then de-prioritized the work in favor of onboarding new modalities b/c we found that with our memory optimizations we could easily get to 64K context length.
>
> What use case are you trying to work on? We could revisit our prioritization.

In my opinion, everyone is chasing after O1. In that pursuit, training on long texts in both SFT (Supervised Fine-Tuning) and RL (Reinforcement Learning) is necessary.

xs1997zju commented:

> Take a look at @felipemello1's awesome RFC looking at how we would plan to support even longer context models: #1244.
>
> We started taking a look at it, but then de-prioritized the work in favor of onboarding new modalities b/c we found that with our memory optimizations we could easily get to 64K context length.
>
> What use case are you trying to work on? We could revisit our prioritization.

@joecummings Does it support 128K-context training of a 70B model with full fine-tuning? I mean not LoRA, but full-parameter training.


xs1997zju commented Dec 25, 2024

> > Take a look at @felipemello1's awesome RFC looking at how we would plan to support even longer context models: #1244.
> >
> > We started taking a look at it, but then de-prioritized the work in favor of onboarding new modalities b/c we found that with our memory optimizations we could easily get to 64K context length.
> >
> > What use case are you trying to work on? We could revisit our prioritization.
>
> In my opinion, everyone is chasing after O1. In that pursuit, training on long texts in both SFT (Supervised Fine-Tuning) and RL (Reinforcement Learning) is necessary.
@dz1iang I really can’t agree more!

felipemello1 (Contributor) commented Dec 25, 2024

Thanks for the feedback! It seems that multiple users want this. We have plans to support tensor parallelism, which should help with that. And once we incorporate more types of parallelism with device mesh, it shouldn't be too hard to expand it to context parallel.

TL;DR: it is a feature we may have by H1, but unfortunately not in the next couple of weeks.
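For readers wondering what "expanding a device mesh to context parallel" could look like, here is a minimal sketch, not torchtune's actual implementation: the 2x2x2 mesh sizes, the `shard_sequence` helper, and the composition are illustrative assumptions, while `init_device_mesh` and the `DeviceMesh` slicing/rank APIs are standard PyTorch (>= 2.2). It only shows where an extra `"cp"` mesh dimension and the per-rank sequence sharding would sit; a real implementation also needs attention that spans the sharded sequence (e.g. ring attention) and matching gradient handling along the cp group.

```python
# Sketch only: adds a context-parallel ("cp") dimension to a DeviceMesh and gives
# each cp rank a contiguous slice of the sequence. Assumes 8 GPUs on one node.
import os

import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh


def shard_sequence(tokens: torch.Tensor, cp_rank: int, cp_size: int) -> torch.Tensor:
    """Return this cp rank's chunk of the sequence dimension.

    tokens: [batch, seq_len]; seq_len is assumed divisible by cp_size.
    """
    chunks = tokens.chunk(cp_size, dim=1)
    return chunks[cp_rank].contiguous()


def main():
    dist.init_process_group("nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # 3-D mesh: data parallel x tensor parallel x context parallel (sizes are assumptions).
    mesh = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("dp", "tp", "cp"))
    cp_mesh = mesh["cp"]  # 1-D submesh holding just the context-parallel group

    # Each rank keeps only seq_len / cp_size tokens of every sample.
    tokens = torch.randint(0, 32_000, (4, 65_536), device="cuda")
    local_tokens = shard_sequence(tokens, cp_mesh.get_local_rank(), cp_mesh.size())
    print(f"rank {dist.get_rank()}: local sequence length {local_tokens.shape[1]}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc-per-node 8 cp_sketch.py`, each rank ends up with a 32K slice of the 64K sequence, which is the memory win context parallelism is after.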
