
Will veRL support deepspeed? #221

Open

albertcity opened this issue Feb 7, 2025 · 3 comments

Comments


albertcity commented Feb 7, 2025

The paper says:

Our implementation supports Megatron-LM, PyTorch FSDP, and DeepSpeed as the LLM training and inference engines, and vLLM for autoregressive generation.

However, I can only find support for FSDP and Megatron-LM in the current version. Is there any plan to support DeepSpeed in the near future?
I think DeepSpeed has some advantages over FSDP and is more feasible for large-scale training, and its advantages are also orthogonal to those of Megatron-LM. We might therefore achieve a higher speedup if DeepSpeed were supported.

@albertcity changed the title from "Do veRL support deepspeed?" to "Will veRL support deepspeed?" on Feb 7, 2025
@eric-haibin-lin
Collaborator

In the short term, no. We have limited staff maintaining the repo. That being said, we always welcome contributions from the community.
In my experience, DeepSpeed ZeRO-3 sometimes uses more memory and is slower than FSDP's equivalent (full sharding) for long-context models, which matters for reasoning and RL. We're more interested in integrating with torchtitan, as it provides a combination of different parallelism strategies under torch-native APIs.

@PeterSH6
Collaborator

PeterSH6 commented Feb 8, 2025

Thanks for the question!

We had a DeepSpeed backend a year ago but deprecated it because we didn't have enough manpower to maintain it.

Also, we found that torch FSDP is comparable to (or even better than) DeepSpeed. It can support training models up to 70B with high MFU.

@AIRobotZhang

> In my experience, DeepSpeed ZeRO-3 sometimes uses more memory and is slower than FSDP's equivalent (full sharding) for long-context models […]

How to set FSDP ZeRO-3?
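
For reference, a minimal sketch in plain PyTorch, not veRL's actual worker code: FSDP's `FULL_SHARD` sharding strategy is the rough ZeRO-3 equivalent, sharding parameters, gradients, and optimizer state across data-parallel ranks. The toy `Linear` model and the `torchrun` launch below are illustrative assumptions.

```python
# Sketch: FSDP with FULL_SHARD (ZeRO-3-style sharding of params,
# grads, and optimizer state). Launch with:
#   torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

def main():
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK for us.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for a real LLM

    # FULL_SHARD is the ZeRO-3 analogue; SHARD_GRAD_OP would be ZeRO-2-like.
    fsdp_model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)

    # Build the optimizer AFTER wrapping, so it sees the sharded params.
    optim = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)
    loss = fsdp_model(torch.randn(8, 1024, device="cuda")).sum()
    loss.backward()
    optim.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```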
