
Will veRL support deepspeed? #221

Open

albertcity opened this issue Feb 7, 2025 · 3 comments

Comments


albertcity commented Feb 7, 2025

The paper says:

Our implementation supports Megatron-LM, PyTorch FSDP, and DeepSpeed as the LLM training and inference engines, and vLLM for autoregressive generation.

However, I can only find support for FSDP and Megatron-LM in the current version. Is there any plan to support DeepSpeed in the near future?
I think DeepSpeed has some advantages over FSDP and is more feasible for large-scale training, and its advantages are also orthogonal to those of Megatron-LM. We might therefore achieve a higher speedup if DeepSpeed were supported.

@albertcity changed the title from "Do veRL support deepspeed?" to "Will veRL support deepspeed?" on Feb 7, 2025
@eric-haibin-lin
Collaborator

In the short term, no. We have limited staff maintaining the repo. That being said, we always welcome contributions from the community.
In my experience, DeepSpeed ZeRO-3 sometimes uses more memory and is slower than FSDP's equivalent (full sharding) for long-context models, which matters for reasoning and RL. We're more interested in integrating with torchtitan, as it provides a combination of different parallelism strategies under torch-native APIs.

@PeterSH6
Collaborator

PeterSH6 commented Feb 8, 2025

Thanks for the question!

We had a DeepSpeed backend a year ago but deprecated it because we didn't have enough manpower to maintain it.

Also, we found that torch FSDP is comparable to (or even better than) DeepSpeed. It can support training models up to 70B with high MFU.

@AIRobotZhang

> In my experience, DeepSpeed ZeRO-3 sometimes uses more memory and is slower than FSDP's equivalent (full sharding) for long-context models […]

How to set FSDP ZeRO-3?
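
For reference, a minimal sketch in plain PyTorch, not veRL's actual worker code: FSDP's `FULL_SHARD` sharding strategy is the rough ZeRO-3 equivalent, sharding parameters, gradients, and optimizer state across data-parallel ranks. The toy `Linear` model and the `torchrun` launch below are illustrative assumptions.

```python
# Sketch: FSDP with FULL_SHARD (ZeRO-3-style sharding of params,
# grads, and optimizer state). Launch with:
#   torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

def main():
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK for us.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for a real LLM

    # FULL_SHARD is the ZeRO-3 analogue; SHARD_GRAD_OP would be ZeRO-2-like.
    fsdp_model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)

    # Build the optimizer AFTER wrapping, so it sees the sharded params.
    optim = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)
    loss = fsdp_model(torch.randn(8, 1024, device="cuda")).sum()
    loss.backward()
    optim.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```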
