
↔️ GRPO: Set max_model_len when initializing vLLM instance #2728

Merged: 16 commits into huggingface:main, Feb 5, 2025

Conversation

@mirceapricop (Contributor)

What does this PR do?

By default, the vLLM instance is set up to support the full context length of the input model.

However, during training we know we will only ever see at most max_prompt_length + max_completion_length tokens, so we can cap max_model_len accordingly for a reduced memory footprint.

This is especially relevant when running on limited hardware, as the memory savings can be significant.
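
As a rough illustration of the idea (a minimal sketch, not the code from this PR; the model name and values are placeholders):

from vllm import LLM

# Placeholder values; in the GRPO trainer these come from the training config.
max_prompt_length = 512
max_completion_length = 256

# Capping max_model_len at what training will actually see lets vLLM size its
# KV cache for ~768 tokens instead of the model's full context window
# (131072 tokens for some checkpoints).
llm = LLM(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    gpu_memory_utilization=0.3,
    max_model_len=max_prompt_length + max_completion_length,
)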

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@mirceapricop changed the title from "Set max_model_len when initializing vLLM instance" to "GRPO: Set max_model_len when initializing vLLM instance" on Feb 1, 2025
@qgallouedec (Member)

We could do that, but the memory limitation shouldn't come from generation. Or are you perhaps using the same device for both training and generation?

@mirceapricop (Contributor, Author)

Indeed, my use case is specifically running on a single consumer GPU. It might be wishful thinking, but with this patch I am able to run a training loop for a 1.5B model.

@qgallouedec (Member)

Have you tried reducing the vLLM GPU memory utilization?

@mirceapricop (Contributor, Author)

Yes, in fact that's what led me to this change. Lowering it reduces the space left for a KV cache, and vLLM prints:

The model has a long context length (131072). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.

So my understanding is that with max_model_len capped, the smaller KV cache space is used more efficiently.

Are there any downsides to setting this? We could make it opt-in through an arg if you think it can have negative implications.

@qgallouedec (Member)

Now it makes sense. Can you add an arg to the config instead?

@qgallouedec (Member)

vllm_max_model_len

@mirceapricop (Contributor, Author)

Works for me, done.
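
For illustration, the new option would then sit alongside the existing generation-length settings, along these lines (a hedged sketch; apart from vllm_max_model_len, the field names follow the existing GRPOConfig options):

from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="grpo-output",
    use_vllm=True,
    vllm_gpu_memory_utilization=0.3,
    max_prompt_length=512,
    max_completion_length=256,
    # New arg: cap vLLM's context window instead of letting it default to
    # the model's full max context length.
    vllm_max_model_len=768,
)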

@qgallouedec (Member) left a comment:

Perfect

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qunash commented on Feb 2, 2025:

I think it would be cleaner to add a vllm_init_kwargs dict that would be passed directly to the vLLM engine initialization. It would replace the individual vllm_device and vllm_gpu_memory_utilization parameters with a more flexible approach that:

  1. Makes it easier to support all vLLM engine parameters without adding new fields to the config
  2. Follows the same pattern as model_init_kwargs that's already in the codebase
from dataclasses import field
from typing import Optional

# Proposed config field: a single dict of kwargs forwarded to the vLLM engine
# initialization, replacing vllm_device and vllm_gpu_memory_utilization.
vllm_init_kwargs: Optional[dict] = field(
    default_factory=lambda: {
        "device": "auto",
        "gpu_memory_utilization": 0.9,
    },
    metadata={"help": "Keyword arguments passed to the vLLM engine initialization."},
)
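
For context, a hedged sketch of how a trainer might consume such a dict (hypothetical code, not from this PR or the TRL codebase):

from typing import Optional

from vllm import LLM


def init_vllm_engine(model_id: str, vllm_init_kwargs: Optional[dict] = None) -> LLM:
    # Start from the defaults above, let user-supplied kwargs override or
    # extend them, and forward everything to the vLLM constructor.
    kwargs = {"device": "auto", "gpu_memory_utilization": 0.9}
    kwargs.update(vllm_init_kwargs or {})
    return LLM(model=model_id, **kwargs)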

@mirceapricop (Contributor, Author)

That sounds good to me, but I'm not sure if it can be done in a backwards compatible manner (or if it's acceptable to make backwards-incompatible flag changes). @qgallouedec thoughts?

@sroecker commented on Feb 2, 2025:

I think it would be cleaner to add a vllm_init_kwargs dict that would be passed directly to the vLLM engine initialization.

This would be the best solution for single-GPU ("GPU-poor") training, as people might want to tune other parameters as well, as seen in this thread: https://x.com/robertshaw21/status/1885781591961571455

@mirceapricop (Contributor, Author)

Now updated with vllm_init_kwargs, marking the previous vLLM init args as deprecated (though still honoring them to stay backward compatible).
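
Roughly, the backward-compatibility handling described here could look like the following (a hypothetical sketch, not code from the PR; this variant was later reverted, and the attribute names are illustrative):

import warnings


def resolve_vllm_init_kwargs(args) -> dict:
    # Fold the deprecated per-field options into the new dict, warning the
    # user; explicit entries in vllm_init_kwargs take precedence.
    kwargs = dict(getattr(args, "vllm_init_kwargs", None) or {})
    deprecated = {
        "device": getattr(args, "vllm_device", None),
        "gpu_memory_utilization": getattr(args, "vllm_gpu_memory_utilization", None),
    }
    for key, value in deprecated.items():
        if value is not None and key not in kwargs:
            warnings.warn(
                f"Setting `vllm_{key}` directly is deprecated; use "
                f"`vllm_init_kwargs['{key}']` instead.",
                DeprecationWarning,
            )
            kwargs[key] = value
    return kwargs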

@mirceapricop changed the title from "GRPO: Set max_model_len when initializing vLLM instance" to "GRPO: Expose vllm_init_kwargs to enable vllm configuration" on Feb 2, 2025
@mirceapricop (Contributor, Author)

@qgallouedec I'd recommend merging this before other PRs that change the vLLM init call, as it most likely covers all their needs (and it would spare everyone difficult merges).

@qgallouedec (Member)

In fact, I'm in favor of explicitly stating the parameters for two main reasons:

  • not doing so means implicitly assuming that any combination of parameters is compatible with GRPO, which is not the case
  • although less flexible, it's easier for the user to know which arguments are available, without having to refer to the vLLM doc

As for backwards compatibility, GRPO is a new trainer and the lib is still in alpha, so there's no real need to ensure that.

This can be discussed again in the future if the lack of flexibility turns out to be a real problem, but adding parameters one by one should be good for now.

@mirceapricop changed the title from "GRPO: Expose vllm_init_kwargs to enable vllm configuration" to "GRPO: Set max_model_len when initializing vLLM instance" on Feb 4, 2025
@mirceapricop (Contributor, Author)

Now reverted to just adding a single new arg

@qgallouedec (Member) left a comment:

Thanks, merging when CI is green :)

@qgallouedec changed the title from "GRPO: Set max_model_len when initializing vLLM instance" to "↔️ GRPO: Set max_model_len when initializing vLLM instance" on Feb 5, 2025
@qgallouedec merged commit 78c5ce2 into huggingface:main on Feb 5, 2025
13 checks passed
@mirceapricop deleted the patch-1 branch on February 5, 2025 at 23:14