[PPO] feat: Add LoRA support for PPO #205

Draft
wants to merge 8 commits into main

Conversation

@StephenXie commented Feb 5, 2025

This PR adds LoRA (Low-Rank Adaptation) support for PPO (#159)

Changes

Features

  • Configurable LoRA rank and alpha parameters
  • Target module specification for selective adaptation
  • Compatible with FSDP sharding strategy
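
For context, a minimal sketch of how the rank/alpha/target-module options above typically map onto peft's LoraConfig (the parameter names and values below are peft's and purely illustrative, not necessarily this PR's config keys):

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    # Illustrative values only; the keys exposed by this PR's config may differ.
    lora_config = LoraConfig(
        r=8,                                   # LoRA rank
        lora_alpha=16,                         # effective scale is lora_alpha / r
        target_modules=["q_proj", "v_proj"],   # which linear layers get adapters
        task_type="CAUSAL_LM",
    )

    base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
    peft_model = get_peft_model(base_model, lora_config)  # wraps targets in LoRA layers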

Some known issues:

  • Merging the Ref and Actor models when LoRA is enabled requires modifying the ppo_trainer logic; we need some help with this
  • No thorough testing yet
  • Line 80 of fsdp_vllm.py needs to be cleaned up (a possible refactor is sketched below):
    params = OrderedDict((k.replace(".base_layer.", "."), v) for k, v in params.items() if ".lora_" not in k)
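
One possible cleanup of that one-liner, preserving the same behavior (strip_lora_keys is a hypothetical helper name, not something already in the codebase):

    from collections import OrderedDict

    def strip_lora_keys(params):
        """Drop LoRA adapter tensors and map peft's wrapped key names back to the
        base model's names, e.g. `...q_proj.base_layer.weight` -> `...q_proj.weight`."""
        return OrderedDict(
            (name.replace(".base_layer.", "."), tensor)
            for name, tensor in params.items()
            if ".lora_" not in name  # skip lora_A / lora_B adapter weights
        )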

@StephenXie StephenXie marked this pull request as draft February 5, 2025 08:01
@Jiayi-Pan (Contributor) commented Feb 5, 2025

Relevant thread

#159

Jiayi-Pan/TinyZero#15


if isinstance(self.module._fsdp_wrapped_module, PeftModel):
    # the model to sync weights to is a vLLM model (not a peft model), so we need to merge the adapters
    with FSDP.summon_full_params(self.module):
Collaborator

Summoning full params may cause OOM. @PeterSH6, is there a better approach that can merge LoRA weights in sharded form, or at least one parameter at a time, to support large models?
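
For reference, the per-layer merge itself is just the standard LoRA update W' = W + (lora_alpha / r) * B @ A, so one direction (only a sketch, not verified against this codebase) would be to gather and merge one layer at a time instead of summoning the whole model:

    import torch

    @torch.no_grad()
    def merge_lora_weight(base_weight, lora_a, lora_b, scaling):
        # base_weight: (out_features, in_features) -- frozen base linear weight
        # lora_a:      (r, in_features)            -- peft lora_A weight
        # lora_b:      (out_features, r)           -- peft lora_B weight
        # scaling:     lora_alpha / r
        # Standard LoRA merge: W' = W + scaling * B @ A
        return base_weight + scaling * (lora_b @ lora_a)

Whether this can be driven per FSDP unit (e.g. summoning one wrapped submodule at a time) rather than over the full model is exactly the open question here.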
