💔 Decouple loss computing and generation in GRPO #2762
Conversation
attention_mask = torch.cat([prompt_mask_repeated, completion_mask], dim=1)  # (B*G, P+C)

# Get the per-token log probabilities for the completions for the model and the reference model
def get_per_token_logps(model, input_ids, attention_mask, logits_to_keep):
converted to method
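For readers following along, here is a rough sketch of what such a helper computes. This is illustrative only, not the exact body from the PR, and it assumes a Hugging Face-style model whose output exposes `.logits`:

```python
import torch

def get_per_token_logps(model, input_ids, attention_mask, logits_to_keep):
    # Run the model over prompt + completion tokens.
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Logits at position t predict token t+1, so keep the logits that predict
    # the last `logits_to_keep` tokens (the completion).
    logits = logits[:, -logits_to_keep - 1 : -1, :]
    labels = input_ids[:, -logits_to_keep:]          # the completion tokens themselves
    log_probs = torch.log_softmax(logits, dim=-1)
    # Gather the log-probability of each generated token: shape (B*G, C).
    return log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
```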
)

# Compute the KL divergence between the model and the reference model
per_token_kl = torch.exp(ref_per_token_logps - per_token_logps) - (ref_per_token_logps - per_token_logps) - 1
this is later computed in compute_loss
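For context, this per-token quantity is a non-negative estimator of the KL divergence that is unbiased when tokens are sampled from the policy. A tiny sanity check on a toy categorical distribution (purely illustrative, not part of the PR):

```python
import torch

# exp(ref_logp - logp) - (ref_logp - logp) - 1 is >= 0 everywhere and, in
# expectation over tokens sampled from the policy, equals KL(pi || pi_ref).
torch.manual_seed(0)
pi = torch.softmax(torch.randn(6), dim=-1)        # toy policy distribution
ref = torch.softmax(torch.randn(6), dim=-1)       # toy reference distribution

exact_kl = (pi * (pi.log() - ref.log())).sum()

delta = ref.log() - pi.log()                      # ref_per_token_logps - per_token_logps
estimator = delta.exp() - delta - 1               # always >= 0
expected_estimate = (pi * estimator).sum()        # expectation under the policy

print(exact_kl.item(), expected_estimate.item())  # the two values match
```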
# x - x.detach() allows for preserving gradients from x
per_token_loss = torch.exp(per_token_logps - per_token_logps.detach()) * advantages.unsqueeze(1)
per_token_loss = -(per_token_loss - self.beta * per_token_kl)
loss = ((per_token_loss * completion_mask).sum(dim=1) / completion_mask.sum(dim=1)).mean()
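As a side note, a minimal standalone illustration of the `x - x.detach()` trick (not from the PR): the ratio always evaluates to 1, but its gradient with respect to the log-probs is 1, so the resulting loss has the usual policy-gradient form, advantage times the gradient of the log-probability.

```python
import torch

# exp(x - x.detach()) evaluates to 1 everywhere, but
# d/dx exp(x - x.detach()) = exp(x - x.detach()) * 1 = 1,
# so gradients flow through x unchanged while the loss value stays easy to read.
x = torch.tensor([0.5, -1.2], requires_grad=True)
ratio = torch.exp(x - x.detach())
print(ratio)            # tensor([1., 1.], grad_fn=<ExpBackward0>)
ratio.sum().backward()
print(x.grad)           # tensor([1., 1.])
```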
# Log the metrics
completion_length = self.accelerator.gather_for_metrics(completion_mask.sum(1)).float().mean().item()
self._metrics["completion_length"].append(completion_length)
this is later computed in compute_loss
@@ -366,32 +366,41 @@ def _set_signature_columns_if_needed(self):
    if self._signature_columns is None:
        self._signature_columns = ["prompt"]

# Trainer "prepares" the inputs before calling `compute_loss`: it converts them to tensors and moves them to the device.
# Since we preprocess the data in `compute_loss`, we need to override this method to skip this step.
def _prepare_inputs(self, inputs: dict[str, Union[torch.Tensor, Any]]) -> dict[str, Union[torch.Tensor, Any]]:
in this method, we now (rough sketch below):
- generate
- compute reward
- compute ref log probs
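For completeness, a self-contained toy illustration of the decoupling (all names and models below are made-up stand-ins, not TRL's actual API): the gradient-free work (generation, rewards, reference log-probs) happens once in `prepare_inputs`, and `compute_loss` only re-runs the trainable policy on the cached tensors.

```python
import torch
import torch.nn as nn

vocab, seq_len = 16, 4
policy = nn.Linear(1, vocab)       # stand-in "policy": per-step logits
ref_policy = nn.Linear(1, vocab)   # stand-in frozen reference model

def per_token_logps(model, tokens):
    logits = model(torch.ones(tokens.shape[0], tokens.shape[1], 1))
    return torch.log_softmax(logits, dim=-1).gather(-1, tokens.unsqueeze(-1)).squeeze(-1)

def prepare_inputs(batch_size):
    # Everything here is gradient-free and done once per batch.
    with torch.no_grad():
        tokens = torch.randint(vocab, (batch_size, seq_len))     # "generation"
        rewards = tokens.float().mean(dim=1)                     # toy "reward"
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-4)
        ref_logps = per_token_logps(ref_policy, tokens)          # ref log probs
    return {"tokens": tokens, "advantages": advantages, "ref_logps": ref_logps}

def compute_loss(batch, beta=0.04):
    # Only the trainable policy forward pass happens here.
    logps = per_token_logps(policy, batch["tokens"])
    kl = torch.exp(batch["ref_logps"] - logps) - (batch["ref_logps"] - logps) - 1
    pg = torch.exp(logps - logps.detach()) * batch["advantages"].unsqueeze(1)
    return (-(pg - beta * kl)).mean()

loss = compute_loss(prepare_inputs(batch_size=8))
loss.backward()
```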
Nice refactor, LGTM!
LGTM, looking forward to the next PR!
The motivation behind this PR is to decouple everything related to generation and to the computation of rewards and reference log probs, on the one hand, from the computation of the loss, on the other. It is a preparatory PR for the implementation of:
1. minibatching within the same group (to reduce memory requirements)
2. the possibility of performing multiple optimization steps.