Fix Gradient Checkpointing for Deberta & Deberta-V2 using PEFT / Adapters #35898

lenglaender · 2025-01-26T20:15:39Z

What does this PR do?

This PR replaces in-place operations in the Deberta and Deberta-V2 implementations with out-of-place equivalents. This fixes gradient checkpointing for both models when they are used with the Adapters library or Hugging Face PEFT.
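The following is a minimal plain-PyTorch sketch of why PEFT / Adapters setups call enable_input_require_grads() in the first place. It assumes the reentrant checkpoint variant (use_reentrant=True). With that variant, if the tensor entering a checkpointed block does not require grad, the block's output does not require grad either, so trainable adapter weights inside the block receive no gradient. The frozen embedding, the adapter Linear layer, and the helper names are hypothetical stand-ins, not Deberta code.

```python
import warnings

import torch
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

# Frozen "base model" embedding: PEFT / Adapters freeze the pretrained weights.
frozen_embeddings = torch.nn.Embedding(10, 8)
frozen_embeddings.weight.requires_grad_(False)

# Hypothetical trainable adapter layer that lives inside a checkpointed block.
adapter = torch.nn.Linear(8, 8)


def checkpointed_block(hidden_states):
    return adapter(hidden_states)


def adapter_grad(inputs_require_grad):
    """Run one forward/backward pass and return the adapter's weight gradient (or None)."""
    adapter.zero_grad(set_to_none=True)
    hidden_states = frozen_embeddings(torch.tensor([[1, 2, 3]]))
    if inputs_require_grad:
        # Roughly what enable_input_require_grads() does to the embedding output.
        hidden_states.requires_grad_(True)
    with warnings.catch_warnings():
        # Reentrant checkpointing warns when none of its inputs require grad.
        warnings.simplefilter("ignore")
        out = checkpoint(checkpointed_block, hidden_states, use_reentrant=True)
    if not out.requires_grad:
        return None  # graph is cut: nothing can backpropagate into the adapter
    out.sum().backward()
    return adapter.weight.grad


print(adapter_grad(inputs_require_grad=False))  # None
print(adapter_grad(inputs_require_grad=True) is not None)  # True
```

Making the embedding output require grad is what forces a frozen model to build a graph through the checkpointed blocks. This is also why any in-place operation later applied to that output becomes a problem.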

Before this PR, calling model.enable_input_require_grads() on a PEFT / Adapters model makes the forward pass fail with the following error: RuntimeError: a leaf Variable that requires grad is being used in an in-place operation (see adapter-hub/adapters#759). To reproduce with PEFT, run the following script:

from transformers import DebertaConfig, DebertaForSequenceClassification
from peft import get_peft_model, LoraConfig, TaskType
import torch

# Create a minimal DeBERTa config for testing
config = DebertaConfig(
    hidden_size=32,
    num_hidden_layers=5,
    num_attention_heads=4,
    intermediate_size=37,
    relative_attention=True,
)

# PEFT model
model = DebertaForSequenceClassification(config)
peft_config = LoraConfig(task_type=TaskType.SEQ_CLS, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
model = get_peft_model(model, peft_config)
model.train()

# Enable input gradients
model.enable_input_require_grads()

# Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Create a dummy input tensor (all ones) for the forward pass
batch_size = 2
seq_length = 10
input_ids = torch.ones((batch_size, seq_length), dtype=torch.long).to(device)
attention_mask = torch.ones_like(input_ids).to(device)

# Without this PR, this throws a RuntimeError
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
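The error can also be reproduced without Deberta. enable_input_require_grads() registers a forward hook on the input embedding layer that calls requires_grad_(True) on its output. When the embedding weights are frozen, that output has no grad_fn, so it becomes a leaf tensor that requires grad, and any later in-place update to it fails. The sketch below assumes the offending pattern is an in-place add of another embedding; the exact lines changed in this PR are not reproduced here. position_embeddings and embed() are hypothetical stand-ins.

```python
import torch


def make_inputs_require_grads(module, inputs, output):
    # Mirrors the hook that enable_input_require_grads() attaches to the input embeddings.
    output.requires_grad_(True)


def embed(in_place):
    torch.manual_seed(0)
    word_embeddings = torch.nn.Embedding(10, 4)
    word_embeddings.weight.requires_grad_(False)  # frozen base weights (PEFT / Adapters)
    word_embeddings.register_forward_hook(make_inputs_require_grads)

    position_embeddings = torch.randn(1, 3, 4)  # hypothetical stand-in for another embedding
    embeddings = word_embeddings(torch.tensor([[1, 2, 3]]))  # leaf tensor with requires_grad=True
    if in_place:
        embeddings += position_embeddings  # in-place update of a leaf that requires grad
    else:
        embeddings = embeddings + position_embeddings  # out-of-place: creates a new graph node
    return embeddings


try:
    embed(in_place=True)
except RuntimeError as err:
    print(err)

out = embed(in_place=False)
print(out.requires_grad, out.grad_fn is not None)  # True True
```

The out-of-place form leaves the hooked leaf tensor untouched, so autograd can still route gradients through it.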

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Maybe one of @Rocketknight1 @SunMarc @ArthurZucker

SunMarc

Makes sense, LGTM!

Replace In-Place Operations for Deberta and Deberta-V2

95287a6

lenglaender mentioned this pull request Jan 26, 2025

Add Support for Gradient Checkpointing adapter-hub/adapters#759

Merged

SunMarc approved these changes Jan 27, 2025

SunMarc requested a review from ArthurZucker January 27, 2025 14:29
