fix: add legacy attributes to Llama attention
The parallelization code expects these parameters to be set. A proper
fix would be to write a specific Llama parallel model.
dacorvo committed Jan 29, 2025
1 parent 292fbea commit 565639f
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions optimum/neuron/distributed/decoder_models.py
@@ -588,6 +588,10 @@ def _parallelize(
         layers = model.model.layers

         for layer in layers:
+            # FIXME: temporary workaround to avoid too many changes in the transformation code
+            layer.self_attn.num_heads = layer.self_attn.config.num_attention_heads
+            layer.self_attn.num_key_value_heads = layer.self_attn.config.num_key_value_heads
+            layer.self_attn.hidden_size = layer.self_attn.config.hidden_size
             layer.self_attn = LlamaParallelSelfAttention.transform(
                 model,
                 layer.self_attn,
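For context, the workaround copies values that may only live on the attention module's config onto the module itself, because the transformation code reads them directly from the module. The sketch below is a hypothetical helper (not the actual optimum-neuron transformation code) illustrating the kind of access pattern the commit message refers to.

# Hypothetical helper, not the actual optimum-neuron code. It shows why the
# legacy attributes must exist on the module: every lookup below goes through
# `self_attn`, not `self_attn.config`.
def split_attention_for_tp(self_attn, tp_size: int):
    if self_attn.num_heads % tp_size != 0:
        raise ValueError("num_heads must be divisible by the tensor parallel size")
    heads_per_rank = self_attn.num_heads // tp_size
    kv_heads_per_rank = max(self_attn.num_key_value_heads // tp_size, 1)
    head_dim = self_attn.hidden_size // self_attn.num_heads
    return heads_per_rank, kv_heads_per_rank, head_dim

Without the assignments added in this commit, such attribute lookups would raise AttributeError on a LlamaAttention instance that only stores these values on its config.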
