
Fix the issue of modeling_llama.LlamaRotaryEmbedding.forward being changed to an invalid method. #155

Open · wants to merge 1 commit into base: main
Conversation

codereba

The following lines are called multiple times when exporting Llama 3.2 3B:
https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/_shared/llama3/model.py#L300-L302

Both modeling_llama.LlamaRotaryEmbedding.forward and modeling_llama.LlamaRotaryEmbedding._original_forward end up set to bypass_RotaryEmbedding. Since bypass_RotaryEmbedding just returns position_ids, the returned values are wrong (both their shape and the data they contain are incorrect); please refer to:
https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/_shared/llama3/model.py#L125-L137

This line then raises an IndexError when preparing the inference jobs:
emb_size = embeddings[0].size(-1) // 2

This patch therefore adds a condition to ensure that
modeling_llama.LlamaRotaryEmbedding._original_forward is set to modeling_llama.LlamaRotaryEmbedding.forward only once.
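
For concreteness, a minimal sketch of the guard this patch adds (the exact condition in the commit may differ; hasattr is one way to write it, and bypass_RotaryEmbedding is the replacement already defined in qai_hub_models/models/_shared/llama3/model.py):

    from transformers.models.llama import modeling_llama

    # Capture the real forward only on the first run; on later runs,
    # _original_forward already holds the genuine method.
    if not hasattr(modeling_llama.LlamaRotaryEmbedding, "_original_forward"):
        modeling_llama.LlamaRotaryEmbedding._original_forward = (
            modeling_llama.LlamaRotaryEmbedding.forward
        )
    modeling_llama.LlamaRotaryEmbedding.forward = bypass_RotaryEmbedding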

Local tests pass: the inference job step works correctly.
[Screenshot omitted: inference job output]

@shreyajn

I am confirming the issue now. If the fix is needed, I'll add it to our repo and make it available in the next release.

codereba (Author) commented Jan 24, 2025

I am confirming the issue now. If the fix is needed, I'll add it to our repo and make it available in the next release.

I'd like to discuss this issue further to find the best solution. I think the root cause is that exporting Llama 3.2 3B creates two instances:

        AR-128: used to process the prompt.
        AR-1: used to process the response.

The two instances are Llama3_2_Quantized objects created by the from_pretrained class method of Llama3_2_Quantized:
https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/llama_v3_2_3b_chat_quantized/model.py#L50

That calls __init__ of Llama3Base_Quantized (the base class of Llama3_2_Quantized):
class Llama3Base_Quantized(Llama_QuantizedMixin, ABC):

    def __init__(
        self,
        huggingface_model_name: str,
        min_memory_recommended: int,
        aimet_encodings: str,
        sequence_length: int,
        context_length: int,
        load_pretrained: bool = True,
        _make_small_for_debugging: bool = False,  # construct a small and incorrect network
    ):
        ...

https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/_shared/llama3/model.py#L316-L326

and then runs these lines to set the _original_forward attribute:

    # Bypass rotary_emb module
    modeling_llama.LlamaRotaryEmbedding._original_forward = (
        modeling_llama.LlamaRotaryEmbedding.forward
    )
    modeling_llama.LlamaRotaryEmbedding.forward = bypass_RotaryEmbedding

https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/_shared/llama3/model.py#L300-L304

But modeling_llama.LlamaRotaryEmbedding._original_forward is stored on the class itself, so it is effectively a global variable, and the lines

    modeling_llama.LlamaRotaryEmbedding._original_forward = (
        modeling_llama.LlamaRotaryEmbedding.forward
    )

run twice, once for each instance initialization. At the second initialization, modeling_llama.LlamaRotaryEmbedding._original_forward is set to bypass_RotaryEmbedding, because that is what forward already points to after the first swap, and the real forward is lost.
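
To make the failure concrete, here is a self-contained toy sketch (all names hypothetical) that mimics running the swap twice:

    class Toy:
        def forward(self):
            return "real forward"

    def bypass(self):
        return "bypass"

    for _ in range(2):  # simulates initializing two model instances
        Toy._original_forward = Toy.forward  # 2nd pass saves `bypass` instead
        Toy.forward = bypass

    print(Toy()._original_forward())  # prints "bypass": the real forward is lost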

If setting modeling_llama.LlamaRotaryEmbedding.forward to bypass_RotaryEmbedding is required, I see two possible solutions:

  • Use a condition statement to ensure modeling_llama.LlamaRotaryEmbedding._original_forward is set only once (the approach this patch takes).
  • Define a member variable on each Llama3Base_Quantized instance to remember modeling_llama.LlamaRotaryEmbedding.forward.

Otherwise, the lines that change modeling_llama.LlamaRotaryEmbedding._original_forward and modeling_llama.LlamaRotaryEmbedding.forward could be removed entirely, with the swap applied only where it is actually needed; see the sketch below.
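
For that last option, a hedged sketch of applying the swap only where needed (a context manager is one way to do it; the helper name is hypothetical, and bypass_RotaryEmbedding is assumed imported from qai_hub_models/models/_shared/llama3/model.py):

    import contextlib

    from transformers.models.llama import modeling_llama

    @contextlib.contextmanager
    def rotary_embedding_bypassed():
        # Install the bypass only for the duration of the block, then
        # restore whatever forward was there before entering it.
        original = modeling_llama.LlamaRotaryEmbedding.forward
        modeling_llama.LlamaRotaryEmbedding.forward = bypass_RotaryEmbedding
        try:
            yield
        finally:
            modeling_llama.LlamaRotaryEmbedding.forward = original

Export code could then wrap just the tracing step in "with rotary_embedding_bypassed():", and repeated or nested uses would restore the method correctly.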

This patch may not be the best solution; I'm happy to discuss and improve it together, so that newcomers to ai-hub can develop their apps easily from the first step.

Thanks! @shreyajn
