
Fix the issue of modeling_llama.LlamaRotaryEmbedding.forward being changed to an invalid method. #155

Open · wants to merge 1 commit into base: main
Conversation

codereba

The following lines are called multiple times when exporting Llama 3.2 3B:
https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/_shared/llama3/model.py#L300-L302

Both modeling_llama.LlamaRotaryEmbedding.forward and modeling_llama.LlamaRotaryEmbedding._original_forward end up set to bypass_RotaryEmbedding. Since bypass_RotaryEmbedding just returns position_ids, the returned values are wrong (both their shape and the data they contain are incorrect); please refer to:
https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/_shared/llama3/model.py#L125-L137

This line then raises an IndexError when preparing the inference jobs:
emb_size = embeddings[0].size(-1) // 2

This patch therefore adds a condition to ensure that
modeling_llama.LlamaRotaryEmbedding._original_forward is set to modeling_llama.LlamaRotaryEmbedding.forward only once.
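
For concreteness, a minimal sketch of the guard this patch adds (the exact condition in the commit may differ; hasattr is one way to write it, and bypass_RotaryEmbedding is the replacement already defined in qai_hub_models/models/_shared/llama3/model.py):

    from transformers.models.llama import modeling_llama

    # Capture the real forward only on the first run; on later runs,
    # _original_forward already holds the genuine method.
    if not hasattr(modeling_llama.LlamaRotaryEmbedding, "_original_forward"):
        modeling_llama.LlamaRotaryEmbedding._original_forward = (
            modeling_llama.LlamaRotaryEmbedding.forward
        )
    modeling_llama.LlamaRotaryEmbedding.forward = bypass_RotaryEmbedding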

Local tests pass: the inference job step works correctly.
[Screenshot omitted: inference job output]

@shreyajn

I am confirming the issue now. If the fix is needed, I'll add it to our repo and make it available in the next release.

codereba (Author) commented Jan 24, 2025

I am confirming the issue now. If the fix is needed, I'll add it to our repo and make it available in the next release.

I'd like to discuss this issue further to find the best solution. I think the root cause is that exporting Llama 3.2 3B creates two instances:

        AR-128: used to process the prompt.
        AR-1: used to process the response.

The two instances are Llama3_2_Quantized objects created by the from_pretrained class method of Llama3_2_Quantized:
https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/llama_v3_2_3b_chat_quantized/model.py#L50

That calls __init__ of Llama3Base_Quantized (the base class of Llama3_2_Quantized):
class Llama3Base_Quantized(Llama_QuantizedMixin, ABC):

    def __init__(
        self,
        huggingface_model_name: str,
        min_memory_recommended: int,
        aimet_encodings: str,
        sequence_length: int,
        context_length: int,
        load_pretrained: bool = True,
        _make_small_for_debugging: bool = False,  # construct a small and incorrect network
    ):
        ...

https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/_shared/llama3/model.py#L316-L326

and then runs these lines to set the _original_forward attribute:

    # Bypass rotary_emb module
    modeling_llama.LlamaRotaryEmbedding._original_forward = (
        modeling_llama.LlamaRotaryEmbedding.forward
    )
    modeling_llama.LlamaRotaryEmbedding.forward = bypass_RotaryEmbedding

https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/_shared/llama3/model.py#L300-L304

But modeling_llama.LlamaRotaryEmbedding._original_forward is stored on the class itself, so it is effectively a global variable, and the lines

    modeling_llama.LlamaRotaryEmbedding._original_forward = (
        modeling_llama.LlamaRotaryEmbedding.forward
    )

run twice, once for each instance initialization. At the second initialization, modeling_llama.LlamaRotaryEmbedding._original_forward is set to bypass_RotaryEmbedding, because that is what forward already points to after the first swap, and the real forward is lost.
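
To make the failure concrete, here is a self-contained toy sketch (all names hypothetical) that mimics running the swap twice:

    class Toy:
        def forward(self):
            return "real forward"

    def bypass(self):
        return "bypass"

    for _ in range(2):  # simulates initializing two model instances
        Toy._original_forward = Toy.forward  # 2nd pass saves `bypass` instead
        Toy.forward = bypass

    print(Toy()._original_forward())  # prints "bypass": the real forward is lost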

If setting modeling_llama.LlamaRotaryEmbedding.forward to bypass_RotaryEmbedding is required, I see two possible solutions:

  • Use a condition statement to ensure modeling_llama.LlamaRotaryEmbedding._original_forward is set only once (the approach this patch takes).
  • Define a member variable on each Llama3Base_Quantized instance to remember modeling_llama.LlamaRotaryEmbedding.forward.

Otherwise, the lines that change modeling_llama.LlamaRotaryEmbedding._original_forward and modeling_llama.LlamaRotaryEmbedding.forward could be removed entirely, with the swap applied only where it is actually needed; see the sketch below.
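
For that last option, a hedged sketch of applying the swap only where needed (a context manager is one way to do it; the helper name is hypothetical, and bypass_RotaryEmbedding is assumed imported from qai_hub_models/models/_shared/llama3/model.py):

    import contextlib

    from transformers.models.llama import modeling_llama

    @contextlib.contextmanager
    def rotary_embedding_bypassed():
        # Install the bypass only for the duration of the block, then
        # restore whatever forward was there before entering it.
        original = modeling_llama.LlamaRotaryEmbedding.forward
        modeling_llama.LlamaRotaryEmbedding.forward = bypass_RotaryEmbedding
        try:
            yield
        finally:
            modeling_llama.LlamaRotaryEmbedding.forward = original

Export code could then wrap just the tracing step in "with rotary_embedding_bypassed():", and repeated or nested uses would restore the method correctly.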

This patch may not be the best solution; I'm happy to discuss and improve it together, so that newcomers to ai-hub can develop their apps easily from the first step.

Thanks! @shreyajn
