
README: ensure modeling code is patched before model instantiation #170

Merged · 3 commits into linkedin:main · Aug 30, 2024

Conversation

@tmm1 (Contributor) commented Aug 29, 2024

Summary

Fixes the example in the README to make it functional.

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@ByronHsu (Collaborator) left a comment


Before or after both work, because it monkey-patches the HF module directly. It is more straightforward to apply it afterwards.

@ByronHsu closed this Aug 29, 2024
@tmm1 (Contributor, Author) commented Aug 29, 2024

Unfortunately you are mistaken.

import torch
from transformers import AutoConfig, AutoModelForCausalLM
import liger_kernel.transformers

model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
config = AutoConfig.from_pretrained(
    model_name,
    attn_implementation='flash_attention_2',
    use_cache=False,
    trust_remote_code=True,
)
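# First model: instantiated BEFORE the Liger patch is applied.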
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    config=config,
)

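# Apply the Liger patch only now, after `model` has been instantiated.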
liger_kernel.transformers.apply_liger_kernel_to_llama()

print(model.model.norm.__class__)
print(model.model.norm.__class__ == liger_kernel.transformers.rms_norm.LigerRMSNorm)
print("\n")


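# Second model: instantiated AFTER the patch.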
model2 = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    config=config,
)
print(model2.model.norm.__class__)
print(model2.model.norm.__class__ == liger_kernel.transformers.rms_norm.LigerRMSNorm)

Output:

<class 'transformers.models.llama.modeling_llama.LlamaRMSNorm'>
False

<class 'liger_kernel.transformers.rms_norm.LigerRMSNorm'>
True

Basically, if any code has already run that calls e.g. self.norm = LlamaRMSNorm(...), then it doesn't matter that you swapped out modeling_llama.LlamaRMSNorm afterwards: the old object was already instantiated, and its old methods will be used.
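Here is a minimal, self-contained sketch of the same mechanism (the stub names are illustrative; no transformers install needed):

class LlamaRMSNormStub:  # stands in for modeling_llama.LlamaRMSNorm
    def forward(self):
        return "original forward"

class ModelStub:  # stands in for the HF model
    def __init__(self):
        # the class name is looked up in module globals at call time
        self.norm = LlamaRMSNormStub()

model = ModelStub()  # analogous to the first from_pretrained(...)

class LigerRMSNormStub:  # stands in for LigerRMSNorm
    def forward(self):
        return "patched forward"

LlamaRMSNormStub = LigerRMSNormStub  # the monkey patch

model2 = ModelStub()  # analogous to the second from_pretrained(...)

print(model.norm.forward())   # "original forward" -- pre-patch object unaffected
print(model2.norm.forward())  # "patched forward"  -- post-patch object picks it up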

@ByronHsu (Collaborator)

oh interesting! i vaguely remember trying this before. will take another look. apologies for the oversight

@shimizust (Collaborator)

You're right, thanks for pointing that out @tmm1 !

@ByronHsu (Collaborator) left a comment


wait, both should work fine. i think it is because, even though the object is still the llama rms norm, the call actually goes to liger's

@ByronHsu (Collaborator)

let me verify more

@tmm1 (Contributor, Author) commented Aug 30, 2024

you can check model.model.norm.forward and the other methods; they are not the liger versions
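For instance (continuing the script above, on the model created before the patch):

print(model.model.norm.forward)
# prints a bound method of the original LlamaRMSNorm class, not Liger's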

@shimizust (Collaborator)

@tmm1 Just to follow up: the module classes won't get patched if you apply liger post model-init. However, patches to module-level functions (e.g. modeling_mistral.apply_rotary_pos_emb) or class methods (e.g. modeling_mistral.MistralForCausalLM.forward) do take effect, so the result is incomplete patching.

Thanks again for pointing it out!
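A sketch of why the patching is incomplete rather than absent (stub names are illustrative, not the real Mistral classes): patching an attribute on the class reaches instances that already exist, because Python resolves methods on the class at call time; rebinding the class name in the module does not.

class MistralForCausalLMStub:  # stands in for modeling_mistral.MistralForCausalLM
    def forward(self):
        return "original forward"

model = MistralForCausalLMStub()  # created before any patching

def liger_forward(self):
    return "liger forward"

# Patching a method ON the class reaches the existing instance,
# because model.forward is looked up on the class at call time.
MistralForCausalLMStub.forward = liger_forward
print(model.forward())  # "liger forward"

# By contrast, rebinding the class name in the module (as with
# modeling_llama.LlamaRMSNorm above) leaves model's submodules untouched.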

@shimizust enabled auto-merge (squash) August 30, 2024 01:42
@tyler-romero (Collaborator)

Related, is this test incorrect?
https://github.com/tyler-romero/Liger-Kernel/blob/main/test/convergence/test_mini_models_no_logits.py#L324

We patch mini_llama3 in one test (parameter 0), and then in the second test (parameter 1) we run the model as if the patch had not been applied. So it seems like the bfloat16 tests are comparing a patched liger model against itself.

@ByronHsu (Collaborator)

this is an essential finding!! Thanks @tmm1 for bearing with my oversight lol

@ByronHsu (Collaborator)

@tyler-romero you are right! i ran only bf16 and they failed. looks like the tolerance is too tight. we need some intelligent way to unset the patching
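One possible shape for that (a sketch, not an existing Liger-Kernel API; the helper name is hypothetical): snapshot the original symbols before patching so a test teardown can restore them.

import transformers.models.llama.modeling_llama as modeling_llama

# Snapshot taken before apply_liger_kernel_to_llama() runs.
_ORIGINALS = {
    "LlamaRMSNorm": modeling_llama.LlamaRMSNorm,
    "apply_rotary_pos_emb": modeling_llama.apply_rotary_pos_emb,
}

def unpatch_llama():
    # Restore the module-level symbols so the next test sees vanilla HF code.
    for name, obj in _ORIGINALS.items():
        setattr(modeling_llama, name, obj)

Note this only restores the module symbols; any model instantiated while the patch was active keeps its Liger submodules, which is the same instantiation-time issue this PR documents.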

@shimizust merged commit 80d6c0f into linkedin:main Aug 30, 2024
2 checks passed
@tmm1 (Contributor, Author) commented Aug 30, 2024

Thanks all! Happy to help. The incomplete patching issue is quite unintuitive and has tripped me up a number of times in the past.
