PrefixLM is loaded as CausalLM after HuggingFace export #739
Comments
Hey @timsteuer, just to confirm. If you specify […]
Yes, after […]
Got it, unfortunately there isn't anything we can do about this. When you do a plain from_pretrained, Hugging Face loads its own implementation rather than ours, so the prefix-lm objective is lost.
Oh, I see. So from your side, it might be a good idea to document that somewhere very prominently. Also, do you think it would be worthwhile to raise an issue / pull request @ HuggingFace? If I get it correctly, they just have to check the config for the model's objective to see whether they can load their implementation or have to use your implementation with trust_remote_code=True.
Yeah, docs are a good idea. We have an explicit error if you try to use MPT with […].
As for an issue/PR on Hugging Face, sounds reasonable to me! I'm not sure if they will want to do this, because there are actually quite a few things that our implementation supports that theirs does not, but no harm in asking.
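For illustration, a user-side guard along the lines of the config check discussed above might look like the following sketch. The checkpoint path is a placeholder, and it assumes the exported config exposes the objective under attn_config["prefix_lm"], as llm-foundry's MPT config does:

```python
from transformers import AutoConfig, AutoModelForCausalLM

checkpoint = "path/to/exported-mpt-snapshot"  # placeholder path

# Load only the config first; trust_remote_code lets the custom MPTConfig
# class shipped with the export be resolved.
config = AutoConfig.from_pretrained(checkpoint, trust_remote_code=True)
attn_config = getattr(config, "attn_config", None) or {}

if attn_config.get("prefix_lm", False):
    # Prefix-lm checkpoints need the custom MPT code from the export; without
    # it the model silently falls back to a pure left-to-right (causal) mask.
    model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)
else:
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
```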
Hm, documentation may be helpful at the following two places:
[…]
Thanks for the suggestions, will do!
When loading a prefix-lm model trained with llm-foundry into HuggingFace, one is tempted to call
AutoModelForCausalLM.from_pretrained(<snapshot>)
However, this loads the model not as a prefix-lm but as a plain causal LM (which I learned the hard way).
As a consequence, the model's predictions are not completely random, but they are clearly suboptimal given its training state.
I consider this a bug, because it lets the user falsely believe that the model was loaded correctly. It is also rather sneaky: only by comparing the model's predictions after loading with those from, e.g., training can one see that something is wrong.
The expected behavior would be to not allow loading a prefix-lm model as a causal LM with a pure left-to-right mask.
Environment
llm-foundry:main
To reproduce
Steps to reproduce the behavior:
1. Export a prefix-lm model trained with llm-foundry to the HuggingFace format.
2. Load the exported snapshot with AutoModelForCausalLM.from_pretrained() (sketched below).
3. The model will be loaded in the wrong state (as a plain causal LM).
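A minimal sketch of the problematic load path from the steps above; the snapshot path is a placeholder:

```python
from transformers import AutoModelForCausalLM

# Without trust_remote_code, the checkpoint is loaded with Hugging Face's own
# causal-LM code and the prefix-lm objective is silently ignored.
model = AutoModelForCausalLM.from_pretrained("path/to/exported-mpt-snapshot")
```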
Expected behavior
The model should not be loaded at all. Instead, an error message reminds me to turn on trust_remote_code.
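For comparison, the load path that picks up the prefix-lm-aware code shipped with the llm-foundry export might look like this (again a sketch with a placeholder path):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/exported-mpt-snapshot",  # placeholder path
    trust_remote_code=True,           # use the custom MPT code from the export
)
```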