feat: Add support for phi4 #764
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@jlonge4 thank you very much for this pull-request: adding support for phi4 would be awesome. We are however heavily refactoring the export mechanism to remove a dependency. Can you take a look at that pull-request and see if it would make it easier for you to add support for phi4 based on the new HLO backend?
Hi there @dacorvo, just took a look at the difference and it certainly seems a lot slimmer! I think my effort would be about the same with regard to the most important part here, which is the weight loading logic.
@jlonge4 the pull-request has been merged. Please let me know if you need any help rebasing your branch.
@dacorvo Hopefully the last commit is pretty close.
@jlonge4 I rebased and squashed your branch into a new phi4 branch, then did a few tests. I think the most efficient is for you to reset your branch using mine locally (assuming here you have an "upstream" remote pointing to the main repository).
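The exact commands were cut off in the source; a typical sequence under the stated assumptions (an "upstream" remote carrying the new phi4 branch, and `<your-branch>` standing in for the local branch name) might be:

```
git fetch upstream
git checkout <your-branch>
git reset --hard upstream/phi4
git push --force-with-lease
```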
Then you can resume the work on your branch.
Just for my knowledge, what specifically still had to be done to the load weights func? Any specific tests you'd like added from here?
Main changes:
@jlonge4 I think it would be good to add a phi3 config to the generation tests here: tests/decoder/conftest.py, line 26 (commit c1cf0f0).
There are not many small models available, but you can use a tiny random-weights checkpoint.
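A hypothetical sketch of what such an addition could look like; the actual structure of tests/decoder/conftest.py may differ, and the dict name and model ids below are placeholders, not real checkpoints:

```python
# Placeholder sketch only: dict name, fields, and model ids are illustrative.
DECODER_MODEL_CONFIGURATIONS = {
    "llama": {
        "model_id": "org/tiny-random-llama",  # placeholder id
        "export_kwargs": {"batch_size": 4, "sequence_length": 4096},
    },
    # New entry so the generation tests also cover the phi family; a small
    # random-weights checkpoint keeps export and generation fast in CI.
    "phi": {
        "model_id": "org/tiny-random-phi",  # placeholder id
        "export_kwargs": {"batch_size": 4, "sequence_length": 4096},
    },
}
```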
@dacorvo got it! Btw, so that we can effectively handle both, do you think we should add an explicit check?
I think it is covered by the calculation I used (stolen from an existing implementation).
@jlonge4 I realize you already added a config. As a sanity test I checked the generation results on both backends.
Here is the CUDA result: [output not preserved]
And the neuron one: [output not preserved]
So now I am wondering if we did not miss something in the modeling code that differs from llama.
@dacorvo hmm, so strange. I was able to get matching logit results against NXDI: [results not preserved]
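A rough sketch of the kind of logit-matching sanity check being discussed: run the same prompt through a reference (GPU/CPU) model and a Neuron model and compare the next-token logits. The function and argument names here are illustrative, not from the PR.

```python
import torch

def logits_match(ref_model, neuron_model, tokenizer, prompt, k=10):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        ref = ref_model(**inputs).logits[:, -1, :].float()
        neu = neuron_model(**inputs).logits[:, -1, :].float()
    # Compare top-k token ids rather than raw values, since compilation
    # introduces small numerical differences between backends.
    return torch.equal(ref.topk(k).indices, neu.topk(k).indices)
```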
@jlonge4 I did a lot of tests comparing with GPU versions and: [detailed findings not preserved]
This PR adds support for Microsoft's Phi-4 model by adapting the existing LLaMA implementation.
The Phi-4 architecture follows the LLaMA architecture closely, with the main difference being in how the weights are stored (fused `qkv_proj` and `gate_up` vs. separate projections).
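A minimal sketch of what that adaptation implies, not the PR's actual code: the checkpoint key names and row layout below assume a standard Phi-style checkpoint, where the fused tensors are concatenated along the output dimension.

```python
import torch

def split_fused_weights(state_dict, hidden_size, num_heads, num_kv_heads):
    """Split Phi-style fused projections into the separate tensors a
    LLaMA-style graph expects. Key names assume Phi checkpoint naming."""
    head_dim = hidden_size // num_heads
    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim
    out = {}
    for name, tensor in state_dict.items():
        if name.endswith("qkv_proj.weight"):
            # Fused [q; k; v] rows -> three separate projection weights.
            q, k, v = torch.split(tensor, [q_size, kv_size, kv_size], dim=0)
            prefix = name[: -len("qkv_proj.weight")]
            out[prefix + "q_proj.weight"] = q
            out[prefix + "k_proj.weight"] = k
            out[prefix + "v_proj.weight"] = v
        elif name.endswith("gate_up_proj.weight"):
            # Fused [gate; up] rows -> gate_proj and up_proj.
            gate, up = torch.chunk(tensor, 2, dim=0)
            prefix = name[: -len("gate_up_proj.weight")]
            out[prefix + "gate_proj.weight"] = gate
            out[prefix + "up_proj.weight"] = up
        else:
            out[name] = tensor
    return out
```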