Fix llama conversion, improve parameter conversion #94

Merged
merged 3 commits into main from fix_improve_conversion on Dec 17, 2024

Conversation

@jlamypoirier (Collaborator) commented Dec 17, 2024

✨ Description

Fixes: #93

#55 added support for rope scaling, but assumed the rope scaling value is always a dict. The value can also be None, in which case #55 broke conversion.
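
For context, these are the two shapes the Hugging Face side can take for the same model family (field names follow transformers' Llama configs; the values below are illustrative):

```python
hf_config_with_scaling = {
    "rope_theta": 500000.0,
    "rope_scaling": {  # variable set of entries, depending on the scaling type
        "rope_type": "llama3",
        "factor": 8.0,
        "low_freq_factor": 1.0,
        "high_freq_factor": 4.0,
        "original_max_position_embeddings": 8192,
    },
}
hf_config_without_scaling = {
    "rope_theta": 10000.0,
    "rope_scaling": None,  # the case that #55 broke
}
```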

There was no easy, non-hacky fix, because the HF side has a variable set of entries which may depend on each other. I handled this by making a converter for the entire rope scaling dict on the HF side, but that dict then has to map to multiple values on the Fast-LLM side, which is also a problem. (Mapping to the whole rotary config dict wouldn't work because of rope theta.)

So the solution is to adapt parameter converters to take a variable number of parameters on each side. Rope scaling is the first use case for this, but I expect more in the future. It's also the same thing we do for weight converters (and for exactly the same reason), so it's a natural next step. This makes converter definitions a bit more complicated, since I had to enforce 2d arrays of tuples to make things less confusing and error-prone, but I think it's worth it in the long run.
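
To make the shape concrete, here is a minimal sketch of a many-to-many parameter converter; the class and method names (`ParamConverter`, `export_params`, `import_params`) are illustrative, not the exact Fast-LLM API:

```python
import dataclasses


@dataclasses.dataclass
class ParamConverter:
    # Each converter owns a tuple of parameter paths on each side, so a single
    # HF entry (e.g. the whole rope_scaling dict) can map to several Fast-LLM
    # fields, mirroring how weight converters handle multiple tensors per entry.
    fast_llm_names: tuple[tuple[str, ...], ...]  # e.g. (("rotary", "type"), ("rotary", "scale_factor"))
    export_names: tuple[tuple[str, ...], ...]    # e.g. (("rope_scaling",),)

    def export_params(self, fast_llm_values: tuple) -> tuple:
        raise NotImplementedError

    def import_params(self, export_values: tuple) -> tuple:
        raise NotImplementedError


class RopeScalingParamConverter(ParamConverter):
    # Converts between several Fast-LLM rotary fields and the single
    # `rope_scaling` entry (a dict or None) on the Hugging Face side.
    def export_params(self, fast_llm_values: tuple) -> tuple:
        rope_type, factor = fast_llm_values
        if rope_type == "default":
            return (None,)
        return ({"rope_type": rope_type, "factor": factor},)

    def import_params(self, export_values: tuple) -> tuple:
        (rope_scaling,) = export_values
        if rope_scaling is None:  # the case that used to crash
            return ("default", 1.0)
        return (rope_scaling["rope_type"], rope_scaling.get("factor", 1.0))
```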

Added a DEFAULT tag that gets replaced with the field's default value during config validation. So far it's only used to avoid ad-hoc None handling in the rope config, but I'm sure it will have more uses elsewhere.
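
A minimal sketch of the DEFAULT idea; the sentinel class and `resolve` helper below are illustrative, not Fast-LLM's actual validation code:

```python
class _DefaultTag:
    # Sentinel meaning "replace me with the field's default during validation".
    def __repr__(self) -> str:
        return "DEFAULT"


DEFAULT = _DefaultTag()


def resolve(value, default):
    # Resolve the sentinel during validation, so converters can emit DEFAULT
    # instead of hand-rolling None handling for each field.
    return default if value is DEFAULT else value


assert resolve(DEFAULT, default="default") == "default"
assert resolve("llama3", default="default") == "llama3"
```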

Added a plain llama model to the tests; it's the same as the mistral one but uses the llama converter.

Note that this will force some small changes in unmerged converters (only #5 and #84).

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@jlamypoirier marked this pull request as ready for review December 17, 2024 05:48
@jlamypoirier changed the title from "Fix improve conversion" to "Fix llama conversion, improve parameter conversion" on Dec 17, 2024
@jlamypoirier merged commit d8f3390 into main Dec 17, 2024
4 checks passed
@jlamypoirier deleted the fix_improve_conversion branch December 17, 2024 21:11
@jlamypoirier mentioned this pull request Dec 17, 2024
Successfully merging this pull request may close these issues.

[bug] Crash when loading pretrained model