Add a pip-installable, simple implementation of MeZO (along with a distributed impl. and some tests) #26

Open: wants to merge 10 commits into base: main

@lebrice lebrice commented Dec 20, 2023

Hello there!

I was very interested in your work after seeing it at NeurIPS, and I'd like to play around with it a bit in the future. To do so, I felt it might be useful to add a simple, standalone implementation of your algorithm to your codebase, so people can more easily import it into their own projects and use it.

Here are my contributions, if you're interested:

  • Add a simple, readable, standalone implementation of the MeZO update in a new mezo package
    • mezo.update: Perform a single MeZO update given the model, loss function, inputs, random seed, epsilon and learning rate.
      • NOTE: This also implements a minor improvement over the original algorithm: the update can be split into smaller chunks whenever a weight matrix is too large. This way, the maximum additional VRAM used during a MeZO update can be chosen a priori, instead of being the size of the largest weight (e.g. the embedding matrix in LLMs).
    • mezo.reconstruct_updates: Reconstructs a sequence of MeZO updates given the model, random seeds and projected gradients of each step
    • mezo.average_of_updates: Performs the average of multiple MeZO updates given the model, random seeds and projected gradients of each step or worker
    • mezo.distributed_update: Distributed update. Each worker communicates its projected gradient (and, implicitly, its random seed) to all other workers, so that each worker ends up reconstructing the average update from all workers.
  • Add a distributed MeZO update in mezo.distributed
  • Make this repo pip-installable and add short install instructions to the README.md
  • Add unit tests for every added major function (mezo.update, mezo.reconstruct_updates, mezo.average_of_updates, mezo.distributed_update)
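To give a rough idea of what the single-step update, the chunking, and the reconstruction amount to, here is a minimal pure-Python sketch on a flat list of floats. It is not the code in this PR: the names `perturb`, `mezo_step`, `replay_steps`, and `quadratic_loss` are made up for illustration, the toy loss is an assumption, and the actual `mezo.update` operates on a model and may differ in its arguments and details.

```python
import random


def perturb(params, seed, scale, chunk_size=None):
    """In place: params += scale * z, where z ~ N(0, 1) is regenerated from `seed`.

    The noise is drawn chunk by chunk, so at most `chunk_size` noise values
    are held in memory at any time (hypothetical stand-in for the VRAM cap).
    """
    rng = random.Random(seed)  # same seed -> same z on every call
    n = len(params)
    step = chunk_size or max(n, 1)  # no chunk size: a single chunk
    for start in range(0, n, step):
        stop = min(start + step, n)
        noise = [rng.gauss(0.0, 1.0) for _ in range(stop - start)]
        for offset, z_i in enumerate(noise):
            params[start + offset] += scale * z_i


def mezo_step(params, loss_fn, seed, epsilon=1e-3, lr=1e-2, chunk_size=None):
    """One zeroth-order step; returns the scalar projected gradient."""
    perturb(params, seed, +epsilon, chunk_size)      # theta + eps * z
    loss_plus = loss_fn(params)
    perturb(params, seed, -2 * epsilon, chunk_size)  # theta - eps * z
    loss_minus = loss_fn(params)
    perturb(params, seed, +epsilon, chunk_size)      # back to theta (up to rounding)
    projected_grad = (loss_plus - loss_minus) / (2 * epsilon)
    perturb(params, seed, -lr * projected_grad, chunk_size)  # theta -= lr * g * z
    return projected_grad


def replay_steps(params, seeds, projected_grads, lr=1e-2, chunk_size=None):
    """Re-apply past steps from their (seed, projected gradient) pairs alone."""
    for seed, grad in zip(seeds, projected_grads):
        perturb(params, seed, -lr * grad, chunk_size)


def quadratic_loss(params):
    """Toy loss used only for this illustration."""
    return sum(p * p for p in params)


initial = [1.0, -2.0, 3.0, 0.5, -1.5]
trained = list(initial)
seeds = list(range(200))
grads = [mezo_step(trained, quadratic_loss, seed, chunk_size=2) for seed in seeds]

# Rebuild the trained parameters from the initial ones plus one scalar per step.
replayed = list(initial)
replay_steps(replayed, seeds, grads)
```

Since the noise is regenerated from the seed rather than stored, each step is fully described by one seed and one scalar, which is what makes the reconstruction possible. Drawing the noise sequentially from one generator also means that the chunk size does not change the result.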

If you'd like to add these changes to your repo, could you please just make sure that I didn't miss anything in my re-implementation of the algorithm (perhaps by reading through the mezo.update and mezo.distributed_update functions, if possible).
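In case it helps with the review, here is a single-process sketch of the idea behind the averaged and distributed updates described in the list above. It only simulates the workers in a loop and is not the code in this PR: all names are hypothetical, the per-worker toy losses are assumptions, and the seed scheme `step * world_size + rank` is just one way to make the seeds implicit, not necessarily the one used in `mezo.distributed_update`.

```python
import random


def add_noise(params, seed, scale):
    """In place: params += scale * z, with z ~ N(0, 1) regenerated from `seed`."""
    rng = random.Random(seed)
    for i in range(len(params)):
        params[i] += scale * rng.gauss(0.0, 1.0)


def local_projected_grad(params, loss_fn, seed, epsilon):
    """Two-point estimate of the loss slope along the direction z(seed)."""
    add_noise(params, seed, +epsilon)
    loss_plus = loss_fn(params)
    add_noise(params, seed, -2 * epsilon)
    loss_minus = loss_fn(params)
    add_noise(params, seed, +epsilon)  # restore the weights (up to rounding)
    return (loss_plus - loss_minus) / (2 * epsilon)


def apply_average_update(params, seeds, grads, lr):
    """Apply the mean of the per-worker updates, rebuilt from seeds and scalars."""
    for seed, grad in zip(seeds, grads):
        add_noise(params, seed, -lr * grad / len(seeds))


def simulated_distributed_step(replicas, loss_fns, step, epsilon=1e-3, lr=1e-2):
    """One step over all simulated workers; replicas[rank] is that worker's copy."""
    world_size = len(replicas)
    # Assumed convention: every worker can derive everyone's seed from (step, rank).
    seeds = [step * world_size + rank for rank in range(world_size)]
    # Each worker estimates a projected gradient on its own data.
    grads = [
        local_projected_grad(replicas[rank], loss_fns[rank], seeds[rank], epsilon)
        for rank in range(world_size)
    ]
    # Stand-in for an all-gather of one scalar per worker: every worker sees all
    # the projected gradients and applies the same averaged update.
    for params in replicas:
        apply_average_update(params, seeds, grads, lr)
    return grads


def make_loss(target):
    """Toy per-worker loss: squared distance to that worker's target."""
    def loss(params):
        return sum((p - t) ** 2 for p, t in zip(params, target))
    return loss


targets = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
loss_fns = [make_loss(t) for t in targets]
replicas = [[0.0, 0.0, 0.0] for _ in targets]


def mean_loss(params):
    return sum(fn(params) for fn in loss_fns) / len(loss_fns)


initial_loss = mean_loss(replicas[0])
for step in range(300):
    simulated_distributed_step(replicas, loss_fns, step)
final_loss = mean_loss(replicas[0])
```

The point of the sketch is that only one scalar per worker needs to be exchanged at each step, and every replica stays in sync (up to floating-point rounding) because each one rebuilds the same averaged update locally.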

Thanks and congratulations on this great work!



```bash
pip install git+https://www.github.com/lebrice/MeZO
```

Note: If you're interested in merging this PR, then simply accept this change so people pip-install your repo instead of my fork.

Suggested change:
- pip install git+https://www.github.com/lebrice/MeZO
+ pip install git+https://www.github.com/princeton-nlp/MeZO

Signed-off-by: Fabrice Normandin <[email protected]>
lebrice commented Mar 1, 2024

Hey @gaotianyu1350 @sadhikamalladi @eltociear @danqi, would you be interested in reviewing this contribution to your repo?

@Alignment-Lab-AI

@lebrice appreciate the effort you've gone through, this is adding productively to some experiments we're on at the moment!

lebrice commented Jun 3, 2024
