
Code for projecting pre-trained BERT weights into Monarch matrices #3

Open
sinamps opened this issue Oct 3, 2023 · 2 comments
Comments

@sinamps

sinamps commented Oct 3, 2023

Hello, I would like to know if you have published the code to project the pre-trained weights of the BERT model into Monarch matrices. I cannot locate the code for this (I have also looked in the fly repo).
I can see the projection functions here, but I am interested in knowing how you use them specifically for BERT (or other transformers for NLP) to go from pre-trained weights to Monarch matrices. Thank you very much.
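For context, the projection described in the Monarch paper works because each (input-block, output-block) slice of a Monarch matrix is rank 1, so the Frobenius-optimal projection of a dense matrix is a per-slice rank-1 SVD. Below is a minimal, self-contained sketch of that idea for a square n×n matrix with n = b²; it is not the fly repo's actual API, and the names `monarch_project` and `monarch_multiply` are illustrative:

```python
import numpy as np

def monarch_project(M, b):
    """Project a dense (b*b, b*b) matrix onto Monarch factors via
    per-block rank-1 SVD (optimal in Frobenius norm).
    Returns block-diagonal factors w1[k, q, p] and w2[l, s, r]."""
    n = b * b
    assert M.shape == (n, n)
    # View M as 4D blocks: M4[l, s, k, p] = M[l*b + s, k*b + p]
    M4 = M.reshape(b, b, b, b)
    w1 = np.zeros((b, b, b))
    w2 = np.zeros((b, b, b))
    for k in range(b):          # input block
        for l in range(b):      # output block
            blk = M4[l, :, k, :]            # (s, p) slice, ideally rank 1
            U, S, Vt = np.linalg.svd(blk)
            # Split the top singular value across the two factors
            w2[l, :, k] = U[:, 0] * np.sqrt(S[0])
            w1[k, l, :] = np.sqrt(S[0]) * Vt[0, :]
    return w1, w2

def monarch_multiply(x, w1, w2):
    """y = M x for M parametrized as (block-diag w2) . permute . (block-diag w1)."""
    b = w1.shape[0]
    X = x.reshape(b, b)                     # x[k, p]
    out1 = np.einsum('kqp,kp->kq', w1, X)   # first block-diagonal multiply
    out1 = out1.T                           # the butterfly permutation
    out2 = np.einsum('lsr,lr->ls', w2, out1)  # second block-diagonal multiply
    return out2.reshape(-1)
```

Under this parametrization, entry M[l·b+s, k·b+p] equals w2[l, s, k] · w1[k, l, p], which is why each (k, l) block is rank 1 and the SVD-based projection is exact whenever the dense matrix is itself Monarch. For BERT-sized layers whose dimensions are not perfect squares, the same idea applies with rectangular block shapes, which is what the projection functions in the fly repo generalize to.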

@DanFu09
Collaborator

DanFu09 commented Oct 3, 2023

Ah, we don't actually use those in our work - that file was just copy-pasted from the fly repo. In M2 we're training everything from scratch, since the gated convolutional layers are quite different in function from an attention layer. It would be interesting to figure out how to distill an attention layer into a gated convolution!

@sinamps
Author

sinamps commented Oct 3, 2023

Thank you for your prompt response @DanFu09. Would you happen to have any pointers on how that was done in the fly work? I am already working with those projection functions from the fly repo, but I want to make sure I correctly reproduce the results.
