Weight tying with FSDP #84

Open
xinghaow99 opened this issue Feb 28, 2025 · 0 comments

xinghaow99 commented Feb 28, 2025

Hi, this issue is closely related to the previous issue (#74) about embedding tying. I'm trying to train a model with weight tying, for example, a model that shares the feed_forward.w1 weight across all layers (a minimal sketch of what I mean follows).
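
To make the setup concrete, here is a minimal sketch of that kind of tying: one w1 module instance is reused by every layer. The FeedForward/Block classes and dimensions are placeholders for illustration, not the library's actual modules:

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, dim, hidden_dim, w1=None):
        super().__init__()
        # Reuse one w1 module instance across all layers when it is passed in.
        self.w1 = w1 if w1 is not None else nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))

class Block(nn.Module):
    def __init__(self, dim, hidden_dim, shared_w1=None):
        super().__init__()
        self.feed_forward = FeedForward(dim, hidden_dim, w1=shared_w1)

    def forward(self, x):
        return x + self.feed_forward(x)

shared_w1 = nn.Linear(512, 2048, bias=False)
layers = nn.ModuleList(Block(512, 2048, shared_w1=shared_w1) for _ in range(4))
# All layers point at the same parameter tensor.
assert layers[0].feed_forward.w1.weight is layers[3].feed_forward.w1.weight
```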

I tried assigning the weights directly (sketched below), e.g. layers.1.feed_forward.w1.weight = layers.0.feed_forward.w1.weight, but that runs into the saving/loading problem mentioned previously. I also tried the TiedLinear approach, but it seems to interfere with FSDP. What is the appropriate way to do this?
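
For reference, the direct-assignment attempt looks roughly like this (model and the attribute path follow my config, so names may differ):

```python
# Tie every layer's feed_forward.w1 to layer 0's parameter tensor,
# after building the model but before wrapping with FSDP.
w1_ref = model.layers[0].feed_forward.w1.weight
for layer in model.layers[1:]:
    layer.feed_forward.w1.weight = w1_ref

# Forward/backward behaves as expected (all layers share the tensor and
# gradients accumulate into it), but checkpoint saving/loading then runs
# into the same problem described in #74.
```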

Thanks for the help!
