Hi, this issue is closely related to the previous issue (#74) about embedding tying. I'm trying to train a model with weight tying; for example, a model that shares the `feed_forward.w1` weight across all layers.
I tried assigning the weights directly, e.g. `layers.1.feed_forward.w1.weight = layers.0.feed_forward.w1.weight`, but that runs into the saving/loading problem mentioned previously. I also tried the `TiedLinear` approach, but it seems to interfere with FSDP. What is the appropriate way to do this?
Thanks for the help!
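For reference, here is a minimal sketch of module-level tying, assuming you can construct the shared module once and pass it into each layer. The `Block` and `TiedModel` classes are hypothetical stand-ins, not this repo's actual model code; the point is that sharing the *module instance* (rather than reassigning `.weight` after construction) keeps tying intact through `named_parameters()`, which deduplicates shared parameters by default:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Hypothetical transformer-style block that receives a shared w1."""

    def __init__(self, dim: int, shared_w1: nn.Linear):
        super().__init__()
        self.w1 = shared_w1            # same nn.Linear object in every block
        self.w2 = nn.Linear(dim, dim)  # per-layer, untied

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(torch.relu(self.w1(x)))

class TiedModel(nn.Module):
    """Hypothetical model whose layers all share one w1 parameter set."""

    def __init__(self, dim: int = 64, n_layers: int = 4):
        super().__init__()
        shared = nn.Linear(dim, dim)   # the single tied parameter set
        self.layers = nn.ModuleList(
            Block(dim, shared) for _ in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x

model = TiedModel()
# Every block holds the same underlying tensor, and named_parameters()
# deduplicates it, so the optimizer updates the tied weight exactly once.
assert len({id(b.w1.weight) for b in model.layers}) == 1
```

On the FSDP side, a caveat worth knowing: FSDP does not support parameters shared across separate FSDP instances, so if you wrap each layer individually, a cross-layer tie like this will break. The tied parameters generally need to fall within a single FSDP-wrapped unit.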