The paper says, "...the same weight matrix is shared between the two embedding layers..." referring to the encoder and decoder embedding layers respectively. However, in the lines below I can see that the encoder initializes its own embedding matrix, separate from the one in the decoder. Can you explain why this is so?
Referenced lines:

- attention-is-all-you-need-pytorch/transformer/Models.py, line 57 (commit 132907d)
- attention-is-all-you-need-pytorch/transformer/Models.py, line 96 (commit 132907d)
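For context, weight sharing between the two embedding layers is usually enforced *after* both modules have been constructed, by pointing one table's weight at the other. The sketch below illustrates that pattern in PyTorch; it is a minimal illustration, and the names (`TinyTransformer`, `src_word_emb`, `trg_word_emb`, `share_emb`) are assumptions, not necessarily what this repository uses:

```python
import torch.nn as nn


class TinyTransformer(nn.Module):
    """Minimal sketch of encoder/decoder embedding weight sharing.

    Assumes a shared source/target vocabulary; attribute names
    (src_word_emb, trg_word_emb) are illustrative only.
    """

    def __init__(self, n_vocab, d_model, share_emb=True):
        super().__init__()
        # Each side first builds its own embedding table ...
        self.src_word_emb = nn.Embedding(n_vocab, d_model)
        self.trg_word_emb = nn.Embedding(n_vocab, d_model)
        if share_emb:
            # ... and sharing is then enforced by re-pointing one
            # table's weight Parameter at the other, so both layers
            # read and update the same underlying tensor.
            self.trg_word_emb.weight = self.src_word_emb.weight
```

If the repository exposes a sharing option on its top-level `Transformer` wrapper, a separate `nn.Embedding` inside the encoder would be reconciled in this way: the independently initialized tensor is simply replaced by a reference to the shared one, so only a single weight matrix is actually trained.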