In the `__init__` function of the `MultiheadAttention` class, you use `d_k` and `d_v` to denote the dimensions of keys and values, and you define the projections below:
However, when `d_v` is not the same as `d_q` (it should be `d_model // n_head`), the shape of the queries changes after the attention operation, which causes problems in multi-layer structures.
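For concreteness, here is a minimal sketch (with made-up dimensions, not taken from the repository) of why the output shape drifts when `d_v != d_model // n_head`: each head's output has width `d_v`, so the concatenated result has width `n_head * d_v` instead of `d_model`.

```python
import torch

# Hypothetical dimensions chosen so that d_v != d_model // n_head.
d_model, n_head = 512, 8
d_k, d_v = 64, 32                     # d_model // n_head == 64, but d_v == 32

batch, seq_len = 2, 10
q = torch.randn(batch, n_head, seq_len, d_k)   # projected queries
k = torch.randn(batch, n_head, seq_len, d_k)   # projected keys
v = torch.randn(batch, n_head, seq_len, d_v)   # projected values

attn = torch.softmax(q @ k.transpose(-2, -1) / d_k ** 0.5, dim=-1)
out = (attn @ v).transpose(1, 2).reshape(batch, seq_len, n_head * d_v)

print(out.shape)  # torch.Size([2, 10, 256]) -- not d_model (512), so the
                  # residual connection / next layer no longer lines up
```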
After going through the official `MultiheadAttention` implementation in PyTorch, I believe you used a similar parameterization:
However, the official PyTorch implementation uses raw weight parameters rather than an `nn.Linear` module, and those weights transform the key dimension from `self.kdim` to `embed_dim`, which is the opposite of what your implementation does. So I believe there may be an error in your code. Overall, though, thank you for your work; it has helped me a lot.
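As a quick check of that behaviour (in recent PyTorch versions, and with illustrative numbers of my own), `nn.MultiheadAttention` constructed with `kdim`/`vdim` different from `embed_dim` keeps separate `*_proj_weight` parameters whose shapes show keys and values being projected up to `embed_dim`:

```python
import torch.nn as nn

embed_dim, num_heads, kdim, vdim = 512, 8, 128, 256
mha = nn.MultiheadAttention(embed_dim, num_heads, kdim=kdim, vdim=vdim)

# With kdim/vdim != embed_dim, PyTorch registers separate projection weights
# instead of a single packed in_proj_weight.
print(mha.k_proj_weight.shape)    # torch.Size([512, 128])  kdim -> embed_dim
print(mha.v_proj_weight.shape)    # torch.Size([512, 256])  vdim -> embed_dim
print(mha.out_proj.weight.shape)  # torch.Size([512, 512])  embed_dim -> embed_dim
```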