This is not necessarily wrong, but I want to point out that using a ReLU here is not a very common choice as far as I know. It might not hurt anything, but if results look off, this could be a thing to check.
Also: you can tie the input/output embedding matrices (i.e. use a single parameter for both `self.embedding.weight` and `self.out`). This will cut the number of embedding parameters in half and might help a bit with overfitting. Note that you would still keep the bias that is included in the `self.out` layer.
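A minimal sketch of what I mean, assuming a PyTorch model with an `nn.Embedding` input layer and an `nn.Linear` output projection (the class and dimension names here are illustrative, not from your code):

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Toy model showing input/output embedding weight tying."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        # The output layer keeps its own bias; only the weight matrix is shared.
        self.out = nn.Linear(d_model, vocab_size, bias=True)
        # Tie the two matrices: both names now point at one Parameter,
        # so the optimizer updates a single tensor.
        self.out.weight = self.embedding.weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.embedding(x)  # (batch, seq, d_model); a real model has layers in between
        return self.out(h)     # (batch, seq, vocab_size)
```

Both `nn.Embedding` and `nn.Linear` store their weight with shape `(vocab_size, d_model)`, so the assignment works without a transpose.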