This is not necessarily wrong, but I want to point out that using a ReLU here is not a very common choice as far as I know. It might not hurt anything, but if results look off, this could be a thing to check.
Also: you can tie the input/output embedding matrices (i.e. use a single parameter for both `self.embedding.weight` and `self.out`). This will cut the number of embedding parameters in half and might help a bit with overfitting. Note that you would still keep the bias that is included in the `self.out` layer.
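A minimal sketch of what I mean, assuming a PyTorch model with an `nn.Embedding` input layer and an `nn.Linear` output projection (the class and dimension names here are illustrative, not from your code):

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Toy model showing input/output embedding weight tying."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        # The output layer keeps its own bias; only the weight matrix is shared.
        self.out = nn.Linear(d_model, vocab_size, bias=True)
        # Tie the two matrices: both names now point at one Parameter,
        # so the optimizer updates a single tensor.
        self.out.weight = self.embedding.weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.embedding(x)  # (batch, seq, d_model); a real model has layers in between
        return self.out(h)     # (batch, seq, vocab_size)
```

Both `nn.Embedding` and `nn.Linear` store their weight with shape `(vocab_size, d_model)`, so the assignment works without a transpose.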