Conformer: Missing dropout after the self-attention #245
This is clearly wrong. I wonder how we should deal with such cases. I have lots of experiments now with the old code, so changing the behavior is very inconvenient for me, as I would have to properly track the difference. On the other hand, we said that changing the behavior is still ok now, and better now than later, especially in a case like this. Introducing a special option for this might simplify it a bit, but it is very ugly.
Ok, I temporarily introduced the flag `use_dropout_after_self_att`.
Explicitly without use_dropout_after_self_att. rwth-i6/returnn_common#245
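To make the compatibility trade-off concrete, here is a minimal sketch of how a flag like `use_dropout_after_self_att` could gate the new behavior, so that old experiments can reproduce the previous behavior exactly. This is not returnn_common's actual implementation; it is a PyTorch-style illustration, and all names besides the flag are hypothetical.

```python
import torch
import torch.nn as nn


class SelfAttBlock(nn.Module):
    """Sketch only: gate the new dropout behind a compatibility flag."""

    def __init__(self, dim: int, num_heads: int, dropout: float = 0.1,
                 use_dropout_after_self_att: bool = True):
        super().__init__()
        self.self_att = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Flag name taken from the commits above; everything else is hypothetical.
        self.att_dropout = (nn.Dropout(dropout)
                            if use_dropout_after_self_att else nn.Identity())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.self_att(x, x, x, need_weights=False)
        # With the flag disabled, this reduces to the old (buggy) behavior.
        return x + self.att_dropout(out)
```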
I noticed that we do not have dropout after the self-attention.
This is different from the standard Transformer.
This is also different from the paper.
Originally posted by @albertz in #233 (comment)
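For comparison, here is a minimal sketch of the dropout placement that the standard Transformer and the Conformer paper use: dropout on the self-attention output, before the residual add. Again a PyTorch-style illustration with hypothetical names, not the actual returnn_common code.

```python
import torch
import torch.nn as nn


class ConformerSelfAttBlock(nn.Module):
    """Sketch of the Conformer MHSA sub-block with the paper's
    dropout placement (after attention, before the residual add)."""

    def __init__(self, dim: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.self_att = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.dropout = nn.Dropout(dropout)  # the dropout this issue is about

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.norm(x)
        x, _ = self.self_att(x, x, x, need_weights=False)
        x = self.dropout(x)  # omitting this line reproduces the reported bug
        return residual + x
```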