Transformer number of heads #172
SWG Notes: This reduces computation, and Google has data showing that it does not reduce quality. It is also commonly used in production. AI (Google): share the spreadsheet showing the difference between runs.
We drafted a document that analyzes the computation requirements and reports convergence experiments with this change: https://docs.google.com/a/google.com/document/d/e/2PACX-1vR3qcsQSL6r4xvHQP9-R40Rq33qSF5yqm47esWRTbRPzremPYs6-ZNqpSypiyYXyRdE-D7VLayEUY_c/pub
There seem to be permission issues when accessing the doc:
> You need permission to access this published document. You are signed in as [email protected], but you don't have permission to access this published document. You may need to sign in as a different user.
(In reply to Dehao Chen's comment of Thu, Feb 21, 2019, 9:53 AM.)
SWG: This seems tentatively OK, but we want to check with customers and will finalize next week.
Currently the number of attention heads is 16. The proposal is to move to 8 heads, with the understanding that this achieves the same quality with better performance.
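A rough way to see what changes when going from 16 to 8 heads is to count costs for a single self-attention sublayer. This is a sketch, not the analysis from the linked document. It assumes `d_model` stays fixed at 1024 (the Transformer "big" setting, assumed here rather than stated in this thread) and a batch size of 1. It also ignores biases, layer norm and the feed-forward block. `attention_costs` is a hypothetical helper. Under these assumptions, the projection parameters and matmul multiply-accumulates come out identical for both head counts. What halves is the number of attention scores (`num_heads * seq_len**2`) that pass through softmax and dropout, while each head becomes wider (128 instead of 64 dimensions).

```python
def attention_costs(seq_len: int, d_model: int, num_heads: int) -> dict:
    """Rough cost counts for one self-attention sublayer (batch size 1).

    Assumes d_model is held fixed, so each head gets d_model // num_heads dims.
    Biases, layer norm, and the feed-forward block are ignored.
    """
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    # Q, K, V and output projections: four d_model x d_model weight matrices.
    proj_params = 4 * d_model * d_model
    # Multiply-accumulates for the four projections over seq_len tokens.
    proj_macs = 4 * seq_len * d_model * d_model
    # Q @ K^T plus probs @ V, summed over heads: 2 * heads * n * n * d_head.
    attn_macs = 2 * num_heads * seq_len * seq_len * d_head
    # Attention-score elements that go through softmax (and dropout).
    score_elems = num_heads * seq_len * seq_len
    return {
        "d_head": d_head,
        "proj_params": proj_params,
        "proj_macs": proj_macs,
        "attn_macs": attn_macs,
        "score_elems": score_elems,
    }


# Compare the current (16 heads) and proposed (8 heads) settings.
for heads in (16, 8):
    print(heads, attention_costs(seq_len=256, d_model=1024, num_heads=heads))
```

By this count, any speedup would come from the smaller score tensor and the wider per-head matmuls, not from fewer matmul FLOPs. How much that matters in practice depends on the hardware and sequence length.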