Transformer Quality Target Change #171

Open
bitfort opened this issue Jan 17, 2019 · 7 comments
Labels
Backlog: An issue to be discussed in a future Working Group, but not the immediate next one.

Comments

@bitfort

bitfort commented Jan 17, 2019

Note to follow up about the current transformer quality target (25->27?).

@bitfort added the "Next Meeting" label (item to be discussed in the next Working Group) Jan 17, 2019
@bitfort
Author

bitfort commented Jan 17, 2019

SWG Notes:

We intend to move the quality target to 27. There is an AI to modify the reference and confirm that it reaches the target.

@bitfort added the "AI" label (there is an action item here) Jan 24, 2019
@bitfort
Author

bitfort commented Jan 24, 2019

SWG Notes:

AI(Cray) - Check target quality on English-to-French and English-to-German.
Related to:
mlcommons/policies#175

@bitfort
Author

bitfort commented Jan 31, 2019

SWG Notes:

(English-to-German) Published accuracy is 28.4. We have not been able to hit 27 at the reference batch size yet and are continuing the parameter search. We expect the reference to hit 27, but with changes to learning rate / batch size.

(English-to-German) Google believes 27 can be hit at ~64k tokens global batch size. Above this, they haven't been able to converge, but are still exploring. Reaching 27 roughly doubles the number of epochs versus 25.

(English-to-French) Published accuracy is 43. Google has seen around 41, but investigation is ongoing.

Continuing Cray AI.
AI(Google) - Explore English-to-French at scale (non-reference).

@bitfort
Author

bitfort commented Mar 14, 2019

SWG Notes:

We feel that variance is a concern here, especially at a target of 27. We'd like to increase accuracy, but want more information on variance to set the target.

AI(Cray & Google & CISCO) -- Do some runs to 26 to look at variance (and provide data for 25.5 too).

@jbalma

jbalma commented Mar 28, 2019

I was able to complete 8 transformer reference runs and saw convergence to 26.0 on English-to-German within 5 epochs for 5 of the 8 runs, and within 6 epochs for the remaining 3.

Here is the relevant grep from the logs:

grep "Bleu score (uncased)" mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_*new/translation/logfile | grep ": 26"
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_0_new/translation/logfile:Bleu score (uncased): 26.452380418777466
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_1_new/translation/logfile:Bleu score (uncased): 26.39443278312683
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_2_new/translation/logfile:Bleu score (uncased): 26.0280579328537
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_3_new/translation/logfile:Bleu score (uncased): 26.264476776123047
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_4_new/translation/logfile:Bleu score (uncased): 26.29130184650421
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_5_new/translation/logfile:Bleu score (uncased): 26.16676688194275
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_6_new/translation/logfile:Bleu score (uncased): 26.01703405380249
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_7_new/translation/logfile:Bleu score (uncased): 26.256629824638367

mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_0_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_1_new/translation/logfile:Starting iteration 6
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_2_new/translation/logfile:Starting iteration 6
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_3_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_4_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_5_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_6_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_7_new/translation/logfile:Starting iteration 6
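
For the variance question raised in the SWG notes, the grep output above can be summarized with a few lines of Python. This is only a rough sketch: the helper names (`parse_bleu`, `summarize`) are made up for illustration, and the scores and iteration numbers are copied by hand from the log lines above. Note that each score is the first evaluation at or above 26, so the spread mostly reflects overshoot at the stopping point; the iteration counts are probably the more useful signal for run-to-run variance.

```python
import re
import statistics
from collections import Counter

# BLEU lines copied verbatim from the grep output above.
GREP_OUTPUT = """\
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_0_new/translation/logfile:Bleu score (uncased): 26.452380418777466
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_1_new/translation/logfile:Bleu score (uncased): 26.39443278312683
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_2_new/translation/logfile:Bleu score (uncased): 26.0280579328537
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_3_new/translation/logfile:Bleu score (uncased): 26.264476776123047
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_4_new/translation/logfile:Bleu score (uncased): 26.29130184650421
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_5_new/translation/logfile:Bleu score (uncased): 26.16676688194275
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_6_new/translation/logfile:Bleu score (uncased): 26.01703405380249
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_7_new/translation/logfile:Bleu score (uncased): 26.256629824638367
"""

# "Starting iteration N" values per run index, copied from the log lines above.
ITERATIONS = {0: 5, 1: 6, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5, 7: 6}

# Captures the run index and the BLEU value from one grep line.
LINE_RE = re.compile(
    r"eng_to_germ_(\d+)_new/translation/logfile:"
    r"Bleu score \(uncased\): ([0-9.]+)"
)


def parse_bleu(text):
    """Return {run_index: bleu} for every matching line in the grep output."""
    scores = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            scores[int(match.group(1))] = float(match.group(2))
    return scores


def summarize(scores):
    """Basic spread statistics over the per-run BLEU scores."""
    values = list(scores.values())
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    for key, value in summarize(parse_bleu(GREP_OUTPUT)).items():
        print(f"{key}: {value}")
    # How many runs reported each "Starting iteration" value.
    print(dict(Counter(ITERATIONS.values())))
```

With only 8 runs the sample standard deviation is a coarse estimate, so it is probably better read as a sanity check than as a basis for setting the target.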

@bitfort added the "Rec: Rules Change" label (a recommendation has been issued by the Working Group) and removed the "AI" and "Next Meeting" labels Apr 11, 2019
@bitfort
Author

bitfort commented Apr 11, 2019

SWG Notes:

No change to target accuracy for v0.6. We think that for v0.7 we can move to a target quality of 27, given more time to work on the issue.

@petermattson added the "Backlog" label (an issue to be discussed in a future Working Group, but not the immediate next one) and removed the "Rec: Rules Change" label May 29, 2020
@petermattson
Contributor

Active, moving to backlog.
