From 9f32f97db41fd1fbbe27b89dfebfda4a8c7f343b Mon Sep 17 00:00:00 2001
From: itayhubara
Date: Sun, 14 Apr 2024 15:51:51 +0300
Subject: [PATCH] update llama2_70b_lora evaluation skipping per WG decision

---
 training_rules.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/training_rules.adoc b/training_rules.adoc
index 176de33..72a1fe3 100644
--- a/training_rules.adoc
+++ b/training_rules.adoc
@@ -462,7 +462,7 @@ CLOSED: The same quality measure as the reference implementation must be used. T
 |Language|Speech recognition |RNN-T|Every 1 epoch
 |NLP |BERT| eval_interval_samples=FLOOR(0.05*(230.23*GBS+3000000), 25000), skipping 0
 |large Language Model |GPT3| Every 24576 sequences. CEIL(24576 / global_batch_size) if 24576 is not divisible by GBS
-|large Language Model |Llama2_70B_LoRA| Every 384 sequences, CEIL(384 / global_batch_size) steps if 384 is not divisible by GBS. skipping first 3 evaluations
+|large Language Model |Llama2_70B_LoRA| Every 384 sequences, CEIL(384 / global_batch_size) steps if 384 is not divisible by GBS. Skipping first FLOOR(0.125*global_batch_size+2) evaluations
 |Commerce|Recommendation |DLRMv2 (DCNv2)|Every FLOOR(TOTAL_TRAINING_SAMPLES / (GLOBAL_BATCH_SIZE * NUM_EVAL) samples, where TOTAL_TRAINING_SAMPLES = 4195197692 and NUM_EVAL = 20
 |Graphs|Node classification|R-GAT|Evaluate 20 times per epoch
 |===
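
Note (illustration only, not part of the patch): a minimal sketch of what the changed Llama2_70B_LoRA row means in practice, assuming the table's FLOOR/CEIL map directly to math.floor/math.ceil. The helper names (eval_interval_steps, skipped_evaluations) are hypothetical and not taken from the reference implementation.

    # Sketch of the Llama2_70B_LoRA evaluation-frequency rule in the hunk above.
    # Helper names are hypothetical; FLOOR/CEIL are taken as math.floor/math.ceil.
    import math

    EVAL_EVERY_SEQUENCES = 384  # evaluate every 384 trained sequences

    def eval_interval_steps(global_batch_size: int) -> int:
        """Steps between evaluations: CEIL(384 / global_batch_size)."""
        return math.ceil(EVAL_EVERY_SEQUENCES / global_batch_size)

    def skipped_evaluations(global_batch_size: int) -> int:
        """New rule from this patch: skip the first FLOOR(0.125*global_batch_size + 2)
        evaluations; the old rule skipped a fixed 3 regardless of batch size."""
        return math.floor(0.125 * global_batch_size + 2)

    if __name__ == "__main__":
        for gbs in (8, 16, 32, 64, 128):
            print(f"GBS={gbs:4d}: eval every {eval_interval_steps(gbs)} steps, "
                  f"skip first {skipped_evaluations(gbs)} evaluations")

Under these assumptions, at global_batch_size=8 the new formula gives FLOOR(0.125*8+2)=3, matching the previous fixed value, and the number of skipped evaluations grows with larger global batch sizes (e.g. 6 at GBS=32, 18 at GBS=128).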