Validation set is decoded multiple times for ngram level metrics #402

Open
AmitMY opened this issue Jan 16, 2023 · 0 comments

AmitMY commented Jan 16, 2023

Bug description

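When training a model with --valid-metrics chrf ce-mean-words bleu-detok, the validation (development) set appears to be fully translated twice: once to compute chrf and once to compute bleu-detok. Judging by the log timestamps, each of these two metrics adds roughly 25 to 26 minutes to a validation step. By contrast, ce-mean-words needs no decoding and is reported 15 seconds after the preceding line.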
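The expected behavior is that the validation set is decoded once per validation step, and every translation-based metric then reuses the resulting hypotheses. The following is a minimal Python sketch of that idea only. validate, CountingTranslator, exact_match and length_ratio are hypothetical names and are not Marian or browsermt APIs. The two toy scorers merely stand in for chrf and bleu-detok.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration; these are not Marian/browsermt APIs.
Translator = Callable[[List[str]], List[str]]
Scorer = Callable[[List[str], List[str]], float]


def validate(translate: Translator,
             sources: List[str],
             references: List[str],
             metrics: Dict[str, Scorer]) -> Dict[str, float]:
    """Decode the validation set once, then score every translation metric on it."""
    hypotheses = translate(sources)  # the expensive step, done exactly once
    return {name: score(hypotheses, references) for name, score in metrics.items()}


# Toy stand-ins for chrf and bleu-detok, only to show the control flow.
def exact_match(hyps: List[str], refs: List[str]) -> float:
    return 100.0 * sum(h == r for h, r in zip(hyps, refs)) / len(refs)


def length_ratio(hyps: List[str], refs: List[str]) -> float:
    return sum(len(h) for h in hyps) / sum(len(r) for r in refs)


class CountingTranslator:
    """Fake decoder that records how many times it was asked to translate."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, sources: List[str]) -> List[str]:
        self.calls += 1
        return [s.upper() for s in sources]


if __name__ == "__main__":
    decoder = CountingTranslator()
    scores = validate(decoder, ["ab", "cd"], ["AB", "xy"],
                      {"exact-match": exact_match, "length-ratio": length_ratio})
    print(scores, "decodes:", decoder.calls)
```

However many metrics are passed in, the fake decoder is called once. In the actual trainer, the equivalent change would have to be made in its validation code, which this sketch does not attempt to model.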

How to reproduce

Train a model with --valid-metrics chrf ce-mean-words bleu-detok

Context

Log

[2023-01-16 15:58:54] [valid] First sentence's tokens as scored:
[2023-01-16 15:58:54] [valid] Decoding validation set with SentencePieceVocab for scoring
[2023-01-16 15:58:54] [valid]   Hyp: M 5 3 x 5 0 0 ......
[2023-01-16 15:58:54] [valid]   Ref: M 5 5 1 x 5 6 0 .......
[2023-01-16 16:24:12] [valid] Ep. 1 : Up. 300 : chrf : 14.717 : new best
[2023-01-16 16:50:12] [valid] Ep. 1 : Up. 300 : chrf : 14.717 : new best
[2023-01-16 16:50:27] [valid] Ep. 1 : Up. 300 : ce-mean-words : 2.38221 : new best
[2023-01-16 17:16:28] [valid] Ep. 1 : Up. 300 : bleu-detok : 0 : new best
[2023-01-16 17:22:04] [valid] Ep. 1 : Up. 600 : chrf : 15.978 : new best

*chrf appears twice because of a bug in browsermt, which has since been fixed.
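To check the timing from the log, you can compute the time elapsed before each metric line. The following is a minimal Python sketch that uses a subset of the log lines pasted above. The helper name metric_gaps is hypothetical, and the sketch assumes every line starts with a "[YYYY-MM-DD HH:MM:SS]" timestamp and that metric lines use " : " as a field separator, as they do here.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# A subset of the validation log from this issue.
LOG = """\
[2023-01-16 15:58:54] [valid] Decoding validation set with SentencePieceVocab for scoring
[2023-01-16 16:24:12] [valid] Ep. 1 : Up. 300 : chrf : 14.717 : new best
[2023-01-16 16:50:12] [valid] Ep. 1 : Up. 300 : chrf : 14.717 : new best
[2023-01-16 16:50:27] [valid] Ep. 1 : Up. 300 : ce-mean-words : 2.38221 : new best
[2023-01-16 17:16:28] [valid] Ep. 1 : Up. 300 : bleu-detok : 0 : new best
"""


def metric_gaps(log: str) -> List[Tuple[str, int]]:
    """Return (metric name, seconds since the previous log line) for each metric line."""
    gaps: List[Tuple[str, int]] = []
    previous: Optional[datetime] = None
    for line in log.splitlines():
        # Characters 1..19 hold the timestamp inside the leading brackets.
        stamp = datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S")
        # Metric lines look like "... Ep. 1 : Up. 300 : <metric> : <value> : ...".
        fields = [f.strip() for f in line.split(" : ")]
        if previous is not None and len(fields) >= 4:
            gaps.append((fields[2], int((stamp - previous).total_seconds())))
        previous = stamp
    return gaps


if __name__ == "__main__":
    for name, seconds in metric_gaps(LOG):
        print(f"{name}: {seconds} s")
```

On this excerpt, the sketch reports 1518 s and 1560 s for the two chrf lines, 15 s for ce-mean-words and 1561 s for bleu-detok. This is consistent with a full decode before each translation-based metric.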

AmitMY added the bug label Jan 16, 2023