Generating vmap for en->many model #5

Open
santha96 opened this issue Apr 27, 2022 · 5 comments

@santha96

santha96 commented Apr 27, 2022

Hi,
vmap reduces inference time significantly. I was able to generate a vmap for a many-to-one model, and it works fine. How does vmap work for one-to-many models?

@guillaumekln
Contributor

Hi,

It will work similarly, but the list of candidates for a given source sentence will include tokens/words from multiple languages.
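To make this concrete, here is a minimal sketch of how candidates might be collected for one source sentence. It is not the CTranslate2 implementation: the in-memory dict layout, the empty-string key for always-allowed tokens, the `<2fr>`/`<2de>` language tags, and the function name `candidates_for` are all assumptions made for illustration.

```python
def candidates_for(source_tokens, vmap, max_ngram=3):
    """Return the union of target candidates for every source n-gram in the vmap."""
    # Assumption: the empty key holds tokens that are always allowed.
    candidates = set(vmap.get("", ()))
    for n in range(1, max_ngram + 1):
        for i in range(len(source_tokens) - n + 1):
            ngram = " ".join(source_tokens[i:i + n])
            candidates.update(vmap.get(ngram, ()))
    return candidates


# Toy one-to-many vmap: English source, French and German targets mixed together.
toy_vmap = {
    "": ["</s>", "."],
    "<2fr>": ["le", "la"],
    "<2de>": ["der", "die"],
    "house": ["maison", "Haus"],
}

# A French request still picks up the German candidate for "house".
fr_candidates = candidates_for(["<2fr>", "house"], toy_vmap)
print(sorted(fr_candidates))
```

In this toy case the candidate set for a French request contains both "maison" and "Haus", because the entry for "house" does not depend on the language tag.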

@santha96
Author

Hi,
I generated the vmap using the command below.
python build-vmap.py -pt phrase-table -ms 3 -mf 2 -km 20 -tv target_vocabulary -zg zg_list > vmap

Enabling the vmap for one-to-many directions in CTranslate2 leads to a BLEU score drop of 2-3 points per language. When I looked inside the generated vmap, the source tokens that directly follow the supervision language tag capture more meaning in the corresponding language, thanks to the presence of the tag. Other source tokens that are far from the language tag either capture meaning from only a few languages or seem to have insufficient coverage because there are so many languages. Will increasing the keep meaning (-km) parameter help, or is there a better way to do it? Could you please suggest one?

@guillaumekln
Contributor

Indeed, the current approach may not work well for one-to-many data. I can't think of a parameter that can fully resolve your issue. One solution might be to have one vmap per target language. The inference code could then select the appropriate vmap based on the language token.
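A rough sketch of that idea in plain Python, outside of CTranslate2 (which has no such logic): keep one vmap per target language and pick one by looking at the language token. The assumption that the language token is the first source token, the `<2fr>`/`<2de>` tag format, the dict layout, and the `select_vmap` helper are all hypothetical.

```python
def select_vmap(source_tokens, vmaps_by_lang, default=None):
    """Pick the vmap keyed by the leading language token, or a default."""
    if not source_tokens:
        return default
    # Assumption: the language token is the first token of the source.
    return vmaps_by_lang.get(source_tokens[0], default)


# One small vmap per target language (toy data).
vmaps_by_lang = {
    "<2fr>": {"house": ["maison"]},
    "<2de>": {"house": ["Haus"]},
}

source = ["<2de>", "house"]
vmap = select_vmap(source, vmaps_by_lang)

# Collect candidates from the selected vmap only (unigrams, for brevity).
allowed = set()
for token in source:
    allowed.update(vmap.get(token, ()))
print(allowed)
```

With per-language vmaps, a German request only sees German candidates. The cost is building and loading one vmap per target language, and deciding on a fallback when the tag is unknown.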

@santha96
Author

santha96 commented May 9, 2022

Thanks, @guillaumekln. Do we have such support in CTranslate2?

@guillaumekln
Contributor

No, this logic is not implemented. It is only an idea.
