Add translation funcs to utils #88
Signed-off-by: Shashank Mittal <[email protected]>
Thank you for the pull request! The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :) If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. It'd be great to have you!
Maintainer checklist
print(f"Translating batch {i//batch_size + 1}: {batch_words}")
for lang_code in get_target_langcodes(source_language):
    tokenizer.src_lang = get_language_iso(source_language)
    encoded_words = tokenizer(batch_words, return_tensors="pt", padding=True)
Here, the whole batch of words is tokenized in one call and converted into padded tensors the model can consume. Because of this, the model translates a full batch at once instead of one word at a time, which makes translating words noticeably faster.
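The batching loop in the diff, together with `padding=True`, can be illustrated without loading a model. This is a minimal, dependency-free sketch under stated assumptions: `toy_encode`, `pad_batch` and `iter_batches` are hypothetical helpers, not functions from this PR or from the Hugging Face tokenizer API, and a real tokenizer produces subword ids rather than character codes.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical padding id; real tokenizers define their own pad_token_id.
PAD_ID = 0


def toy_encode(word: str) -> List[int]:
    """Hypothetical tokenizer: one id per character (its Unicode code point)."""
    return [ord(ch) for ch in word]


def pad_batch(words: List[str]) -> Dict[str, List[List[int]]]:
    """Encode a batch and right-pad every sequence to the longest one.

    Conceptually similar to padding=True: rows of equal length plus an
    attention mask marking real tokens (1) versus padding (0).
    """
    encoded = [toy_encode(w) for w in words]
    max_len = max((len(seq) for seq in encoded), default=0)
    input_ids = [seq + [PAD_ID] * (max_len - len(seq)) for seq in encoded]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in encoded]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def iter_batches(words: List[str], batch_size: int = 100) -> Iterator[Tuple[int, List[str]]]:
    """Yield (batch_number, batch_words); numbering starts at 1 like the PR's print."""
    for i in range(0, len(words), batch_size):
        yield i // batch_size + 1, words[i : i + batch_size]


if __name__ == "__main__":
    for batch_num, batch_words in iter_batches(["cat", "house", "a"], batch_size=2):
        enc = pad_batch(batch_words)
        print(f"Translating batch {batch_num}: {batch_words}", enc["attention_mask"])
```

Padding every sequence in a batch to the same length is what allows the batch to be stacked into a single tensor, so the model can process all words of a batch in one forward pass rather than making one call per word.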
@@ -10,18 +10,23 @@
get_language_from_iso,
Comparing this PR with #89, it appears to me all changes here are already reflected in the other PR.
I am thinking of simply closing this one and continuing on #89 directly. Is that alright? Mostly this is to avoid making changes here and then later having to redo or reconcile the same changes on the other PR.
CC @andrewtavis
Yeah! The other PR was checked out from this branch, so merging that one makes sense.
Great! Sounds good then - I'll do my review on that other PR 👍
Thanks both! 😊
Contributor checklist
Description
Batch processing for the translation of words, using a batch_size of 100; this way it is significantly faster to translate.
Translation functions added to utils: get_language_dir_path, translation_interrupt_handler, get_target_langcodes and translate_to_other_languages.
Related issue