[WIP] Upgrade onmt #6
base: main
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: irinaespejo.
Thanks a lot for working on rxn4chemistry, we strongly value contributions from our users. It appears that one of the committers in the PR did not sign the CLA for contributors. To do so, you can open an issue to sign the CLA clicking here. More details can be found here. You can then request a new CLA check by commenting "@cla-bot check".
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
Is it working or still WIP?
Functionally speaking, the integration is complete, but we are leaving it as a WIP until the results from both the backward and forward models are compatible with the version using the older ONMT.
Are the performance differences influenced by the new tokenization? Have you observed a 1% difference, a 0.1% difference, or something larger? Just to get an order-of-magnitude idea of how the modification affects performance.
For forward models there is no difference, while for backward models we observe a significant drop in round-trip accuracy (50%). The latter might be due to issues in running the inference appropriately, as the loss and the token-level metrics are the same. We will post updates in the PR as soon as we have them.
This PR is a WIP to upgrade the dependency on `onmt` from the forked version to the latest v.3.5.1, dropping the need for a fork. It also upgrades to Python 3.11.
Changes made:
Preprocessing
Running `rxn-onmt-preprocess` throws an error about `/bin/bash command onmt_preprocess not found`. This comes from this line. The `onmt_preprocess` functionality was dropped by OpenNMT in going from v.1.2.0 to v.2.2.0.
Solution: the changes are in `src/rxn/onmt_models/scripts/rxn_onmt_preprocess.py`. The command is upgraded to `onmt_build_vocab` here, with a helper wrapper function here. `onmt_build_vocab` is used all over the official docs.
Training
The idea is to still call `onmt_train -config /path/to/config.yaml` on the CLI via `run_command()`, but in a way that resembles the official usage here as closely as possible. It turns out only one change is needed: instead of passing `onmt_train -- <all arguments>`, we dump the arguments that `rxn-onmt-train` receives via the CLI to a `config.yaml` in the format OpenNMT v.3.5.1 expects.
Solution: the changes are in `src/rxn/onmt_models/scripts/rxn_onmt_train.py`. A wrapper function was added here because OpenNMT v.3.5.1 expects `src_vocab` and `tgt_vocab` in the config file. See PR #7 in `rxn-onmt-utils` for the changes to the class `OnmtTrainCommand`.
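To make the two changes above concrete, here is a minimal sketch of the config-based flow. It is not the code from this PR: the helper names (`to_yaml`, `build_config`, `build_vocab_command`, `train_command`) and the file names (`train.src`, `vocab.src`, ...) are hypothetical and chosen only for illustration. The YAML keys follow the OpenNMT-py quickstart layout. The tiny hand-rolled YAML emitter only keeps the sketch dependency-free; real code would more likely use a YAML library.

```python
import shlex
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def to_yaml(data: Dict[str, Any], indent: int = 0) -> str:
    """Serialize nested dicts of plain scalars to YAML.

    Dependency-free stand-in for a real YAML dumper: no lists, and
    strings are not quoted, so values with special characters are not handled.
    """
    pad = " " * indent
    lines: List[str] = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 2))
        elif isinstance(value, bool):
            lines.append(f"{pad}{key}: {'true' if value else 'false'}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


def build_config(data_dir: Path, model_dir: Path, train_args: Dict[str, Any]) -> Dict[str, Any]:
    """Collect CLI-style arguments into one config dict.

    File names below are hypothetical. src_vocab and tgt_vocab are set
    explicitly because the config file is expected to contain them.
    """
    config: Dict[str, Any] = {
        "save_data": (data_dir / "samples").as_posix(),
        "src_vocab": (data_dir / "vocab.src").as_posix(),
        "tgt_vocab": (data_dir / "vocab.tgt").as_posix(),
        "data": {
            "corpus_1": {
                "path_src": (data_dir / "train.src").as_posix(),
                "path_tgt": (data_dir / "train.tgt").as_posix(),
            },
            "valid": {
                "path_src": (data_dir / "valid.src").as_posix(),
                "path_tgt": (data_dir / "valid.tgt").as_posix(),
            },
        },
        "save_model": (model_dir / "model").as_posix(),
    }
    config.update(train_args)  # e.g. train_steps, batch_size
    return config


def build_vocab_command(config_path: Path, n_sample: int = -1) -> List[str]:
    """Command replacing the removed onmt_preprocess step (-1: use the full corpus)."""
    return ["onmt_build_vocab", "-config", config_path.as_posix(), "-n_sample", str(n_sample)]


def train_command(config_path: Path) -> List[str]:
    """Training command that reads every option from the config file."""
    return ["onmt_train", "-config", config_path.as_posix()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "config.yaml"
        config = build_config(Path(tmp) / "data", Path(tmp) / "models", {"train_steps": 1000})
        config_path.write_text(to_yaml(config) + "\n")
        # Only print the commands; the actual scripts would execute them.
        print(shlex.join(build_vocab_command(config_path)))
        print(shlex.join(train_command(config_path)))
```

The sketch only builds and prints the commands, so it runs without OpenNMT installed. In the actual scripts the equivalent commands would be executed through `run_command()`.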