Automatic continuation #363

Merged
merged 6 commits into from
Oct 25, 2024

Conversation

frostedoyster
Collaborator

@frostedoyster frostedoyster commented Oct 17, 2024

Implements "automatic" continuation as described in #362.
Questions:

  • is the naming good?
  • is the code good?
  • how and where should this be documented?

Contributor (creator of pull-request) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

📚 Documentation preview 📚: https://metatrain--363.org.readthedocs.build/en/363/

Contributor

@PicoCentauri PicoCentauri left a comment

Might be a good idea. If this moves further, remember to explain it well in the docs.

Aren't we writing the checkpoints of the current training run in the directory where the training is started? We could take that file as a reference. That would be a bit easier than searching for the latest folder...

@@ -98,6 +98,29 @@ def _prepare_train_model_args(args: argparse.Namespace) -> None:
    args.options = OmegaConf.merge(args.options, override_options)


def _process_continue_from(continue_from: str) -> Optional[str]:
    # covers the case where `continue_from` is `auto`
    if continue_from == "auto":
Contributor

We should inform the user that the training is being continued if the outputs directory is found, or maybe raise an error or warn if continue="auto" and no directory is found.

Collaborator Author

Ok! It's now there

@frostedoyster frostedoyster marked this pull request as ready for review October 24, 2024 12:31
        if Path("outputs/").exists():
            # take the latest day directory
            dir = sorted(Path("outputs/").iterdir())[-1]
            # take the latest second directory
Contributor

Suggested change
- # take the latest second directory
+ # take the latest time directory

Collaborator Author

@frostedoyster frostedoyster Oct 24, 2024

The first directory (the "day" directory) is technically also a "time" directory, so I'll make the comments clearer.
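
For readers following the thread, the lookup discussed above could be sketched roughly as below. This is not the merged implementation. It assumes run directories are laid out as `outputs/<date>/<time>/` with zero-padded names (so sorting them as strings is chronological) and that each run writes a checkpoint called `model.ckpt`; both the layout and the file name are assumptions, and `find_latest_checkpoint` is a hypothetical helper. Unlike the diff, which takes only the last entry, the sketch walks backwards until it finds a run that actually contains a checkpoint, and it logs or warns as requested in the review.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def find_latest_checkpoint(
    outputs_dir: Path, checkpoint_name: str = "model.ckpt"
) -> Optional[Path]:
    """Return the checkpoint of the most recent run, or None if there is none.

    Assumes an ``outputs/<date>/<time>/`` layout and a hypothetical
    checkpoint file name; neither is confirmed by the PR.
    """
    if not outputs_dir.is_dir():
        logger.warning("No %s directory found; starting a new run.", outputs_dir)
        return None

    # Zero-padded timestamps sort chronologically when sorted by name.
    date_dirs = sorted(p for p in outputs_dir.iterdir() if p.is_dir())
    for date_dir in reversed(date_dirs):
        time_dirs = sorted(p for p in date_dir.iterdir() if p.is_dir())
        for time_dir in reversed(time_dirs):
            candidate = time_dir / checkpoint_name
            if candidate.is_file():
                logger.info("Continuing training from %s", candidate)
                return candidate

    logger.warning("No checkpoint found in %s; starting a new run.", outputs_dir)
    return None
```

Calling `find_latest_checkpoint(Path("outputs"))` from the directory where training was started would return either a path to resume from or `None`, in which case the caller can fall back to a fresh run.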

@frostedoyster frostedoyster merged commit 2c0a748 into main Oct 25, 2024
12 checks passed
@frostedoyster frostedoyster deleted the auto-continue branch October 25, 2024 08:14

2 participants