Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account for empty validation and test sets #152

Merged
merged 7 commits into from
Mar 19, 2024
Merged

Account for empty validation and test sets #152

merged 7 commits into from
Mar 19, 2024

Conversation

frostedoyster
Copy link
Collaborator

@frostedoyster frostedoyster commented Mar 15, 2024

Closes #96


📚 Documentation preview 📚: https://metatensor-models--152.org.readthedocs.build/en/152/

Copy link
Contributor

@PicoCentauri PicoCentauri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks also for the tests

src/metatensor/models/cli/eval.py Outdated Show resolved Hide resolved
src/metatensor/models/cli/train.py Outdated Show resolved Hide resolved
@frostedoyster
Copy link
Collaborator Author

Unfortunately no because the user can also specify their own validation set instead of using fractions

@PicoCentauri
Copy link
Contributor

I don't see how a dataset can be empty if successfully read something from disk?

@frostedoyster
Copy link
Collaborator Author

frostedoyster commented Mar 15, 2024

Hmm ok, then that's an issue because, in my opinion, we should allow empty validation sets, as long as they aren't all empty. Perhaps this should be allowed in a different PR though

@PicoCentauri
Copy link
Contributor

How do you think would the syntax for something like this look like?

@frostedoyster
Copy link
Collaborator Author

I don't yet know if and how we should allow it. Perhaps we should discuss it with the whole crew. For now, I will go with your suggestions

@@ -305,7 +305,7 @@ def _train_model_hydra(options: DictConfig) -> None:
validation_size = validation_options
train_size -= validation_size

if validation_size < 0 or validation_size >= 1:
if validation_size <= 0 or validation_size >= 1:
raise ValueError("Validation set split must be between 0 and 1.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError("Validation set split must be between 0 and 1.")
raise ValueError("Validation set split must be larger than 0 and smaller than 1.")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then also for the training set...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@PicoCentauri PicoCentauri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MERGE!!!!!

@frostedoyster frostedoyster merged commit d20df46 into main Mar 19, 2024
11 checks passed
@frostedoyster frostedoyster deleted the empty-test branch March 19, 2024 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Account for the fact that the test set might be empty
2 participants