Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change atomic_types and gradients from Sets to Unique Lists #296

Merged
merged 3 commits into from
Jul 16, 2024

Conversation

PicoCentauri
Copy link
Contributor

@PicoCentauri PicoCentauri commented Jul 13, 2024

This pull request changes the types of the atomic_types and gradients properties in the DatasetInfo and TargetInfo classes from sets to (unique) lists.

These properties were originally designed as sets because they represent collections of unique items. However, @frostedoyster raised constant concerns about the difficulty from him working with sets in certain contexts, particularly with functions and classes that expect lists. This change aims to resolve those issues.

The atomic_types and gradients are still stored internally as sets to maintain uniqueness, but they are exposed as sorted lists for compatibility. This change has introduced additional methods and complexity to the classes which can be easily seen by number of added lines in this PR. For instance, the code had to be adjusted in various places, such as changing:

new_atomic_types = merged_info.atomic_types - self.dataset_info.atomic_types

to a much more nested style

new_atomic_types = sorted(
    set(merged_info.atomic_types) - set(self.dataset_info.atomic_types)
)

I must express that I am very unhappy with these changes. I think changing the types is very bad design choice. The properties atomic_types and gradients should clearly be sets due to their nature of representing unique items. Lists do not have these properties. This change, in my opinion, compromises the clarity and integrity of the code. However, I have made these changes to accommodate the concerns raised and to facilitate smoother development.

@frostedoyster, please review these changes to ensure that they fix your issues in #286. I expect that you (1) add a clear and not messy test in this PR to ensure that your log files are rendered in exactly the order you want them. (2) I encourage you to also to check and add a test for the continuation of your SOAP BPNN model when changing types. Scrolling through your test in test_continue.py, I saw no tests checking this functionality...

Contributor (creator of pull-request) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

📚 Documentation preview 📚: https://metatrain--296.org.readthedocs.build/en/296/

Copy link

Sorry to intervene in what looks like a discussion that has been ongoing for a while, but what is the issue in wrapping a list() before passing these to a function that expects a list? Also, what are the implications in terms of overhead, both of storing elements as sets and of wrapping sets into a sorted() by default?

@PicoCentauri
Copy link
Contributor Author

Good question. I would say it should work. At least this is what we do for each model. They can just store their atomic types and gradients as list converted from the sets.

@frostedoyster frostedoyster linked an issue Jul 16, 2024 that may be closed by this pull request
@frostedoyster
Copy link
Collaborator

Thanks @PicoCentauri. I tested by hand the fact that it works. However, I couldn't find a way to write a test for the order in the log, because we don't have a reliable dataset with stresses in our test suite. I will open an issue for that

@frostedoyster frostedoyster merged commit 576b2b0 into main Jul 16, 2024
14 checks passed
@frostedoyster frostedoyster deleted the dataclass-lists branch July 16, 2024 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple gradient targets are printed in inconsistent order
3 participants