Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow user-facing shortcuts for architecture parameters #387

Open
frostedoyster opened this issue Nov 12, 2024 · 2 comments
Open

Allow user-facing shortcuts for architecture parameters #387

frostedoyster opened this issue Nov 12, 2024 · 2 comments
Labels
Discussion Issues to be discussed by the contributors Infrastructure: CLI Related to the command line interface Priority: Low Non-urgent issues to handle last.

Comments

@frostedoyster
Copy link
Collaborator

As discussed in #385, it might be easier for users to have architecture option shortcuts. For example,

loss: "mae"

instead of

loss:
  type: "mae"
@frostedoyster frostedoyster added Priority: Low Non-urgent issues to handle last. Discussion Issues to be discussed by the contributors Infrastructure: CLI Related to the command line interface labels Nov 12, 2024
@Luthaf
Copy link
Member

Luthaf commented Nov 13, 2024

Copying the discussion from the PR (#385 (comment))

@PicoCentauri
I think we should also just allow

loss: mse

and exand this to

loss:
    type: mse
    weights: {}
    reduction: sum

This should help usability in is in line of what we do for the datasets.

@frostedoyster
I think this is confusing for the user, because then the hypers change dynamically. Since we give the user the full template of the hypers to rely on, I would keep the format exactly as in the default hypers template (at least for now)

@Luthaf
I agree with @PicoCentauri here, having a smaller version of the same input is nice, especially if people are not just copy-pasting our templates around. We are already doing this for datasets:

training_set: "dataset.xyz"

Actually means

training_set:
    systems:
        read_from: dataset.xyz
        reader: ase
        length_unit: null
#        ...

@frostedoyster
Copy-pasting the templates is the only way for users to modify the hypers... they can't make them up. This means that this feature, which is hard to implement, test and maintain, won't be used much

I'm saying that the overhead for this feature just isn't worth it, considering that the projected gain is from

loss:
  type: mae

to

loss: mae

@Luthaf
It's not a lot more, but it is more consistent with other parameters, and nicer to use when writing the input by hand.

I haven't checked the code complexity, but this also looks like a fairly easy thing to implement, so the implementation overhead should be very manageable.

@frostedoyster
For now, I think it opens too many possibilities and, from what I can see, it's not easy to test and maintain. Like in programming languages, having many ways of achieving the same thing is not always an advantage. I will open the issue though

@Luthaf
I don't see why the possibilities opened are too many, could you elaborate? Test & maintenance should also be trivial: if the type of loss is a string, you transform to loss: {"type": <value>} and then continue as is. Am I missing something here?

I could understand if we where trying to be very explicit everywhere, but In general, metatrain options.yaml already includes a lot of syntactic sugar, from default hypers and with the datatset handling; and I really don't see how the loss would be any different.

@frostedoyster
Copy link
Collaborator Author

If you allow shortcuts like these, it means that you have to transform the user-provided hypers to the final version at some point. Now, around the code, you have two versions (shortcut and final) and you never know which one you're supposed to use, which one you're supposed to get out when you debug, which one will be saved when you save the hypers for restarting a run, which one you're supposed to feed a test and so on.
I noticed this when working with the dataset section. I like the way we do it for the dataset, because there the user isn't necessarily going to copy-paste and is instead allowed to free-style a bit. For the architecture, however, only non-basic users will modify the hypers (and especially if we do something like #273). If they do, it will almost never be through free-styling, but almost always by copy-pasting the defaults anyway. Therefore, IMO this feature adds development cost without any user experience benefits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion Issues to be discussed by the contributors Infrastructure: CLI Related to the command line interface Priority: Low Non-urgent issues to handle last.
Projects
None yet
Development

No branches or pull requests

2 participants