Allow for specification of a train set size #212

jwa7 · 2024-05-28T15:30:09Z

For demo purposes, I am trying to train a soap-bpnn model on my laptop, using a small subset of qm7 (the full dataset has more than 7000 structures).

In options.yaml one can specify the fractional sizes of val_set and test_set, but not the size of the training set. As far as I understand, the training set is inferred as the remaining fraction. I can make training faster by setting, for instance, test_set: 0.999, but this of course makes post-training evaluation very slow, since almost the whole dataset then has to be evaluated.

If I want to train and test on a smaller subset, I currently have to manually construct a smaller .xyz file to pass as the input. This is of course trivial, but being able to specify a training set size would be more convenient. For instance, train_set could be made settable as well, with train_set + val_set + test_set < 1 allowed so that the remaining structures are simply left unused.
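To make the request concrete, here is a minimal sketch of the splitting behaviour I have in mind, where the three fractions may sum to less than 1 and any leftover structures go unused. The function name split_indices, its signature, and the numbers in the usage line are hypothetical illustrations, not the package's existing API.

```python
import random


def split_indices(n_structures, train_set, val_set, test_set, seed=0):
    """Hypothetical helper: split range(n_structures) into three disjoint index lists.

    The fractions may sum to less than 1; structures not assigned to any
    subset are simply left out.
    """
    fractions = (train_set, val_set, test_set)
    if any(f < 0 for f in fractions) or sum(fractions) > 1 + 1e-12:
        raise ValueError("fractions must be non-negative and sum to at most 1")

    # Shuffle once with a dedicated generator so the split is reproducible.
    indices = list(range(n_structures))
    random.Random(seed).shuffle(indices)

    # Truncate towards zero so the three subsets never exceed n_structures.
    n_train = int(train_set * n_structures)
    n_val = int(val_set * n_structures)
    n_test = int(test_set * n_structures)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


# Illustrative numbers only: use 8% of a 7000-structure dataset in total.
train, val, test = split_indices(7000, train_set=0.05, val_set=0.01, test_set=0.02)
print(len(train), len(val), len(test))
```

With something like this, both training and post-training evaluation stay fast, without having to write a smaller .xyz file by hand.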

A related question: suppose I want to generate a learning curve, with randomly shuffled training and validation data of different sizes (i.e. different runs with different random seeds), but a fixed test set. Can I do this with the current setup? Is it possible to point to a separate held-out .xyz file as the test set?
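For reference, the learning-curve splits I am describing could be sketched as below: the test indices are drawn once with a fixed seed, and only the remaining pool is reshuffled per run. The function learning_curve_splits and all of its parameters are hypothetical and only illustrate the desired behaviour; they are not part of the current setup.

```python
import random


def learning_curve_splits(n_structures, n_test, train_sizes, n_val, seeds, test_seed=0):
    """Hypothetical helper: yield (seed, n_train, train, val, test) index splits.

    The test set is identical for every run; train and validation indices are
    redrawn from the remaining pool for each (n_train, seed) pair.
    """
    indices = list(range(n_structures))
    random.Random(test_seed).shuffle(indices)
    test = sorted(indices[:n_test])  # fixed hold-out set
    pool = sorted(indices[n_test:])  # everything else is available for train/val

    for n_train in train_sizes:
        if n_train + n_val > len(pool):
            raise ValueError("not enough structures left for train + val")
        for seed in seeds:
            shuffled = pool[:]
            random.Random(seed).shuffle(shuffled)
            train = shuffled[:n_train]
            val = shuffled[n_train:n_train + n_val]
            yield seed, n_train, train, val, test


# Illustrative sizes only: three training-set sizes, three seeds each.
for seed, n_train, train, val, test in learning_curve_splits(
    7000, n_test=1000, train_sizes=[100, 200, 400], n_val=50, seeds=[1, 2, 3]
):
    print(seed, n_train, len(train), len(val), len(test))
```

Being able to pass a separate held-out file as the test set would achieve the same thing without any index bookkeeping.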

jwa7 added the SOAP BPNN label on May 28, 2024

frostedoyster added the infrastructure label and removed the SOAP BPNN label on May 28, 2024

PicoCentauri added the Discussion and Infrastructure: Data labels and removed the infrastructure label on Jun 3, 2024
