Allow for specification of a train set size #212
Labels: Discussion, Infrastructure: Data
For demo purposes, I am trying to train a `soap-bpnn` on a small subset of qm7 (> 7000 structures) on my laptop.

One can specify in `options.yaml` the proportionate size of `val_set` and `test_set`, but cannot do so for the training set. As far as I understand, the train set size is inferred as the remaining proportion. In my case, I can make training faster by setting `test_set: 0.999`
for instance, but this of course makes post-training evaluation very slow.

If I want to train and test on a smaller subset, I would have to manually construct a smaller `.xyz` to pass as the input file. This is of course trivial, but having a way to specify a training size would be more convenient. For instance, allow setting `train_set` too, and allow `train_set` + `val_set` + `test_set` < 1.

Suppose I want to generate a learning curve, with randomly shuffled training and validation data of different sizes (i.e. different runs with different random seeds), but a fixed test set. Can I do this with the current setup? Is it possible to point to a different hold-out
`.xyz` file as the test set?
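
To make the request concrete, here is a minimal sketch of the splitting behaviour I have in mind. It is not how the package currently works: the function `split_indices`, its parameters, and the `test_seed` argument are all hypothetical names made up for illustration. It only produces structure indices, so one would still have to select those structures and write them out (or have the reader do it). The fractions may sum to less than 1, and the test set depends only on `test_seed`, so it stays the same across runs.

```python
import random


def split_indices(n_structures, train, val, test, seed, test_seed=0):
    """Return (train_idx, val_idx, test_idx) as lists of structure indices.

    Hypothetical helper, not part of any existing package.
    Fractions may sum to less than 1; leftover structures are simply unused.
    The test set is drawn with its own seed so it is fixed across runs.
    """
    if min(train, val, test) < 0 or train + val + test > 1 + 1e-12:
        raise ValueError("fractions must be non-negative and sum to at most 1")

    # Draw the hold-out test set first, using only test_seed.
    indices = list(range(n_structures))
    random.Random(test_seed).shuffle(indices)
    n_test = int(test * n_structures)
    test_idx, remaining = indices[:n_test], indices[n_test:]

    # Reshuffle what is left with the per-run seed, then cut train and val.
    random.Random(seed).shuffle(remaining)
    n_train = int(train * n_structures)
    n_val = int(val * n_structures)
    train_idx = remaining[:n_train]
    val_idx = remaining[n_train:n_train + n_val]
    return train_idx, val_idx, test_idx


if __name__ == "__main__":
    # Learning-curve style loop: growing train set, several seeds, fixed test set.
    # 7000 is a placeholder for the number of structures in the input file.
    for train_fraction in (0.01, 0.02, 0.05):
        for seed in (1, 2, 3):
            tr, va, te = split_indices(7000, train_fraction, 0.01, 0.1, seed)
            print(train_fraction, seed, len(tr), len(va), len(te))
```

With something like this, a learning curve would be a loop over `train` and `seed` while `test` and `test_seed` are held constant. Pointing to a separate hold-out `.xyz` file would achieve the same fixed test set without needing the second seed.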