PET restart fails with options_restart.yaml #424
Labels: Bug, PET, Priority: Low
Hi,
I tried to restart my PET training with a generated options_restart.yaml file:
mtt train options_restart.yaml -c last_checkpoint_model.ckpt
However, it crashed with this error:
jsonschema.exceptions.ValidationError: {'USE_LORA_PEFT': False, 'LORA_RANK': 4, 'LORA_ALPHA': 0.5, 'INITIAL_LR': 5e-05, 'EPOCH_NUM_ATOMIC': 1000000000, 'EPOCHS_WARMUP_ATOMIC': 100000000, 'SCHEDULER_STEP_SIZE_ATOMIC': 500000000, 'GLOBAL_AUG': True, 'SLIDING_FACTOR': 0.7, 'ATOMIC_BATCH_SIZE': 50, 'BALANCED_DATA_LOADER': True, 'MAX_TIME': 90800, 'ENERGY_WEIGHT': 0.1, 'MULTI_GPU': True, 'RANDOM_SEED': 0, 'CUDA_DETERMINISTIC': False, 'MODEL_TO_START_WITH': None, 'ALL_SPECIES_PATH': None, 'SELF_CONTRIBUTIONS_PATH': None, 'SUPPORT_MISSING_VALUES': False, 'USE_WEIGHT_DECAY': False, 'WEIGHT_DECAY': 0.0, 'DO_GRADIENT_CLIPPING': False, 'GRADIENT_CLIPPING_MAX_NORM': None, 'USE_SHIFT_AGNOSTIC_LOSS': False, 'ENERGIES_LOSS': 'per_atom', 'CHECKPOINT_INTERVAL': 5, 'EPOCH_NUM': 1000000} should not be valid under {'required': ['EPOCH_NUM', 'EPOCH_NUM_ATOMIC']}
With the original options.yaml file it works:
mtt train options.yaml -c last_checkpoint_model.ckpt
@DavideTisi experienced similar problems.
Either a check for surplus parameters should be added, or the error message should be improved so it is clear that only one of the parameters 'EPOCH_NUM' or 'EPOCH_NUM_ATOMIC' may be set in the input file.
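A minimal sketch of what such a check with a clearer message could look like, in plain Python. The helper name `check_exclusive` and the `EXCLUSIVE_PAIRS` list are hypothetical and not part of metatrain; the sketch only assumes that the training options are available as a dictionary.

```python
# Hypothetical helper, not part of metatrain: report mutually exclusive
# options with a readable message instead of a raw schema dump.
EXCLUSIVE_PAIRS = [("EPOCH_NUM", "EPOCH_NUM_ATOMIC")]


def check_exclusive(options, pairs=EXCLUSIVE_PAIRS):
    """Raise ValueError if both members of an exclusive pair are present."""
    for first, second in pairs:
        if first in options and second in options:
            raise ValueError(
                f"Only one of '{first}' or '{second}' may be set, "
                f"but both were found ({first}={options[first]!r}, "
                f"{second}={options[second]!r}). Remove one of them."
            )


# Values taken from the error above; both keys are present, so this raises.
training = {"EPOCH_NUM": 1000000, "EPOCH_NUM_ATOMIC": 1000000000}
try:
    check_exclusive(training)
except ValueError as err:
    message = str(err)
```

Running such a check before schema validation would let the user see which two keys conflict without reading the full options dump.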
Ideally, metatrain would not write the redundant parameters to the options_restart.yaml file, so that it is clear they were not used during training.
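One way this could work, sketched in plain Python under assumptions: the function `prune_exclusive` is hypothetical, and it assumes the writer of the restart file knows both the merged options (user input plus defaults) and the set of keys the user set explicitly. How metatrain actually tracks this is not shown in this issue.

```python
# Hypothetical sketch, not metatrain code: drop the filled-in default of a
# mutually exclusive pair before the restart options are written out.
def prune_exclusive(merged, user_set,
                    pairs=(("EPOCH_NUM", "EPOCH_NUM_ATOMIC"),)):
    """Return a copy of `merged` without the unused member of each pair."""
    pruned = dict(merged)
    for first, second in pairs:
        if first in pruned and second in pruned:
            # Keep the key the user set explicitly; drop its counterpart.
            if first in user_set and second not in user_set:
                del pruned[second]
            elif second in user_set and first not in user_set:
                del pruned[first]
            # If both or neither were set by the user, leave them untouched.
    return pruned


merged = {"EPOCH_NUM": 1000000, "EPOCH_NUM_ATOMIC": 1000000000,
          "INITIAL_LR": 5e-05}
restart = prune_exclusive(merged, user_set={"EPOCH_NUM"})
```

In this example `restart` keeps `EPOCH_NUM` and `INITIAL_LR` and no longer contains `EPOCH_NUM_ATOMIC`, so it would pass a schema that forbids both keys together.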