PET restart fails with options_restart.yaml #424

Open
HannaTuerk opened this issue Dec 12, 2024 · 0 comments
Labels
Bug Something isn't working PET PET experimental architecture Priority: Low Non-urgent issues to handle last.

Comments

@HannaTuerk

Hi,

I tried to restart my PET training with the generated options_restart.yaml file by running mtt train options_restart.yaml -c last_checkpoint_model.ckpt, but it crashed with this error:
jsonschema.exceptions.ValidationError: {'USE_LORA_PEFT': False, 'LORA_RANK': 4, 'LORA_ALPHA': 0.5, 'INITIAL_LR': 5e-05, 'EPOCH_NUM_ATOMIC': 1000000000, 'EPOCHS_WARMUP_ATOMIC': 100000000, 'SCHEDULER_STEP_SIZE_ATOMIC': 500000000, 'GLOBAL_AUG': True, 'SLIDING_FACTOR': 0.7, 'ATOMIC_BATCH_SIZE': 50, 'BALANCED_DATA_LOADER': True, 'MAX_TIME': 90800, 'ENERGY_WEIGHT': 0.1, 'MULTI_GPU': True, 'RANDOM_SEED': 0, 'CUDA_DETERMINISTIC': False, 'MODEL_TO_START_WITH': None, 'ALL_SPECIES_PATH': None, 'SELF_CONTRIBUTIONS_PATH': None, 'SUPPORT_MISSING_VALUES': False, 'USE_WEIGHT_DECAY': False, 'WEIGHT_DECAY': 0.0, 'DO_GRADIENT_CLIPPING': False, 'GRADIENT_CLIPPING_MAX_NORM': None, 'USE_SHIFT_AGNOSTIC_LOSS': False, 'ENERGIES_LOSS': 'per_atom', 'CHECKPOINT_INTERVAL': 5, 'EPOCH_NUM': 1000000} should not be valid under {'required': ['EPOCH_NUM', 'EPOCH_NUM_ATOMIC']}

With the original options.yaml file, the restart works: mtt train options.yaml -c last_checkpoint_model.ckpt. @DavideTisi has experienced similar problems.

Either a check for surplus parameters should be added, or the error message should be improved so that it is clear that only one of the parameters 'EPOCH_NUM' and 'EPOCH_NUM_ATOMIC' may be set in the input file.
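To make the suggestion concrete, here is a minimal sketch of such a check in plain Python. The function name check_exclusive_options and the EXCLUSIVE_GROUPS table are hypothetical and not part of metatrain; the sketch only assumes that the hyperparameters are available as a dictionary before schema validation runs.

```python
# Hypothetical helper, not part of metatrain: the function name and the
# EXCLUSIVE_GROUPS table are assumptions made for illustration.
EXCLUSIVE_GROUPS = [("EPOCH_NUM", "EPOCH_NUM_ATOMIC")]


def check_exclusive_options(hypers, groups=EXCLUSIVE_GROUPS):
    """Raise a readable error if more than one key of a group is set."""
    for group in groups:
        present = [key for key in group if key in hypers]
        if len(present) > 1:
            raise ValueError(
                "Only one of " + ", ".join(group) + " may be set in the "
                "input file, but found: " + ", ".join(present) + ". "
                "Remove all but one of them."
            )


# Both keys are present, as in the dictionary from the error above.
restart_hypers = {"EPOCH_NUM_ATOMIC": 1000000000, "EPOCH_NUM": 1000000}
try:
    check_exclusive_options(restart_hypers)
    message = None
except ValueError as err:
    message = str(err)
print(message)
```

Running a check like this before the jsonschema validation would name the conflicting keys directly instead of printing the whole hyperparameter dictionary.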

Ideally, metatrain would not write the redundant parameters to the options_restart.yaml file, so that it is clear they were not used during training.
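One possible way to do this, sketched under assumptions: if the code that writes the restart file knows which keys the user set explicitly, it could drop the unused alternative from each mutually exclusive group before writing. The function prune_redundant_options and its arguments are hypothetical, and treating EPOCH_NUM as the user-supplied key is only my guess for the example; I have not checked how metatrain merges defaults internally.

```python
def prune_redundant_options(merged, user_supplied, groups):
    """Return a copy of `merged` without alternatives the user did not set.

    Hypothetical helper, not part of metatrain. `merged` holds defaults
    plus user options, `user_supplied` holds only what the user wrote.
    """
    pruned = dict(merged)
    for group in groups:
        chosen = [key for key in group if key in user_supplied]
        if len(chosen) != 1:
            continue  # nothing or several chosen: leave the group untouched
        for key in group:
            if key != chosen[0]:
                pruned.pop(key, None)
    return pruned


groups = [("EPOCH_NUM", "EPOCH_NUM_ATOMIC")]
# Assumption for this example: the user set EPOCH_NUM, the rest are defaults.
merged = {"EPOCH_NUM_ATOMIC": 1000000000, "EPOCH_NUM": 1000000, "INITIAL_LR": 5e-05}
user_supplied = {"EPOCH_NUM": 1000000}

pruned = prune_redundant_options(merged, user_supplied, groups)
print(pruned)
```

The input dictionary is copied rather than modified, so the full set of merged values stays available to the rest of the training code.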

@HannaTuerk HannaTuerk added Bug Something isn't working PET PET experimental architecture Priority: Low Non-urgent issues to handle last. labels Dec 12, 2024