
Trying to finetune #195

Open
AlexCohenDambros opened this issue Mar 3, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@AlexCohenDambros

I am also trying to do fine-tuning and encountered the same issues as mentioned in #174. I followed the steps outlined, replicating the code presented in #174, with the only change being the location where the data will be saved:

hf_train_ds.save_to_disk("test_database/store_ds")
hf_val_ds.save_to_disk("test_database/store_ds_eval")

I followed the same approach and created two YAML configuration files for training and validation. They are located respectively at cli/conf/finetune/data/data_fine_tuning.yaml and cli/conf/finetune/val_data/val_fine_tuning.yaml.

The training YAML file is:

_target_: uni2ts.data.builder.simple.SimpleDatasetBuilder
dataset: store_ds
storage_path: test_database
weight: 1000

And the validation YAML file is:

_target_: uni2ts.data.builder.ConcatDatasetBuilder
_args_:
  _target_: uni2ts.data.builder.simple.generate_eval_builders
  dataset: store_ds_eval
  storage_path: test_database
  offset: 95
  eval_length: 8
  prediction_lengths: [8]
  context_lengths: [16]
  patch_sizes: [16]

Then, I ran the training command as specified, changing only the file name:

python -m cli.train -cp conf/finetune run_name=fine_tuning_morai model=moirai_1.0_R_small data=data_fine_tuning val_data=val_fine_tuning

However, it generates the following error:

AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
...
assert time >= b > a >= 0
AssertionError
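For reference, the failing assertion is EvalCrop's bound check on the crop window. The sketch below is a hypothetical reconstruction from the error message alone (the formulas for `a` and `b` are assumptions, not the actual uni2ts source); it illustrates how an `offset` too close to the end of a short series trips the assertion:

```python
def crop_is_valid(time: int, offset: int, context_length: int,
                  prediction_length: int) -> bool:
    """Hypothetical reconstruction of EvalCrop's bound check.

    `time` is the series length; the forecast window is assumed to start
    at `offset`, the crop spans [a, b), and `time >= b > a >= 0` requires
    the whole context + prediction window to fit inside the series.
    """
    fcst_start = offset
    a = fcst_start - context_length      # start of the context window
    b = fcst_start + prediction_length   # end of the prediction window
    return time >= b > a >= 0

# With the validation config above (offset=95, context_lengths=[16],
# prediction_lengths=[8]), any series shorter than 95 + 8 = 103 points
# cannot hold the window:
print(crop_is_valid(time=200, offset=95, context_length=16, prediction_length=8))  # True
print(crop_is_valid(time=100, offset=95, context_length=16, prediction_length=8))  # False
```

If your saved series are shorter than `offset + prediction_length`, reducing `offset` (or the window lengths) is the first thing to try.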

My questions:

  1. Is it possible to perform fine-tuning of the model without passing validation data?
  2. After performing the fine-tuning of the model, how do I load the model?

Thank you in advance for your help!

@AlexCohenDambros AlexCohenDambros added the bug Something isn't working label Mar 3, 2025
@zqiao11
Contributor

zqiao11 commented Mar 5, 2025

Hi. This error is caused by EvalCrop, indicating that the context/prediction window cannot be cropped under the current configuration as it would exceed the data bounds. You can check the values of fcst_start, a and b to identify the problem.

Additionally, validation data is optional; simply omit val_data when running python -m cli.train.

After fine-tuning, you can load the model with moirai_lightning_ckpt. Set checkpoint_path to your fine-tuned checkpoint in the output directory. You can refer to the script in PR #189.
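PyTorch Lightning normally writes checkpoints as `*.ckpt` files somewhere under the run's output directory. A small helper like the one below (an illustrative sketch, not part of uni2ts) can locate the newest one to pass as `checkpoint_path`:

```python
from pathlib import Path
from typing import Optional

def latest_checkpoint(run_dir: str) -> Optional[Path]:
    # Recursively search the run's output directory for Lightning
    # checkpoint files (*.ckpt) and return the most recently modified,
    # or None if training produced no checkpoint at all.
    ckpts = sorted(Path(run_dir).rglob("*.ckpt"),
                   key=lambda p: p.stat().st_mtime)
    return ckpts[-1] if ckpts else None
```

If this returns None, the trainer saved no checkpoint at all, which would also explain an outputs folder containing only `.hydra`, logs, and `train.log`.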

@AlexCohenDambros
Author

Hi @zqiao11, firstly, thank you for your response.

I made the adjustment without providing validation data, using the following command:

CUDA_VISIBLE_DEVICES=0 python -m cli.train -cp conf/finetune run_name=fine_tuning_morai model=moirai_1.0_R_small data=data_fine_tuning

The training was executed with max_epochs set to 5.

As a result, I obtained an outputs folder, but it does not contain any checkpoint for the fine-tuned model. The directory in question is:

outputs/finetune/moirai_1.0_R_small/data_fine_tuning/fine_tuning_morai

Inside, it contains only a .hydra folder, a logs folder, and the train.log file.

To load the model, I tried the following, but it didn't work:

from uni2ts.model.moirai import MoiraiForecast
model = MoiraiForecast.load_from_checkpoint(
    checkpoint_path="outputs/finetune/moirai_1.1_R_small/data_fine_tuning/fine_tuning_morai",
    num_samples=100,
    patch_size=16,
    context_length=398
)

Finally, I also tried testing the code in PR #189 by executing the following command:

CUDA_VISIBLE_DEVICES=0 python -m cli.train -cp conf/finetune \
    exp_name=example_lsf \
    run_name=example_run \
    model=moirai_1.0_R_small \
    model.patch_size=32 \
    model.context_length=1000 \
    model.prediction_length=96 \
    data=data_fine_tuning \
    data.patch_size=32 \
    data.context_length=1000 \
    data.prediction_length=96 \
    data.mode=S

I also removed the validation data and replaced the etth1 data with my own, but this resulted in the following error:

Error executing job with overrides: ['exp_name=example_lsf', 'run_name=example_run', 'model=moirai_1.0_R_small', 'model.patch_size=32', 'model.context_length=1000', 'model.prediction_length=96', 'data=data_fine_tuning', 'data.patch_size=32', 'data.context_length=1000', 'data.prediction_length=96', 'data.mode=S']
Error in call to target 'uni2ts.model.moirai.finetune.MoiraiFinetune':
TypeError("MoiraiFinetune.__init__() got an unexpected keyword argument 'patch_size'")
full_key: model

It is recommended to set the environment variable `HYDRA_FULL_ERROR=1` for a complete stack trace.
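The TypeError means the constructor of the `model=...` target class does not accept a `patch_size` keyword, so Hydra's instantiation fails on that override. A generic stdlib sketch (independent of uni2ts) for checking which overrides a target class will actually accept before Hydra instantiates it:

```python
import inspect

def accepted_kwargs(cls) -> set:
    # Return the parameter names cls.__init__ accepts (besides self);
    # any model.* override outside this set triggers a TypeError like
    # the one above.
    sig = inspect.signature(cls.__init__)
    return {name for name in sig.parameters if name != "self"}

# Illustrative stand-in class (not the real MoiraiFinetune):
class Demo:
    def __init__(self, lr: float, max_epochs: int = 5):
        self.lr, self.max_epochs = lr, max_epochs

print(sorted(accepted_kwargs(Demo)))  # ['lr', 'max_epochs']
```

Running this against the actual `_target_` class from the config would show whether `patch_size` belongs on the model config at all or only on the data config.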
