
ValueError: The filepath provided must end in .keras (Keras model format) #22

Open
cespos opened this issue Sep 16, 2024 · 1 comment


cespos commented Sep 16, 2024

Hi!

I have been trying to use AiZynthTrain to train AiZynthFinder with my own reactions and reaction templates.
I have mapped and cleaned the reaction and template files with my own protocols, and my goal is to retrain AiZynthFinder without running any additional cleaning or preparation steps.

I have used the expansion pipeline with the following config file:

```yaml
expansion_model_pipeline:
  python_kernel: aizynthtrain
  file_prefix: test
  nbatches: 200
  training_fraction: 0.9
  random_seed: 1689
  selected_ids_path: "lookup_templates.json"
```
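As a rough illustration of what `training_fraction` and `random_seed` usually control, here is a generic seeded shuffle-and-split. It is an assumption-based sketch, not AiZynthTrain's actual splitting code, and `seeded_split` is a hypothetical name: a fixed seed makes the split reproducible, and the fraction decides how many items go to training versus the held-out part.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def seeded_split(
    items: Sequence[T], training_fraction: float, random_seed: int
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` with a fixed seed, then split it into training and held-out parts."""
    if not 0.0 < training_fraction < 1.0:
        raise ValueError("training_fraction must be strictly between 0 and 1")
    shuffled = list(items)  # copy so the caller's sequence is left untouched
    random.Random(random_seed).shuffle(shuffled)  # same seed -> same order on every run
    n_train = int(len(shuffled) * training_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train, held_out = seeded_split(range(100), training_fraction=0.9, random_seed=1689)
print(len(train), len(held_out))  # 90 10
```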

And I got the following errors during training:

```
2024-09-16 13:44:07.814 [1726483142464316/model_training/206 (pid 3123416)] Task is starting.
2024-09-16 13:44:08.591 [1726483142464316/model_training/206 (pid 3123416)] 2024-09-16 13:44:08.591729: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-16 13:44:08.604 [1726483142464316/model_training/206 (pid 3123416)] 2024-09-16 13:44:08.604123: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-16 13:44:08.607 [1726483142464316/model_training/206 (pid 3123416)] 2024-09-16 13:44:08.607848: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-16 13:44:13.767 [1726483142464316/model_training/206 (pid 3123416)] <flow ExpansionModelFlow step model_training> failed:
2024-09-16 13:44:13.873 [1726483142464316/model_training/206 (pid 3123416)] Internal error
2024-09-16 13:44:13.875 [1726483142464316/model_training/206 (pid 3123416)] Traceback (most recent call last):
2024-09-16 13:44:13.875 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 1134, in main
2024-09-16 13:44:13.875 [1726483142464316/model_training/206 (pid 3123416)] start(auto_envvar_prefix="METAFLOW", obj=state)
2024-09-16 13:44:13.875 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/tracing/__init__.py", line 27, in wrapper_func
2024-09-16 13:44:13.875 [1726483142464316/model_training/206 (pid 3123416)] return func(*args, **kwargs)
2024-09-16 13:44:14.668 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 829, in __call__
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] return self.main(*args, **kwargs)
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 782, in main
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] rv = self.invoke(ctx)
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1259, in invoke
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] return _process_result(sub_ctx.command.invoke(sub_ctx))
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 1066, in invoke
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] return ctx.invoke(self.callback, **ctx.params)
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/core.py", line 610, in invoke
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] return callback(*args, **kwargs)
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/_vendor/click/decorators.py", line 21, in new_func
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] return f(get_current_context(), *args, **kwargs)
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/cli.py", line 468, in step
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] task.run_step(
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 650, in run_step
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] self._exec_step_function(step_func)
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/metaflow/task.py", line 62, in _exec_step_function
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] step_function()
2024-09-16 13:44:14.669 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/aizynthtrain/pipelines/expansion_model_pipeline.py", line 83, in model_training
2024-09-16 13:44:14.670 [1726483142464316/model_training/206 (pid 3123416)] training_runner([self.config_path])
2024-09-16 13:44:14.670 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/aizynthtrain/modelling/expansion_policy/training.py", line 83, in main
2024-09-16 13:44:14.670 [1726483142464316/model_training/206 (pid 3123416)] callbacks = setup_callbacks(
2024-09-16 13:44:14.670 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/aizynthtrain/utils/keras_utils.py", line 76, in setup_callbacks
2024-09-16 13:44:14.670 [1726483142464316/model_training/206 (pid 3123416)] checkpoint = ModelCheckpoint(
2024-09-16 13:44:14.670 [1726483142464316/model_training/206 (pid 3123416)] File "/data/users/carespos/conda/envs/aizynthtrain/lib/python3.10/site-packages/keras/src/callbacks/model_checkpoint.py", line 191, in __init__
2024-09-16 13:44:14.670 [1726483142464316/model_training/206 (pid 3123416)] raise ValueError(
2024-09-16 13:44:14.670 [1726483142464316/model_training/206 (pid 3123416)] ValueError: The filepath provided must end in `.keras` (Keras model format). Received: filepath=test_keras_model.hdf5
2024-09-16 13:44:14.670 [1726483142464316/model_training/206 (pid 3123416)]
2024-09-16 13:44:14.674 [1726483142464316/model_training/206 (pid 3123416)] Task failed.
2024-09-16 13:44:14.679 Workflow failed.
2024-09-16 13:44:14.679 Terminating 0 active tasks...
2024-09-16 13:44:14.679 Flushing logs...
    Step failure:
    Step model_training (task-id 206) failed.
```

where the final error is:

```
2024-09-16 13:44:14.670 [1726483142464316/model_training/206 (pid 3123416)] ValueError: The filepath provided must end in `.keras` (Keras model format). Received: filepath=test_keras_model.hdf5
```
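The traceback shows `setup_callbacks` in `aizynthtrain/utils/keras_utils.py` passing an `.hdf5` filename to `ModelCheckpoint`, and the `keras/src/callbacks/model_checkpoint.py` path suggests Keras 3 is installed. In that release, `ModelCheckpoint` only accepts paths ending in `.keras` (full model) or `.weights.h5` (when `save_weights_only=True`), while Keras 2.x also accepted HDF5 names such as `.hdf5`. The sketch below mirrors that rule; `checkpoint_suffix` and `fix_checkpoint_path` are hypothetical helpers, not part of Keras or AiZynthTrain, and the version strings are example inputs.

```python
import os


def checkpoint_suffix(keras_version: str, save_weights_only: bool = False) -> str:
    """Return a checkpoint file suffix that ModelCheckpoint accepts for a given Keras version.

    Assumption: Keras 3 requires ".keras" for full models and ".weights.h5" for
    weights-only checkpoints; Keras 2.x also accepts HDF5 names such as ".hdf5".
    """
    major = int(keras_version.split(".")[0])
    if major >= 3:
        return ".weights.h5" if save_weights_only else ".keras"
    return ".hdf5"


def fix_checkpoint_path(filepath: str, keras_version: str, save_weights_only: bool = False) -> str:
    """Hypothetical helper: replace the extension of `filepath` with one the given Keras version accepts."""
    stem, _ext = os.path.splitext(filepath)
    return stem + checkpoint_suffix(keras_version, save_weights_only)


print(fix_checkpoint_path("test_keras_model.hdf5", "3.5.0"))  # test_keras_model.keras
print(fix_checkpoint_path("test_keras_model.hdf5", "2.8.0"))  # test_keras_model.hdf5
```

Renaming alone may not be enough if other parts of the pipeline expect an HDF5 file, which is one reason pinning Keras 2.x (as done later in this thread) can be the simpler workaround.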
  • Could you please let me know how to fix/debug this?

  • And is this the right pipeline for training AiZynthFinder without running any preparation steps?

Many thanks!

Carmen


cespos commented Sep 18, 2024

I fixed it by installing specific Keras and TensorFlow versions:

```bash
pip install keras==2.8.0
pip install tensorflow==2.8.0
pip install tensorboard==2.8.0
pip install tensorflow-serving-api==2.8.0
```

To prevent this issue in the future, these version pins could be added to the dependencies in pyproject.toml.
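Until such pins are in pyproject.toml, one way to confirm that an environment matches the versions above is to compare them with what `importlib.metadata` reports. This is a minimal sketch: `PINS` and `check_pins` are hypothetical names, and the pins are simply the ones from this comment, not versions confirmed by the AiZynthTrain maintainers.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Pins taken from the workaround in this comment (an assumption, not official requirements).
PINS: Dict[str, str] = {
    "keras": "2.8.0",
    "tensorflow": "2.8.0",
    "tensorboard": "2.8.0",
    "tensorflow-serving-api": "2.8.0",
}


def check_pins(
    pins: Dict[str, str], get_version: Callable[[str], str] = version
) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if absent} for every package not matching its pin."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in pins.items():
        try:
            installed: Optional[str] = get_version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    for package, installed in check_pins(PINS).items():
        print(f"{package}: expected {PINS[package]}, found {installed or 'not installed'}")
```

The `get_version` parameter defaults to the real `importlib.metadata.version` but can be swapped for a stub, which keeps the check easy to test without installing TensorFlow.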

However, I now get another error during validation:

```
FileNotFoundError: [Errno 2] No such file or directory: 'testing_template_library.csv'
```

Even though I did not configure the validation pipeline, it seems to run anyway.
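Because the path in the `FileNotFoundError` is relative, it was resolved against the working directory of the process that raised it. A small pre-flight check can list missing inputs before a long run starts. This is a minimal sketch: `missing_files` is a hypothetical helper, and apart from `testing_template_library.csv` (from the error) and `lookup_templates.json` (from the config above), which files the pipeline actually needs is an assumption you would adapt to your own setup.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(paths: Iterable[str], base_dir: str = ".") -> List[str]:
    """Return the entries of `paths` that do not exist as regular files under `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


# Hypothetical list: adjust to the files your configuration expects.
expected = ["lookup_templates.json", "testing_template_library.csv"]
missing = missing_files(expected)
if missing:
    print("Missing before running the pipeline:", ", ".join(missing))
```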

Best,
Carmen
