Custom TypeHandler and "No per-process OCDBT checkpoint subdirs" warning #1326

Open
PhilipVinc opened this issue Nov 13, 2024 · 4 comments
Labels
checkpoint type:bug Something isn't working

Comments

@PhilipVinc

Hello,

I've recently created a custom type handler.
When I use it while running on a single process, I see the following warning:

WARNING:absl:[process=0][thread=async_save_18] Skipping merge of OCDBT checkpoints: No per-process OCDBT checkpoint subdirs found in /tmp/ckp3/115.orbax-checkpoint-tmp-136/callbacks.orbax-checkpoint-tmp-139, 

The custom type handler I wrote serialises a custom type containing some numpy arrays. If I were to run this across multiple processes, I'd like only the master process to serialise the data, since it is essentially replicated.

How can I silence this warning? Did I forget to define something?

@cpgaffney1
Collaborator

The merge exists so that ArrayHandler can write data to per-process subdirectories, which are then merged to form a "global view" that is used for restoration. In your custom handler the master process is responsible for serializing everything, so you already have a global view.
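To make the merge idea concrete, here is a toy model in plain Python. It is not Orbax's implementation: it assumes per-process subdirectories named `ocdbt.process_<index>` (following the `ocdbt.process_X` pattern mentioned in this thread) and treats the "global view" as a simple mapping from file name to the subdirectory that wrote it. When no such subdirectory exists, it skips the merge and logs a warning, which mirrors the situation in the log line above.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("toy_merge")  # hypothetical logger name

# Assumed naming scheme, following the "ocdbt.process_X" pattern in this thread.
PROCESS_SUBDIR_PREFIX = "ocdbt.process_"


def find_process_subdirs(directory: Path) -> list[Path]:
    """Return the per-process subdirectories of `directory`, sorted by name."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_dir() and p.name.startswith(PROCESS_SUBDIR_PREFIX)
    )


def merge_process_views(directory: Path) -> dict[str, str]:
    """Toy "global view": maps each file name to the subdirectory that wrote it."""
    subdirs = find_process_subdirs(directory)
    if not subdirs:
        # Nothing was written per process, so there is nothing to merge.
        logger.warning("Skipping merge: no per-process subdirs found in %s", directory)
        return {}
    global_view: dict[str, str] = {}
    for subdir in subdirs:
        for f in sorted(subdir.iterdir()):
            global_view[f.name] = subdir.name
    return global_view


# Usage: two simulated processes, each writing one shard.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i in range(2):
        sub = root / f"{PROCESS_SUBDIR_PREFIX}{i}"
        sub.mkdir()
        (sub / f"shard_{i}").write_text("data")
    print(merge_process_views(root))
    # {'shard_0': 'ocdbt.process_0', 'shard_1': 'ocdbt.process_1'}
```

A handler that writes its data somewhere other than these subdirectories leaves the toy merge with nothing to do, which is exactly when the warning fires.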

You could silence the warning by using your own PyTreeCheckpointHandler that just skips the finalize implementation.
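If you would rather leave the checkpoint handler untouched, a different technique (not the one suggested above) is to filter this one message out of Python's `logging` module. The `WARNING:absl:` prefix in the reported log line suggests the record is emitted on a logger named `absl`; that is an assumption worth verifying in your setup. The filter class name below is hypothetical. It only hides the message and does not change what is written to disk.

```python
import logging


class DropOcdbtMergeWarning(logging.Filter):
    """Drop only records about skipping the OCDBT merge; pass everything else."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record.
        return "Skipping merge of OCDBT checkpoints" not in record.getMessage()


# Assumption: the warning is logged directly on the "absl" logger, as the
# "WARNING:absl:" prefix suggests. Logger-level filters do not see records
# propagated from child loggers, so the filter goes on the emitting logger.
logging.getLogger("absl").addFilter(DropOcdbtMergeWarning())
```

Because the filter matches on message text, it is tied to the current wording of the warning and would stop working if that wording changes.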

Alternatively, your custom TypeHandler could write data to ocdbt.process_X on the master process. The merge would then run on that single subdirectory, and since there is only one process, it would essentially be a no-op.
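The directory layout for that second option can be sketched as follows. This only illustrates "write on the master process into `ocdbt.process_X`"; the helper name, the `payload.bin` file, and the assumption that `X` is the integer process index are all hypothetical. In a real JAX program the index would typically come from something like `jax.process_index()`. Whether Orbax's merge accepts arbitrary files in that subdirectory, rather than data written through OCDBT, is not established here.

```python
from pathlib import Path
from typing import Optional


def write_replicated_payload(
    directory: Path, process_index: int, payload: bytes, primary: int = 0
) -> Optional[Path]:
    """Hypothetical helper: only the primary process writes replicated data.

    The data lands in ocdbt.process_<process_index>, so a merge step that looks
    for per-process subdirectories finds exactly one.
    """
    if process_index != primary:
        # The data is replicated, so non-primary processes write nothing.
        return None
    subdir = directory / f"ocdbt.process_{process_index}"
    subdir.mkdir(parents=True, exist_ok=True)
    out = subdir / "payload.bin"
    out.write_bytes(payload)
    return out
```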

@PhilipVinc
Author

Thank you @cpgaffney1.

I think I figured out what the problem was.

If I use a PyTreeSave whose contents are all handled by custom TypeHandlers (which do not create an ocdbt.process_X folder), this warning is logged.

This is because PyTreeSave assumes that at least one type handler that creates an OCDBT per-process directory is used for the collection, but this is not guaranteed.

selamw1 added the type:bug and checkpoint labels on Feb 11, 2025
@garymm
Contributor

garymm commented Feb 13, 2025

I'm seeing this warning without using a custom TypeHandler, just using PyTreeSave. Is that expected?

@orbax-dev
Collaborator

You must have use_ocdbt set to False; is that the case? Ideally we should not log a warning in that case, since there's nothing to be concerned about.
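A sketch of the kind of guard this suggests, combined with the diagnosis earlier in the thread: stay quiet when `use_ocdbt` is False, or when no handler was expected to create per-process subdirectories (for example, only custom TypeHandlers were used), and warn only when subdirectories were expected but are missing. The function, its parameters, and the logger name are hypothetical and not part of Orbax.

```python
import logging

logger = logging.getLogger("toy_finalize")  # hypothetical logger name


def should_merge_ocdbt(use_ocdbt: bool, num_process_subdirs: int, expects_subdirs: bool) -> bool:
    """Hypothetical guard: decide whether to merge, warning only when it matters."""
    if not use_ocdbt:
        # No OCDBT data was written, so there is nothing to merge or warn about.
        return False
    if num_process_subdirs == 0:
        if expects_subdirs:
            # Some handler should have written per-process data but none exists.
            logger.warning("Expected per-process OCDBT subdirs but found none")
        return False
    return True
```

Deciding `expects_subdirs` would require the checkpoint handler to know which type handlers actually write per-process OCDBT data; how that would be tracked is left open here.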
