Allow exporting models from remote locations #352
Conversation
Very cool, thank you @PicoCentauri! I actually like how this works! There are a few things, however, which are still not totally clear to me.
This is something I discussed as well with @Luthaf and @frostedoyster. Storing already exported models (…)
I am not sure, but I don't think so. It creates a temporary file. I know caching is everybody's darling, but it can be hard to implement: based on which hash do we create the cache entry, the URL, or something based on the model itself? I will look into this and will add caching in a future version.
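As a rough sketch, the URL-keyed variant discussed above could look something like this (not part of this PR; the cache location and `.ckpt` suffix are assumptions):

```python
import hashlib
import os
from urllib.request import urlretrieve

# Hypothetical cache location; metatrain does not currently define one.
CACHE_DIR = os.path.expanduser("~/.cache/metatrain")

def cached_download(url: str) -> str:
    """Download `url` once, keyed by a hash of the URL itself."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".ckpt")
    if not os.path.exists(path):
        urlretrieve(url, path)
    return path
```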
Yes, we have to recreate the extensions. The extensions also depend on the version of the architecture, so we may have to keep every version of the architecture around. See also my first comment on your first point.
Okay, I'm fine with saving checkpoints either with or without TorchScripted models, if we need this. However, saving the checkpoint to disk and then loading it back to activate the model with extensions seems a bit counterintuitive... Maybe we can make the … I also asked ChatGPT what it thinks, and I think there was a good idea of creating a …
Hmm, I mean, what you want is currently possible via

```python
import torch

from metatrain.cli import export_model
from metatrain.utils.io import load_model

model = torch.jit.load(
    export_model(load_model("experimental.pet", "https://XXX.com/mymodel.ckpt"))
)
```

I wouldn't wrap this whole functionality into one function called …, but we can provide something that provides the workflow for the Python API that we are planning to write. What do you think?
Regarding the PR in general, I would add caching and the actual example for the Python API once we have all the ingredients together.
So, if you have a …

In the current version of the code, the extensions will be loaded when loading the architecture (…). I'm not sure what happens with

```python
model = load_model("experimental.pet", "https://XXX.com/mymodel.ckpt")
calculator = MetatensorCalculator(model)
```

If not, maybe we should clarify a bit how this feature interacts with checkpoints/exported models.

I'm not sure I see why we would need to store checkpoints to be able to re-create the extensions? Importing the architecture should be enough to load all extensions (ignoring for now all questions of API stability & versioning).
Yes, I think this should indeed work!
For ASE, sure, but what if you want to run a full-fledged command-line experience? To have the correct extensions exported you need a checkpoint or the architecture name (of course ignoring for now all questions of API stability & versioning). While I am writing this I see that really a model is enough, plus the architecture name, to construct the extensions. We should maybe change the code to always write the extensions. Currently we are just writing the model under a new name:

metatrain/src/metatrain/cli/export.py, lines 76 to 86 in 161fd56
but probably we should do something like

```python
if not is_exported(model):
    model = model.export()

extensions_path = "extensions/"
model.save(path, collect_extensions=extensions_path)
logger.info(
    f"Model exported to '{path}' and extensions to '{extensions_path}'"
)
```

Does this make sense?
Yes, this looks a lot cleaner!
It is much cleaner, but unfortunately once a model is exported and reloaded, the … So I think we should try to expose our save function when we export, but I don't know if this is possible.
This PR needs metatensor/metatensor#761 to be merged, and a metatensor-torch release, before it can be continued and finished.
Sorry for the random comment, but it is important to keep in mind that …
Getting closer. Now at re-export I get an error that …
Can't you script them without saving and reloading?
You can, but that's not what we're doing for now (see …).
You should be able to do

```python
inner = ...
model = MetatensorAtomisticModel(inner, ...)
scripted = torch.jit.script(model)
```

And then use … You would lose the ability to save the model though, unless we refactor the code for this a bit (make it a freestanding function, and call …). A sketch of such a freestanding function follows.
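As a minimal sketch of the freestanding-save refactor suggested above (the function name is an assumption, and the real `MetatensorAtomisticModel.save` also collects extensions):

```python
import torch

def save_model(model: torch.nn.Module, path: str) -> None:
    # Script the model first if it is still an eager module, so the same
    # function works for both eager and already-scripted models.
    if not isinstance(model, torch.jit.ScriptModule):
        model = torch.jit.script(model)
    model.save(path)
```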
Do we have a test with a checkpoint export from a remote location? Are you working on it?
```diff
 .. code-block:: bash

-    mtt export model.ckpt -o model.pt
+    mtt export experimental.soap_bpnn model.ckpt -o model.pt
```
Completely unrelated, but I will open an issue to have this removed (it should be easy and I don't like it too much)
You don't like the syntax itself, or the line in the docs?
We can't remove the syntax, because we need the corresponding architecture name to load a checkpoint.
Yes, I would ideally like to go back to `mtt export model.ckpt -o model.pt`
Having the name on the command line should be avoidable by requiring an `architecture_name` field in the checkpoint (but then this rule must be enforced for all architectures and added to "how to add a new architecture")
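As a minimal sketch of that idea (everything here is hypothetical, including where the state dict lives inside the checkpoint):

```python
import torch

model = torch.nn.Linear(4, 1)  # stand-in for a real architecture's model

# Write the architecture name into the checkpoint itself ...
checkpoint = {
    "architecture_name": "experimental.soap_bpnn",
    "model_state_dict": model.state_dict(),
}
torch.save(checkpoint, "model.ckpt")

# ... so that `mtt export model.ckpt -o model.pt` could later recover it
# without the user passing it on the command line:
checkpoint = torch.load("model.ckpt")
architecture_name = checkpoint["architecture_name"]
```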
Yes, we could do this. Good idea!
```ini
extras = # architectures used in the package tests
    soap-bpnn
    pet
```
Is PET actually used in the package tests? I can't find it.
It is hidden a bit. I need the PET requirements to be installed to check for a wrong architecture name in this test.
I did not find another way to trigger the test.
metatrain/tests/utils/test_io.py
Lines 75 to 84 in e52050f
```python
def test_load_model_unknown_model():
    architecture_name = "experimental.pet"
    path = RESOURCES_PATH / "model-32-bit.ckpt"
    match = (
        f"path '{path}' is not a valid model file for the {architecture_name} "
        "architecture"
    )
    with pytest.raises(ValueError, match=match):
        load_model(path, architecture_name=architecture_name)
```
Ahh, fair. My concern was that PET takes a while to install (thanks to its compiled extension). I will open an issue to track this.
Yeah, I am also not happy; maybe one could also monkeypatch this...
Don't worry, we should be able to take it out soon-ish
Everything is already there in metatrain/tests/utils/test_io.py, lines 39 to 62 in e52050f.
Sorry for my ignorance, what is …?
No, it is fine. Yes, it is the same, because we are using urllib to do the heavy lifting of downloading files with a common API. If there is a supported URL format that is recognized by urlparse, we will use … A sketch of the kind of logic involved is below.
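As a minimal sketch of the URL handling described above (not the actual metatrain implementation; the function name is an assumption):

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve

def maybe_download(path: str) -> str:
    """Return a local file path, downloading to a temporary file first
    if `path` is a recognized URL (no caching, as discussed above)."""
    parsed = urlparse(path)
    if parsed.scheme in ("http", "https"):
        local_path, _ = urlretrieve(path)
        return local_path
    return path
```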
Amazing, and also thanks for the explanation! It's all ready IMO
Fixes #343
To allow this, I added a new function `load_model`, which loads a model either from disk or from a URL and returns it. The syntax from the CLI is not changed; from Python one has to do … (see the sketch below).

It also works for already exported models, even without the `architecture_name`, which makes models directly usable for MD, for example inside the `MetatensorCalculator` for ASE.

We can simplify the imports, but let me know if you are happy with the API.
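The elided snippet presumably matches the workflow shown earlier in the conversation; as a sketch (the URL is a placeholder):

```python
import torch

from metatrain.cli import export_model
from metatrain.utils.io import load_model

model = torch.jit.load(
    export_model(load_model("experimental.pet", "https://XXX.com/mymodel.ckpt"))
)
```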
Contributor (creator of pull-request) checklist
📚 Documentation preview 📚: https://metatrain--352.org.readthedocs.build/en/352/