
Allow exporting models from remote locations #352

Merged: 8 commits from download-model into main on Nov 8, 2024

Conversation

@PicoCentauri (Contributor)

@PicoCentauri commented Oct 3, 2024

Fixes #343

To allow this, I added a new function load_model which loads a model either from disk or from a URL and returns it. The CLI syntax is not changed, and from Python one has to do

from metatrain.utils.io import load_model

model = load_model(
    path="https://my.url.com/fancy_model.ckpt",
    architecture_name=""experimental.soap_bpnn",
)
model.export()

It also works for already exported models, even without the architecture_name:

model = load_model("https://my.url.com/fancy_model.pt")

which makes models directly usable for MD, for example inside the MetatensorCalculator for ASE.
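
For illustration, a minimal sketch of that ASE use case (the URL and the structure are placeholders, and the calculator import path is assumed to be metatensor.torch.atomistic.ase_calculator):

from ase.build import molecule
from metatensor.torch.atomistic.ase_calculator import MetatensorCalculator

from metatrain.utils.io import load_model

# load the already exported model straight from a (placeholder) URL ...
model = load_model("https://my.url.com/fancy_model.pt")

# ... and attach it to an ASE structure as a calculator
atoms = molecule("H2O")
atoms.calc = MetatensorCalculator(model)
print(atoms.get_potential_energy())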

We can simplify the imports but let me know if you are happy with the API.

Contributor (creator of pull-request) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

📚 Documentation preview 📚: https://metatrain--352.org.readthedocs.build/en/352/

@abmazitov (Contributor)

Very cool, thank you @PicoCentauri! I actually like how this works! There are a few things, however, which are still not totally clear to me.

  1. What are we planning to store remotely in the end? Is it a checkpoint (.ckpt file) or an exported model (.pt file)? I would personally vote for the latter, but in this case I'm not sure if the extensions can be exported properly.
  2. Does urllib.urlretrieve use caching?
  3. AFAIK, MetatensorCalculator uses both the model and the extensions directory as input arguments. Does it mean that we have to first export the model to store the extensions on disk, and then load both the model and the extensions back to the MetatensorCalculator? Maybe @Luthaf could say more on this?

@PicoCentauri (Contributor Author)

  1. What are we planning to store remotely in the end? Is it a checkpoint (.ckpt file) or an exported model (.pt file). I would personally vote for the latter, but in this case I'm not sure if the extensions can be exported properly.

This is something I discussed as well with @Luthaf and @frostedoyster. Storing already exported models (.pt) is nice and preferred for standalone models. But, if your architecture uses extensions, we have to rebuild these extensions for the platform on which you want to run the downloaded model. That is why we should maybe store the final checkpoints for these cases. For a smoother user experience, models should keep a version (see also #351) to avoid confusing errors when trying to create the extensions on export.

  1. Does urllib.urlretrieve use caching?

I am not sure, but I don't think so. It creates a temporary file. I know caching is everybody's darling, but it can be hard to implement: for example, based on which hash do we create the cache entry, the URL or something derived from the model itself? I will look into this and will add caching in a future version.
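
As a hedged illustration of that open question (not what this PR implements): a cache keyed only on the URL is simple, but it would not notice a changed model behind the same URL.

import hashlib

def cache_key(url: str) -> str:
    # hash only the URL: cheap and deterministic, but a re-uploaded model under
    # the same URL would silently reuse the stale cached file
    return hashlib.sha256(url.encode("utf-8")).hexdigest()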

  1. AFAIK, MetatensorCalculator uses both the model and the extensions directory as input arguments. Does it mean that we have to first export the model to store the extensions on disk, and then load both the model and the extensions back to the MetatensorCalculator? Maybe @Luthaf could say more on this?

Yes, we have to recreate the extensions. The extensions also depend on the version of the architecture, so we may have to keep every version of the architecture around. See also my answer to your first point.

@abmazitov (Contributor)

abmazitov commented Oct 4, 2024

Okay, I'm fine with saving checkpoints, either with or without TorchScripted models, if we need this. However, saving the checkpoint to disk and then loading it back to activate the model with extensions seems a bit counterintuitive... Maybe we can make the load_model function store the extensions automatically and return an instance of MetatensorAtomisticModel? I.e. load_model could fetch the checkpoint from the URL, export the model with its extensions to a specific place on disk (see also the comment below), and then return the loaded MetatensorAtomisticModel with the extensions already loaded.

I also asked ChatGPT what it thinks, and one good idea was to create a ~/.metatensor/ directory to store the cached checkpoints and models. In this case, for every version of a model we can create a folder with the checkpoint, the exported model itself, and its extensions, and access it later. If we come up with a proper naming convention so that every model and every version has its own unique directory, we can solve the caching problem as well (and avoid exporting every time load_model is called).
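
A minimal sketch of that layout, with a hypothetical helper (not metatrain API), assuming the model name and version are known:

from pathlib import Path

def cache_dir(model_name: str, version: str) -> Path:
    # one unique directory per model and version under ~/.metatensor/, e.g.
    # ~/.metatensor/soap_bpnn/1.0.0/ holding model.ckpt, model.pt and extensions/
    directory = Path.home() / ".metatensor" / model_name / version
    directory.mkdir(parents=True, exist_ok=True)
    return directory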

@PicoCentauri force-pushed the download-model branch 2 times, most recently from c94be49 to 64b5e8b on October 7, 2024 13:31
@PicoCentauri marked this pull request as ready for review on October 7, 2024 13:31
@PicoCentauri (Contributor Author)

Hmm, I mean what you want is currently possible via

import torch

from metatrain.cli import export_model
from metatrain.utils.io import load_model

model = torch.jit.load(export_model(load_model("experimental.pet", 'https://XXX.com/mymodel.ckpt')))

I wouldn't wrap this whole functionality into one function called load_model. Each part is used by different parts of metatrain.

But we can provide something that implements this workflow for the Python API that we are planning to write. What do you think?

@PicoCentauri (Contributor Author)

Regarding the PR in general, I would add caching and the actual example for the Python API once we have all the ingredients together.

@Luthaf (Member)

Luthaf commented Oct 7, 2024

AFAIK, MetatensorCalculator uses both the model and the extensions directory as input arguments. Does it mean that we have to first export the model to store the extensions on disk, and then load both the model and the extensions back to the MetatensorCalculator? Maybe @Luthaf could say more on this?

So, if you have a MetatensorAtomisticModel instance, you already have all extensions loaded, and you can create a calculator straight away. The extensions argument is only useful when trying to load the model from a path.

In the current version of the code, the extensions will be loaded when loading the architecture (architecture = import_architecture(architecture_name)).

I'm not sure what happens with architecture.__model__.load_checkpoint() though; is it required to return a MetatensorAtomisticModel? If so, this should work:

model = load_model("experimental.pet", 'https://XXX.com/mymodel.ckpt')
calculator = MetatensorCalculator(model)

If not, maybe we should clarify a bit how this feature interacts with checkpoint/exported models.

But, if your architecture uses extensions we have to rebuild these extensions for the platform you want to run the downloaded model. That is why we should maybe store the final checkpoints for these

I'm not sure I see why we would need to store checkpoints to be able to re-create the extensions? Importing the architecture should be enough to load all extensions (ignoring for now all questions of API stability & versioning).

@PicoCentauri (Contributor Author)

model = load_model("experimental.pet", 'https://XXX.com/mymodel.ckpt')
calculator = MetatensorCalculator(model)

Yes, I think this should indeed work!

I'm not sure I see why we would need to store checkpoints to be able to re-create the extensions? Importing the architecture should be enough to load all extensions (ignoring for now all questions of API stability & versioning).

For ASE, sure, but what if you want to run the full-fledged command line experience? To have the correct extensions exported you need a checkpoint or the architecture name (of course ignoring for now all questions of API stability & versioning).

While writing this I see that a model plus the architecture name is really enough to construct the extensions. Maybe we should change the code to always write the extensions. Currently we are just writing the model with a new name:

if is_exported(model):
    logger.info(f"The model is already exported. Saving it to `{path}`.")
    torch.jit.save(model, path)
else:
    extensions_path = "extensions/"
    logger.info(
        f"Exporting model to '{path}' and extensions to '{extensions_path}'"
    )
    mts_atomistic_model = model.export()
    mts_atomistic_model.save(path, collect_extensions=extensions_path)
    logger.info("Model exported successfully")

but probably we should do something like

if not is_exported(model):
    model = model.export()

extensions_path = "extensions/"
model.save(path, collect_extensions=extensions_path)
logger.info(f"Model exported to '{path}' and extensions to '{extensions_path}'")

Does this make sense?

@Luthaf (Member)

Luthaf commented Oct 8, 2024

but probably we should do something like [...]

Yes, this looks a lot cleaner!

@PicoCentauri (Contributor Author)

It is much cleaner, but unfortunately once a model is exported and reloaded the save method comes from torch and not from metatensor. You get an error like this when you try to re-export:

In [7]: model.save("foo.pt", collect_extensions="extensions/")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 model.save("foo.pt", collect_extensions="extensions/")

File ~/repos/lab-cosmo/metatrain/.venv/lib/python3.12/site-packages/torch/jit/_script.py:753, in RecursiveScriptModule.save(self, f, **kwargs)
    744 def save(self, f, **kwargs):
    745     r"""Save with a file-like object.
    746 
    747     save(f, _extra_files={})
   (...)
    751     DO NOT confuse these two functions when it comes to the 'f' parameter functionality.
    752     """
--> 753     return self._c.save(str(f), **kwargs)

TypeError: save(): incompatible function arguments. The following argument types are supported:
    1. (self: torch._C.ScriptModule, filename: str, _extra_files: dict[str, str] = {}) -> None

Invoked with: <torch.ScriptObject object at 0x12f855d20>, 'foo.pt'; kwargs: collect_extensions='extensions/'

So I think we should try to expose our save function when we export, but I don't know if this is possible.

@PicoCentauri (Contributor Author)

This PR needs metatensor/metatensor#761 to be merged, and a metatensor-torch release, before it can be continued and finished.

@frostedoyster (Collaborator)

Sorry for the random comment, but it is important to keep in mind that MetatensorAtomisticModels are not torchscripted unless they're saved to a file and re-loaded. This might be important for optimal speed

@PicoCentauri (Contributor Author)

Getting closer. Now on re-export I get an error that module.forward.__annotations__ is missing for an already exported model:

module = RecursiveScriptModule(
  original_name=SoapBpnn
  (soap_calculator): RecursiveScriptModule(original_name=SoapPowerSpec...ecursiveScriptModule(
    original_name=ModuleList
    (0): RecursiveScriptModule(original_name=CompositionModel)
  )
)

    def _check_annotation(module: torch.nn.Module):
        # check annotations on forward
>       annotations = module.forward.__annotations__

@PicoCentauri (Contributor Author)

Sorry for the random comment, but it is important to keep in mind that MetatensorAtomisticModels are not torchscripted unless they're saved to a file and re-loaded. This might be important for optimal speed

Can't you script them without saving and reloading?

@frostedoyster (Collaborator)

You can, but that's not what we're doing for now (see MetatensorAtomisticModel class in metatensor)

@Luthaf (Member)

Luthaf commented Nov 6, 2024

You should be able to do

inner = ...
model = MetatensorAtomisticModel(inner, ...)

scripted = torch.jit.script(model)

And then use scripted, without loading/unloading.

You would lose the ability to save the model though, unless we refactor the code for this a bit (make it a freestanding function, and call save_atomistic_model(self) in MetatensorAtomisticModel.save).
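
A rough sketch of that refactor; save_atomistic_model is hypothetical here and the extension collection is elided. The point is only that a freestanding function could also be called on an already scripted model:

import torch

def save_atomistic_model(model, path, collect_extensions=None):
    # the real implementation would collect the TorchScript extensions into
    # `collect_extensions` before writing the file
    if not isinstance(model, torch.jit.ScriptModule):
        model = torch.jit.script(model)
    torch.jit.save(model, path)

# usage on a scripted model, which otherwise only exposes torch's own save():
# save_atomistic_model(scripted, "exported.pt", collect_extensions="extensions/")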

@frostedoyster (Collaborator) left a comment

Do we have a test with a checkpoint export from a remote location? Are you working on it?


.. code-block:: bash

    mtt export model.ckpt -o model.pt
    mtt export experimental.soap_bpnn model.ckpt -o model.pt
Collaborator

Completely unrelated, but I will open an issue to have this removed (it should be easy and I don't like it too much)

Contributor Author

You don't like the syntax itself or the line in the docs?

We can't remove the syntax itself, because we need the corresponding architecture name to load a checkpoint.

Collaborator

Yes, I would ideally like to go back to mtt export model.ckpt -o model.pt

Collaborator

Having the name on the command line should be avoidable by requiring an architecture_name field in the checkpoint (but then this rule must be enforced for all architectures and added to "how to add a new architecture")
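
A hedged sketch of that rule, assuming checkpoints are plain torch.save dictionaries and using a hypothetical architecture_name entry:

import torch

def read_architecture_name(checkpoint_path):
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    if "architecture_name" not in checkpoint:
        raise ValueError(
            f"checkpoint '{checkpoint_path}' does not declare an architecture_name; "
            "pass the architecture explicitly on the command line instead"
        )
    return checkpoint["architecture_name"]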

Contributor Author

Yes we could do this. Good idea!

Comment on lines +62 to +64
extras = # architectures used in the package tests
    soap-bpnn
    pet
Collaborator

Is PET actually used in the package tests? I can't find it

Contributor Author

It is hidden a bit. I need the PET requirements to be installed to check for a wrong architecture name in this test.
I did not find another way to trigger the test.

def test_load_model_unknown_model():
    architecture_name = "experimental.pet"
    path = RESOURCES_PATH / "model-32-bit.ckpt"
    match = (
        f"path '{path}' is not a valid model file for the {architecture_name} "
        "architecture"
    )
    with pytest.raises(ValueError, match=match):
        load_model(path, architecture_name=architecture_name)

Collaborator

Ahh fair. My concern was that PET takes a while to install (thanks to its compiled extension). I will open an issue to track this

Contributor Author

Yeah, I am also not happy, maybe one can also monkeypatch this...

Collaborator

Don't worry, we should be able to take it out soon-ish

@PicoCentauri (Contributor Author)

PicoCentauri commented Nov 8, 2024

Do we have a test with a checkpoint export from a remote location? Are you working on it?

Everything is already there in test_load_model_checkpoint and test_load_model_exported. I updated the docstring of load_model to make it clearer that we can load both checkpoints and exported models.

@pytest.mark.parametrize(
    "path",
    [
        RESOURCES_PATH / "model-32-bit.ckpt",
        str(RESOURCES_PATH / "model-32-bit.ckpt"),
        f"file:{str(RESOURCES_PATH / 'model-32-bit.ckpt')}",
    ],
)
def test_load_model_checkpoint(path):
    model = load_model(path, architecture_name="experimental.soap_bpnn")
    assert type(model) is SoapBpnn


@pytest.mark.parametrize(
    "path",
    [
        RESOURCES_PATH / "model-32-bit.pt",
        str(RESOURCES_PATH / "model-32-bit.pt"),
        f"file:{str(RESOURCES_PATH / 'model-32-bit.pt')}",
    ],
)
def test_load_model_exported(path):
    model = load_model(path, architecture_name="experimental.soap_bpnn")
    assert type(model) is MetatensorAtomisticModel

@frostedoyster (Collaborator)

Sorry for my ignorance, what is f"file:{str(RESOURCES_PATH / 'model-32-bit.ckpt')}"?
I was looking for an https:// test, but perhaps it's the same

@PicoCentauri (Contributor Author)

PicoCentauri commented Nov 8, 2024

Sorry for my ignorance, what is f"file:{str(RESOURCES_PATH / 'model-32-bit.ckpt')}"?
I was looking for an https:// test, but perhaps it's the same

No, it is fine. Yes, it is the same, because we are using urllib to do the heavy lifting of downloading files with a common API. If the path is a supported URL format recognized by urlparse, we use urlretrieve, which "downloads" the file to a temporary folder and returns the path. urlparse recognizes prefixes like https:// and ftp://, but also file:. So using file: in the test triggers the "url" branch of the code, and it should also work with real remote locations like https://. If it doesn't, this is a problem of urllib.
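
For illustration, a minimal sketch of that dispatch (a simplification of what the comment describes, not the exact metatrain code):

from urllib.parse import urlparse
from urllib.request import urlretrieve

def resolve_path(path) -> str:
    # if urlparse finds a scheme (https, ftp, file, ...), fetch the file to a
    # temporary location and return that local path; otherwise use the path as-is
    if urlparse(str(path)).scheme:
        local_path, _headers = urlretrieve(str(path))
        return local_path
    return str(path)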

@frostedoyster (Collaborator) left a comment

Amazing, and also thanks for the explanation! It's all ready IMO

@PicoCentauri merged commit 5eefa32 into main on Nov 8, 2024
12 checks passed
@PicoCentauri deleted the download-model branch on November 8, 2024 14:08