
Download from HuggingFace with private token #390

Open · wants to merge 8 commits into main
Conversation

@frostedoyster (Collaborator) commented Nov 13, 2024

As in the title

Contributor (creator of pull-request) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

📚 Documentation preview 📚: https://metatrain--390.org.readthedocs.build/en/390/

@abmazitov (Contributor)

I managed to adapt the PET.load_checkpoint() method, so now it works with the HuggingFace checkpoint that we have. However, the mtt export API is still not well adapted. I keep getting the following error:

Traceback (most recent call last):
  File "/home/mazitov/apps/anaconda3/envs/metatrain/lib/python3.11/site-packages/metatrain/__main__.py", line 105, in main
    export_model(**args.__dict__)
TypeError: export_model() got an unexpected keyword argument 'huggingface_api_token'

which is related to the fact that in the load_model function we pop the 'huggingface_api_token' key from kwargs, which is a copy of args.__dict__, so the original dict still contains the unexpected key.
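The caller/callee dict mismatch can be reproduced in isolation. A minimal sketch: the function names below mirror the traceback, but the bodies and signatures are simplified assumptions, not metatrain's actual implementation:

```python
# Hypothetical stand-ins for the functions named in the traceback.
def export_model(path):
    # does not accept huggingface_api_token, hence the TypeError
    return path

def load_model(path, **kwargs):
    # **kwargs is always a fresh dict, so popping here
    # never removes the key from the caller's dict
    kwargs.pop("huggingface_api_token", None)
    return path

args = {"path": "model.ckpt", "huggingface_api_token": "secret"}

load_model(**args)
assert "huggingface_api_token" in args  # the original dict still has the key

raised = False
try:
    export_model(**args)  # reproduces: unexpected keyword argument
except TypeError:
    raised = True
assert raised

# one possible fix: drop the key from the caller's dict before forwarding it
args.pop("huggingface_api_token", None)
assert export_model(**args) == "model.ckpt"
```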


Even though loading the model explicitly worked for me, it only worked when loading from abmazitov/pet-mad and not from metatensor/pet-mad. Maybe I did something wrong...

@frostedoyster frostedoyster marked this pull request as ready for review November 14, 2024 10:31
Comment on lines +49 to +50
(Downloading private models is also supported, only through HuggingFace and using the
``--huggingface_api_token`` flag.)
Member

Suggested change
(Downloading private models is also supported, only through HuggingFace and using the
``--huggingface_api_token`` flag.)
Downloading private HuggingFace models is also supported, by specifying the corresponding API token with the ``--huggingface_api_token`` flag.


if urlparse(str(path)).scheme:
# Download from HuggingFace with a private token
if kwargs.get("huggingface_api_token"):
Member

Should we/can we add a test for this on CI? Maybe by creating a cosmo account on HuggingFace and uploading a dummy model + checkpoint there.
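As context for the snippet above, the urlparse(...).scheme check distinguishes URLs from local paths because only URLs carry a scheme. A minimal illustration (the example paths are illustrative):

```python
from urllib.parse import urlparse

# URLs have a non-empty scheme, plain filesystem paths do not,
# so `urlparse(str(path)).scheme` is truthy only for URLs
assert urlparse("https://huggingface.co/metatensor/pet-mad").scheme == "https"
assert urlparse("file:///tmp/model.ckpt").scheme == "file"
assert urlparse("model.ckpt").scheme == ""
assert urlparse("/home/user/model.ckpt").scheme == ""
```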

Comment on lines +137 to +138
# make sure to copy the checkpoint to the current directory
shutil.copy(path, Path.cwd() / str(path).split("/")[-1])
Member

I don't understand why this is required?

Contributor

And if you want the file to be downloaded directly to the current directory, you can do:

path, _ = urlretrieve(
    url=str(path),
    filename=str(Path.cwd() / str(path).split("/")[-1]),
)

See https://docs.python.org/3/library/urllib.request.html#urllib.request.urlretrieve.
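The suggestion above can be exercised without network access by pointing urlretrieve at a file:// URL. A self-contained sketch, using temp directories as stand-ins for the remote checkpoint and the current working directory:

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

# stand-in for a remote checkpoint: a local file exposed via a file:// URL
src = Path(tempfile.mkdtemp()) / "model.ckpt"
src.write_bytes(b"dummy checkpoint")
url = src.as_uri()  # e.g. file:///tmp/.../model.ckpt

# download straight to the target directory (a tempdir here,
# where the real code would use Path.cwd())
dest = Path(tempfile.mkdtemp()) / url.split("/")[-1]
path, _ = urlretrieve(url=url, filename=str(dest))

assert Path(path).read_bytes() == b"dummy checkpoint"
```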

@Luthaf (Member) commented Nov 14, 2024

I like this! Down the line, we could maybe add a way to read the API token from environment variables or a file, and use such a mechanism for all external integrations (W&B seems like a very logical one to add next).

@PicoCentauri (Contributor)

If you have a private model on Hugging Face, there are effective third-party tools like wget, curl, or the Hugging Face web interface to download it. Adding a private API interface introduces overhead and complexity to the codebase, and for a private Hugging Face repo, this doesn’t provide significant benefits, as there are accessible alternatives. The presented workaround also introduces an additional dependency, and I’m concerned it will require maintenance soon. We’re moving around URL schemas, and, to my knowledge, Hugging Face doesn’t have a stable schema, right?

As for W&B, it's a different story, and I fully support adding such functionality, because users can't access the benefits of W&B without us integrating it directly into the code.
