huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'mteb/index_stackexchange_### model a: bm25'. #33

Open
Muennighoff opened this issue Aug 5, 2024 · 3 comments

Comments

@Muennighoff
Contributor

Not sure what happened, but I saw this in the logs:

se.py", line 458, in result
2024-08-05 21:48:44 | ERROR | stderr |     return self.__get_result()
2024-08-05 21:48:44 | ERROR | stderr |   File "/env/lib/conda/gritkto/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2024-08-05 21:48:44 | ERROR | stderr |     raise self._exception
2024-08-05 21:48:44 | ERROR | stderr |   File "/env/lib/conda/gritkto/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2024-08-05 21:48:44 | ERROR | stderr |     result = self.fn(*self.args, **self.kwargs)
2024-08-05 21:48:44 | ERROR | stderr |   File "/data/niklas/arena/models.py", line 226, in retrieve
2024-08-05 21:48:44 | ERROR | stderr |     index = self.load_bm25_index(model_name, corpus)
2024-08-05 21:48:44 | ERROR | stderr |   File "/data/niklas/arena/models.py", line 164, in load_bm25_index
2024-08-05 21:48:44 | ERROR | stderr |     index.load_index()
2024-08-05 21:48:44 | ERROR | stderr |   File "/data/niklas/arena/retrieval/bm25_index.py", line 47, in load_index
2024-08-05 21:48:44 | ERROR | stderr |     self._create_index()
2024-08-05 21:48:44 | ERROR | stderr |   File "/data/niklas/arena/retrieval/bm25_index.py", line 35, in _create_index
2024-08-05 21:48:44 | ERROR | stderr |     retriever.save_to_hub(repo_id=f"mteb/{self.repo_name}", token=hf_token, corpus=passages)
2024-08-05 21:48:44 | ERROR | stderr |   File "/env/lib/conda/gritkto/lib/python3.10/site-packages/bm25s/hf.py", line 255, in save_to_hub
2024-08-05 21:48:44 | ERROR | stderr |     repo_url = api.create_repo(
2024-08-05 21:48:44 | ERROR | stderr |   File "/env/lib/conda/gritkto/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
2024-08-05 21:48:44 | ERROR | stderr |     validate_repo_id(arg_value)
2024-08-05 21:48:44 | ERROR | stderr |   File "/env/lib/conda/gritkto/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 160, in validate_repo_id
2024-08-05 21:48:44 | ERROR | stderr |     raise HFValidationError(
2024-08-05 21:48:44 | ERROR | stderr | huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'mteb/index_stackexchange_### model a: bm25'.
@isaac-chung
Contributor

Looking at where self.repo_name is defined:

self.repo_name = f"index_{corpus}_{model_name.lower()}"

Maybe model_name contains spaces, which are not allowed in a repo id?
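
To check that hypothesis locally, here is a rough sketch that applies the rules quoted in the error message to the name produced by the f-string above. The helper names (build_repo_name, looks_like_valid_repo_name) are made up for illustration, and the check is only an approximation of the message text, not the huggingface_hub implementation.

```python
import re

# Allowed characters and max length, paraphrased from the error message.
# This is a local approximation, not the huggingface_hub validator.
_ALLOWED = re.compile(r"^[A-Za-z0-9._-]{1,96}$")


def build_repo_name(corpus: str, model_name: str) -> str:
    # Same expression as the self.repo_name assignment quoted above.
    return f"index_{corpus}_{model_name.lower()}"


def looks_like_valid_repo_name(name: str) -> bool:
    """Return True if `name` passes the rules listed in the error message."""
    if not _ALLOWED.match(name):
        return False  # spaces, '#', ':' and similar characters end up here
    if "--" in name or ".." in name:
        return False
    if name[0] in "-." or name[-1] in "-.":
        return False
    return True


if __name__ == "__main__":
    for model_name in ("BM25", "### Model A: BM25"):
        name = build_repo_name("stackexchange", model_name)
        print(repr(name), looks_like_valid_repo_name(name))
```

Under these assumptions the spaces are one cause of the rejection, but the '#' and ':' characters would be rejected as well.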

@Muennighoff
Contributor Author

I think the problem is that the model name sometimes arrives as ### model a: bm25 rather than bm25, which triggers this error. I'm not sure exactly when that happens.
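
One possible workaround, sketched below, would be to strip that kind of header prefix before the name is used anywhere. The function name normalize_model_name and the prefix pattern are assumptions based only on the single string in the error message; the actual source of the prefix is still unknown.

```python
import re

# Hypothetical pattern: matches a leading header such as "### Model A: ".
# It is inferred from the one example in the logs and may not cover other cases.
_HEADER_PREFIX = re.compile(r"^\s*#+\s*model\s+\w+\s*:\s*", re.IGNORECASE)


def normalize_model_name(model_name: str) -> str:
    """Drop a leading '### Model X: ' style header, if present."""
    return _HEADER_PREFIX.sub("", model_name).strip()


if __name__ == "__main__":
    print(normalize_model_name("### Model A: BM25"))
    print(normalize_model_name("BM25"))
```

This only hides the symptom; finding where the prefixed name is passed in would still be the cleaner fix.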

@isaac-chung
Contributor

Maybe we can directly feed bm25 as the model_name here?

    def retrieve(self, query, corpus, model_name, topk=1):
        corpus_format = CORPUS_TO_FORMAT[corpus]

        if "BM25" in model_name:
            index = self.load_bm25_index(model_name, corpus)
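
A minimal sketch of that suggestion, assuming the BM25 branch does not need the incoming display name at all. The Retriever class, the BM25_NAME constant and the stubbed load_bm25_index are stand-ins for illustration; the real method builds or loads the index rather than returning a string.

```python
BM25_NAME = "bm25"  # hypothetical constant, not taken from the repo


class Retriever:
    """Stand-in for the class that owns retrieve(); only the BM25 branch is sketched."""

    def load_bm25_index(self, model_name, corpus):
        # Stub: returns the repo name the real method would derive,
        # using the same f-string as the self.repo_name assignment.
        return f"index_{corpus}_{model_name.lower()}"

    def retrieve(self, query, corpus, model_name, topk=1):
        if "BM25" in model_name:
            # Pass the fixed name instead of the incoming model_name,
            # so a prefixed string never reaches the repo id.
            return self.load_bm25_index(BM25_NAME, corpus)
        raise NotImplementedError("non-BM25 branch omitted in this sketch")


if __name__ == "__main__":
    print(Retriever().retrieve("some query", "stackexchange", "### Model A: BM25"))
```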
