Simple mistakes trigger unclear error messages in the ALBERT example #387

Open
1 of 2 tasks
borzunov opened this issue Sep 21, 2021 · 2 comments
Labels: good first issue, help wanted

Comments

@borzunov
Member

borzunov commented Sep 21, 2021

Simple mistakes trigger unclear error messages in the ALBERT example, namely:

  • Missing unpacked data for the trainer (currently triggers requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/data/tokenizer)
  • Running all peers in --client_mode (currently triggers AllReduce failed: could not find a group)

It would be great to show a clear error message in these cases.
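For the first case, a minimal sketch of an early check, assuming the trainer loads its tokenizer from a local directory such as data/tokenizer (the path visible in the 404 URL above). The helper name require_unpacked_data and the wording of the message are hypothetical and not part of the example's code:

```python
from pathlib import Path


def require_unpacked_data(tokenizer_path: str) -> Path:
    """Fail early with a readable message if the local tokenizer directory is missing.

    Hypothetical helper: the name and the message text are assumptions,
    not something taken from the ALBERT example itself.
    """
    path = Path(tokenizer_path)
    if not path.is_dir():
        raise FileNotFoundError(
            f"Tokenizer directory '{path}' was not found. "
            "Unpack the preprocessed data before starting the trainer."
        )
    return path
```

Calling something like this before the tokenizer is loaded would surface a FileNotFoundError that names the missing directory, rather than an HTTP 404 for a path that was never meant to be looked up remotely.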

borzunov added the good first issue and help wanted labels Sep 21, 2021
borzunov added a commit that referenced this issue Dec 25, 2021
Resolves #431, the 1st issue from #387, and many other minor issues (see the PR's comments).
@soodoshll
Contributor

Hi Alexander,
I found that running with only one regular trainer (plus the training monitor) also triggers the second problem. Perhaps averaging should be skipped when there is only one trainer?

@borzunov
Member Author

borzunov commented Dec 28, 2021

Hi @soodoshll,

Thanks for the report! In this case, training proceeds normally, so you can go on to connect more peers despite the error.

To be honest, I was not sure whether we should remove it, since hivemind is designed to be used with several trainers, and in real training runs a trainer finding itself alone is more likely a symptom of connection issues.

However, I now agree that this message is confusing for people running our example for the first time, so we should remove it.
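As a rough illustration of what a less confusing message could look like, here is a sketch that picks a hint depending on why no group was found. The function explain_missing_group and both of its parameters are hypothetical; this is not hivemind's API, and how a peer would learn how many other trainers it can see is left out:

```python
def explain_missing_group(client_mode: bool, other_peers_seen: int) -> str:
    """Return a readable hint for a failed group search (hypothetical helper)."""
    if other_peers_seen == 0:
        # A lone trainer: expected on a first run, suspicious in a real training run.
        return (
            "No other trainers were found, so averaging was skipped. "
            "This is expected with a single trainer; otherwise check connectivity."
        )
    if client_mode:
        # Assumption based on this issue: a run where every peer uses
        # --client_mode cannot form a group.
        return (
            "This peer runs in client mode and could not find a group. "
            "Make sure at least one peer runs without --client_mode."
        )
    return "Could not find a group for averaging; check that peers can reach each other."
```

The single-trainer branch keeps the connectivity warning for real runs while telling first-time users that nothing is broken.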
