Simple mistakes trigger unclear error messages in the ALBERT example, namely:
- Missing unpacked data for the trainer (currently triggers requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/data/tokenizer)
- Running all peers in --client_mode (currently triggers AllReduce failed: could not find a group)
It would be great to show a clear error message in these cases.
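For the first case, one possible shape for a clearer message is to check that the local directory exists before handing the path to the tokenizer loader. The 404 above suggests the missing path data/tokenizer was treated as a model name on the Hub. This is only a sketch: the helper name `require_local_dir` and the hint text are hypothetical and are not part of the example's actual code.

```python
from pathlib import Path


def require_local_dir(path: str, what: str, hint: str) -> Path:
    """Return `path` as a Path if it is an existing directory, else raise a readable error.

    Hypothetical helper for illustration; not part of hivemind or the ALBERT example.
    """
    resolved = Path(path)
    if not resolved.is_dir():
        # Fail early with an actionable message instead of letting a later
        # loader misinterpret the missing path.
        raise FileNotFoundError(f"{what} not found at '{resolved}'. {hint}")
    return resolved
```

A trainer script could call something like `require_local_dir("data/tokenizer", "Tokenizer files", "Unpack the training data before starting the trainer.")` right before loading the tokenizer, so the user sees the missing-data hint rather than an HTTP error.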
Hi Alexander,
I find that running with only one regular trainer (plus the training monitor) also triggers the second problem. I guess it should skip averaging when there is only one trainer?
Thanks for the report! In this case, training proceeds normally, so you can go ahead and connect more peers despite the error.
To be honest, I was not sure if we should remove it, since hivemind is designed to be used with several trainers, and a trainer finding itself to be alone is rather a symptom of connection issues in real training runs.
However, I now agree that this message is confusing for people running our example for the first time, so we should remove it.