Questions about the Training Process of the Latest b28c512nbt Model #1015
Comments
Yes, the b28c512nbt network did not even exist at the beginning; the architecture hadn't been invented yet. It was brought in later by being trained on data from other networks and then switched to. The graph you posted is the training history of the run. You can see clearly from it that the network size was progressively changed, which is why the graph starts with b6c96 and then changes to other networks. Is there something that makes you not trust the graph? It's the official graph, and if you already know about it, it answers your own question.
Note that one setting of the graph might indeed be confusing: "Upload time" is just the time the network was first created and uploaded to the website for use in distributed self-play by contributors. However, the earliest half of the networks were all uploaded at once, because they were trained offline prior to crowdsourcing the data generation for KataGo, and the upload time is just when the website itself recorded them being uploaded. See the "g170" run at https://katagoarchive.org/index.html - the current ongoing crowdsourced "kata1" run is a continuation of the "g170" run, which was the offline run whose networks were all uploaded at once at the start. Those have the accurate data dumps, and quite possibly still the accurate file modification times for the individual training data files if you unzip them. Other than that, yes, you can trust the dates of the files recorded on the training site; those were the dates when things were uploaded and when networks were switched to for self-play.
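For anyone skimming this thread, here is a minimal sketch of the idea described above - a freshly initialized, larger architecture is trained on the self-play data already generated by the smaller networks, and once it is strong enough, self-play switches over to it. This is not KataGo's actual training code; `LargeNet`, `selfplay_window`, `sample_batch`, and the gating step are illustrative assumptions.

```python
import torch.nn as nn
import torch.optim as optim

def bootstrap_larger_net(large_net: nn.Module, selfplay_window, steps=100_000, lr=1e-3):
    """Train a freshly initialized, larger net on data produced by earlier, smaller nets.

    `selfplay_window` stands in for a dataset over the run's recent self-play
    games (positions, policy targets, value targets); it is hypothetical here.
    """
    opt = optim.SGD(large_net.parameters(), lr=lr, momentum=0.9)
    large_net.train()
    for _ in range(steps):
        batch = selfplay_window.sample_batch(256)
        policy_logits, value = large_net(batch["inputs"])
        # Standard policy + value losses on targets produced by the older nets' self-play.
        loss = (nn.functional.cross_entropy(policy_logits, batch["policy_target"])
                + nn.functional.mse_loss(value, batch["value_target"]))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return large_net

# Once the new net matches or exceeds the current net's strength (e.g. via a
# gating/test match), self-play generation switches to it and the run continues
# with the larger architecture.
```

This is only meant to make the "trained on data of other networks and then switched to" step concrete; the actual schedules, data windows, and gating used in the g170/kata1 runs differ.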
Thank you, David. Regarding your point that "it was brought in later by being trained on data of other networks and then switched to," I have a few more questions that I would like to clarify:
Thank you very much for your assistance!
I would like to understand more about the training process of the KataGo models, specifically regarding the training method of the latest b28c512nbt model.
According to the KataGo paper, the training process appears to involve progressively increasing the network size, starting from smaller architectures like b6c96, then moving to b10c128, b15c192, and finally to b28c512nbt. This progressive approach allows the model to gradually learn more complex features while avoiding the computational cost and difficulty of training a very large network from the beginning.
However, I am uncertain if this is the actual training strategy used by KataGo. Therefore, I would like to confirm the following points:
Training Strategy: Was the b28c512nbt model of KataGo indeed trained by progressively increasing the network size from smaller networks? Or was it initialized and trained directly as the final large network?
Training Process: If KataGo adopted the progressive network size increase strategy, do the changes in network size, training duration, and data volume at each stage align with the information provided in the Approximate Elo Ratings Graph?
Thank you very much for your assistance!