Can ALBERT reduce GPU memory usage during training?
For example, suppose two networks each have 6 attention modules. The first network uses no parameter sharing, while the second shares parameters across all of its modules. During training, would there be a particularly noticeable difference in GPU memory usage between the two models?
In other words, is ALBERT's advantage only that it reduces the model size?
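To make the two setups in the question concrete, here is a minimal PyTorch sketch (the layer type and hyperparameters are illustrative stand-ins, not ALBERT's actual implementation): one stack of 6 independent encoder layers, and one where a single layer is reused 6 times in the style of ALBERT's cross-layer parameter sharing. Sharing cuts the parameter count roughly 6x, but note in the shared version that each of the 6 applications still produces its own activations, which must be kept for backpropagation.

```python
# Minimal sketch: 6 independent attention blocks vs. one block reused 6 times
# (ALBERT-style cross-layer parameter sharing). Names and sizes are illustrative.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 256, 4, 6

# Setup 1: no parameter sharing -- 6 separate sets of weights.
independent_blocks = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
     for _ in range(n_layers)]
)

# Setup 2: ALBERT-style sharing -- a single block applied 6 times.
shared_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

def forward_shared(x):
    # The same weights are reused at every "layer", but each application still
    # produces its own intermediate activations during training.
    for _ in range(n_layers):
        x = shared_block(x)
    return x

x = torch.randn(2, 16, d_model)          # (batch, seq_len, d_model)
out = forward_shared(x)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print("independent params:", n_params(independent_blocks))  # roughly 6x larger
print("shared params:     ", n_params(shared_block))
```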