Can ALBERT reduce GPU memory usage during training?
For example, suppose two networks each have 6 attention modules. The first network uses no parameter sharing, while the second shares parameters across all of its modules. During training, would there be a particularly noticeable difference in GPU memory usage between the two models?
In other words, is ALBERT's advantage only that it reduces the model size?
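To make the two setups in the question concrete, here is a minimal PyTorch sketch (the layer type and hyperparameters are illustrative stand-ins, not ALBERT's actual implementation): one stack of 6 independent encoder layers, and one where a single layer is reused 6 times in the style of ALBERT's cross-layer parameter sharing. Sharing cuts the parameter count roughly 6x, but note in the shared version that each of the 6 applications still produces its own activations, which must be kept for backpropagation.

```python
# Minimal sketch: 6 independent attention blocks vs. one block reused 6 times
# (ALBERT-style cross-layer parameter sharing). Names and sizes are illustrative.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 256, 4, 6

# Setup 1: no parameter sharing -- 6 separate sets of weights.
independent_blocks = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
     for _ in range(n_layers)]
)

# Setup 2: ALBERT-style sharing -- a single block applied 6 times.
shared_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

def forward_shared(x):
    # The same weights are reused at every "layer", but each application still
    # produces its own intermediate activations during training.
    for _ in range(n_layers):
        x = shared_block(x)
    return x

x = torch.randn(2, 16, d_model)          # (batch, seq_len, d_model)
out = forward_shared(x)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print("independent params:", n_params(independent_blocks))  # roughly 6x larger
print("shared params:     ", n_params(shared_block))
```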