Running the BERT model on three GPUs on one server keeps crashing #13

I suspect the batch_size may be too large, but reducing it doesn't help. Could it be load imbalance across the GPUs? Are there any suggestions or solutions?
Comments
How much GPU memory do you have? If the memory is overloaded, you can reduce max_seq_length and train_batch_size. In my experience, a single 8 GB GPU can handle max_seq_length=128 with batch_size=24, so three GPUs should be more than enough. I suggest watching GPU usage with nvidia-smi and adjusting accordingly.
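For intuition on why those two knobs dominate memory, here is a rough back-of-the-envelope sketch, not taken from this repo: the layer constants are the published BERT-base sizes (12 layers, hidden 768, 12 heads), and the 2x backward-pass factor is a crude assumption. It shows that the attention-score buffers grow quadratically in sequence length, so cutting max_seq_length saves far more than cutting batch_size alone.

    # Rough activation-memory estimate for BERT-base fine-tuning.
    # All constants below are assumptions for illustration only.
    def approx_activation_gb(batch_size, seq_len,
                             layers=12, hidden=768, heads=12, bytes_per_float=4):
        per_layer = (
            4 * batch_size * seq_len * hidden          # Q/K/V + output projections
            + batch_size * heads * seq_len * seq_len   # attention scores: quadratic in seq_len
            + 8 * batch_size * seq_len * hidden        # feed-forward (4x expansion, in + out)
        )
        total = 2 * layers * per_layer * bytes_per_float  # ~2x to keep activations for backward
        return total / 1024**3

    for bs, sl in [(24, 128), (24, 64), (8, 32)]:
        print(f"batch={bs:3d} seq={sl:4d} -> ~{approx_activation_gb(bs, sl):.1f} GB")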
I tried reducing max_seq_length to 32 and 64 and train_batch_size to 24, 16, and 8, but none of that helped.
I restricted training to a single GPU, and now it seems to be working. Thank you!
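For anyone hitting the same crash, a minimal sketch of pinning the process to one GPU via CUDA_VISIBLE_DEVICES, which works for any CUDA-based framework; PyTorch is used here only for the visibility check, as this issue does not say which framework the repo uses:

    import os
    # CUDA_VISIBLE_DEVICES must be set before the framework initializes CUDA,
    # so set it before importing torch / tensorflow.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU

    import torch
    print(torch.cuda.device_count())  # -> 1: only GPU 0 is visible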
May I ask how long the run above took?