RuntimeError: CUDA out of memory. #101
I ran into this problem too. Has it been solved?
The amount of data I am using is not that large.
Then I'm not sure; this may need the author to take a look.
It seems EVA has even higher GPU memory requirements.
Setting num_workers to 1 fixed it for me.
Hello, the GPU you are using may have somewhat limited memory. When a fairly long sequence comes along, the activations that have to be stored can be too many and cause an OOM. You could consider limiting the maximum sequence length during training, or switching to a GPU with more memory.
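As a rough illustration of the suggestion above (a sketch, not code from this repo; MAX_LEN and truncate_example are made-up names, and the tokenizer choice is an assumption): capping the tokenized length before batching bounds the activation memory, which grows with sequence length.

```python
# Hedged sketch: cap sequence length so activation memory stays bounded.
from transformers import BertTokenizer

MAX_LEN = 256  # assumed cap; tune to the longest length your GPU handles

tokenizer = BertTokenizer.from_pretrained("thu-coai/CDial-GPT_LCCC-large")

def truncate_example(text, max_len=MAX_LEN):
    # truncation=True drops tokens past max_len while keeping special tokens
    return tokenizer.encode(text, max_length=max_len, truncation=True)
```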
num_workers is a parameter of PyTorch's DataLoader that controls how many CPU processes are used to load data; its value does not affect the model's GPU memory usage.
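A minimal, self-contained example of that point (not the repo's code): num_workers only sets how many CPU-side worker processes prepare batches, so the GPU footprint is the same regardless of its value.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":  # guard needed when num_workers > 0 on some OSes
    dataset = TensorDataset(torch.arange(1000).unsqueeze(1))
    # num_workers controls CPU loader processes, not GPU memory
    loader = DataLoader(dataset, batch_size=4, num_workers=1)
    for (batch,) in loader:
        pass  # GPU memory use is identical for num_workers=0, 1, or 4
```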
Hi, how do I reduce the number of epochs? I changed --n_epochs to 1 in train.py, but why is the count still so large when I run it?
I think what you need to change is the batch size.
Out of memory even on a Tesla V100. Those of us without serious hardware probably shouldn't bother with it.
python train.py --pretrained --model_checkpoint thu-coai/CDial-GPT_LCCC-large --data_path data/STC.json --scheduler linear
Hi, my GPU memory should clearly be enough, so why does it still report this error? I have already changed batch_size to 1.
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.73 GiB total capacity; 904.23 MiB already allocated; 26.38 MiB free; 1020.00 MiB reserved in total by PyTorch)
Epoch: [63/4391266] 0%| , loss=0.0535, lr=5e-5 [00:09<174:20:29
It stops at 63 every time. What does the 4391266 mean? Can this number be reduced?
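A note not from the original thread: in the error above, "reserved" is memory held by PyTorch's caching allocator while "allocated" backs live tensors, and a tiny "free" figure next to a large reserved pool often points to fragmentation rather than a genuinely full card. The 4391266 in the progress line is plausibly the number of batches per epoch (dataset size divided by batch size), which would explain why lowering --n_epochs alone does not shrink it. A small diagnostic sketch (the helper name is made up for illustration):

```python
import torch

def report_cuda_memory(device=0):
    # "allocated": memory backing live tensors right now.
    # "reserved":  memory PyTorch's caching allocator holds from CUDA.
    alloc = torch.cuda.memory_allocated(device) / 2**20
    reserved = torch.cuda.memory_reserved(device) / 2**20
    print(f"allocated: {alloc:.1f} MiB | reserved: {reserved:.1f} MiB")

if torch.cuda.is_available():
    report_cuda_memory()  # compare before/after building the model and a batch
```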