What happened?

When stopping training in the GUI (Stop Training button) during SD1.5 fine-tuning, GPU memory does not get cleared, so the next run OOMs. Tested on e895ddc, where memory does clear successfully.

I'm running CUDA 12.4 on Ubuntu; please let me know what additional information would be helpful. If logs are needed, I'm happy to provide them, but I haven't seen anything out of the ordinary, just the OneTrainer process holding roughly the same VRAM as during training. The only thing that helps is completely restarting OneTrainer.

What did you expect would happen?

Successful gc after stopping the run, like in earlier versions. My guess is that torch_gc() fails on torch 2.5.x or with CUDA 12.4, but I'm not familiar enough with it to troubleshoot further.

What I tried without success:

Calling del on the model and optimizer, and calling torch_gc() after the model has been saved (see the sketch at the end of this post).

Output of pip freeze

1:1 with requirements-cuda.txt, as I'm running on a clean VM.

my config.json

config.json
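For reference, by "calling torch_gc()" I mean the usual PyTorch cleanup pattern. A minimal sketch of what I tried after saving the model; I'm assuming here that OneTrainer's torch_gc boils down to gc.collect() plus torch.cuda.empty_cache(), which I have not verified against the actual implementation:

```python
import gc
import torch

def torch_gc():
    # Assumption: OneTrainer's torch_gc does roughly this;
    # I have not checked the real implementation.
    gc.collect()
    if torch.cuda.is_available():
        # Return cached allocator blocks to the driver.
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()

# After the model was saved, I tried (without success):
#   del model
#   del optimizer
#   torch_gc()
```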
Edit your post to include your config.json please (just Ctrl+F and replace your username). This looks like a known bug that occurs when using offloading, but I'm not sure why you would be using that with SD 1.5, so I want to make sure.
Added my config.
Relevant variables from what I see:
As you noticed, I had no intention of offloading with SD1.5, but I can't rule out having turned it on accidentally; I don't even see a toggle for offloading in the GUI. 👀
I will try setting these two variables to false in the config and test again with the latest version once my current training run is done.
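In case it helps anyone checking their own setup, this is roughly how I looked for offload-related settings in my config. It assumes the GUI writes a plain JSON config.json (which is what I attached); the script is just my own helper, not part of OneTrainer:

```python
import json

# Load the config.json that the OneTrainer GUI saved for this run.
with open("config.json") as f:
    config = json.load(f)

# List every top-level key whose name mentions "offload", so I can see
# which offload-related settings ended up enabled before setting them to false.
for key, value in config.items():
    if "offload" in key.lower():
        print(f"{key} = {value!r}")
```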