Training Freezes Before Starting #2
Hi @yukiarimo. What you shared are pretty standard logs, so they do not really provide any context into what might be your issue. I have not tested this repo on MPS, only on NVIDIA GPUs and CPUs, so I would start there (i.e. remove the
I'm getting the same issue on an RTX 4090. It just stops at:
and nothing happens.
Hi @dillfrescott. This is a hard problem to diagnose with so little info. To begin narrowing down the possibilities, it would be worth trying to train another PyTorch Lightning model from a different repo. You should also check that your dependencies are aligned, as mismatched versions can sometimes cause weird issues. I wish I could offer more insight, but it's hard to tell without working with your setup.
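For anyone following the dependency suggestion above, a minimal sketch for collecting installed versions to paste into the issue. The helper name `installed_versions` is hypothetical, and the list of packages is an assumption about what is relevant here; adjust it to match the repo's requirements file.

```python
from importlib import metadata


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of relevant packages; edit as needed.
    for pkg, ver in installed_versions(["torch", "pytorch-lightning", "lightning"]).items():
        print(f"{pkg}: {ver if ver is not None else 'not installed'}")
```

Comparing this output against the versions pinned by the repo is a quick way to spot a mismatch.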
@dillfrescott @crlandsc It seems like the problem is related to multithreading. I solved the freezing issue by setting num_workers to 1.
@kimbring2 Good find! It may be a versioning thing then. Multithreading used to work when I trained the original models, but I have had issues with num_workers in other, more recent projects where I have used Lightning too.
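For reference, a rough sketch of the num_workers workaround discussed above. It is not taken from this repo's code: `make_loader_kwargs` and the `NUM_WORKERS` environment variable are hypothetical names used for illustration. The only real API assumed is the `num_workers` and `batch_size` arguments of `torch.utils.data.DataLoader`.

```python
import os


def make_loader_kwargs(batch_size, num_workers=None):
    """Build keyword arguments for torch.utils.data.DataLoader.

    Hypothetical helper. When num_workers is not passed, it is read from the
    (also hypothetical) NUM_WORKERS environment variable and falls back to 1,
    the value reported to avoid the freeze in this thread.
    """
    if num_workers is None:
        num_workers = int(os.environ.get("NUM_WORKERS", "1"))
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    return {"batch_size": batch_size, "num_workers": num_workers}


# Usage (requires PyTorch; `dataset` stands for whatever Dataset you train on):
#   from torch.utils.data import DataLoader
#   loader = DataLoader(dataset, **make_loader_kwargs(batch_size=8))
```

Passing `num_workers=0` loads data in the main process with no worker processes at all, which can help confirm whether the data loading workers are what is hanging.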