GPU memory allocated by make_context cannot be released when exception. #335
Comments
What result are you observing? What result are you expecting? The error handling in that code path looks RAII-safe, so it should do the right thing: Lines 854 to 863 in 9f3b898
It's not easy to explain. Scenario 2: I deliberately trigger multiple exceptions and observe that GPU memory usage has increased.
After 10 exceptions have been thrown, GPU memory usage is 390M.
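The measurement described above can be sketched as a small helper that repeats a failing operation and samples free memory after each attempt. This is only an illustration, not the reporter's script: `measure_leak`, `FakeGpu`, and the byte counts are hypothetical names and made-up numbers, and the stub stands in for a real GPU so the sketch runs anywhere.

```python
from typing import Callable, List, Tuple


def measure_leak(
    attempt: Callable[[], object],
    free_bytes: Callable[[], int],
    n: int,
) -> Tuple[int, List[int]]:
    """Run `attempt` n times, swallow its exceptions, and sample free memory.

    Returns (bytes lost between the first and last sample, all samples).
    """
    before = free_bytes()
    samples: List[int] = []
    for _ in range(n):
        try:
            attempt()
        except Exception:
            # The failure itself is expected; only the memory trend matters.
            pass
        samples.append(free_bytes())
    leaked = before - samples[-1] if samples else 0
    return leaked, samples


class FakeGpu:
    """Hypothetical stand-in for a device: each failed attempt keeps `cost` bytes."""

    def __init__(self, free: int, cost: int) -> None:
        self.free = free
        self.cost = cost

    def failing_attempt(self) -> None:
        self.free -= self.cost  # memory is taken ...
        raise RuntimeError("simulated make_context failure")  # ... and never returned

    def free_bytes(self) -> int:
        return self.free


gpu = FakeGpu(free=100, cost=5)  # arbitrary illustrative numbers
leaked, samples = measure_leak(gpu.failing_attempt, gpu.free_bytes, 10)
print(leaked, samples)
```

With PyCUDA, `attempt` could be something like `lambda: cuda.Device(0).make_context()` and `free_bytes` could be `lambda: cuda.mem_get_info()[0]`. Note that `mem_get_info` needs a current context, so an external tool may be the safer way to read memory usage once the context stack is in a bad state.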
There may be a problem with the make_context method. prepare_context_switch may switch the context even though the GPU memory allocation then fails. As a result, popping the previous context fails, which raises the cuCtxPopCurrent failed: invalid device context exception.
The cuCtxPopCurrent failed: invalid device context exception is caused by the unsuccessful pop of the previous context. So is that why the GPU memory is not released?
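The hypothesis in the two comments above can be illustrated with a toy model. This is not PyCUDA's code: `FakeDriver`, `FakeWrapper`, and the `restore_on_failure` flag are hypothetical, and the model assumes that the wrapper pops the current context before creating a new one and keeps its own stack alongside the driver's. Under those assumptions, a failed creation leaves the two stacks out of sync, the next pop fails, and the earlier context is never released.

```python
class FakeDriver:
    """Hypothetical stand-in for the driver's current-context stack."""

    def __init__(self):
        self.current = []  # driver-side stack of current contexts
        self.live = set()  # contexts that still hold memory
        self.fail_next_create = False

    def pop_current(self):
        if not self.current:
            raise RuntimeError("cuCtxPopCurrent failed: invalid device context")
        return self.current.pop()

    def push_current(self, name):
        self.current.append(name)

    def create(self, name):
        if self.fail_next_create:
            self.fail_next_create = False
            raise MemoryError("simulated context creation failure")
        self.live.add(name)
        self.current.append(name)

    def destroy(self, name):
        self.live.discard(name)


class FakeWrapper:
    """Hypothetical wrapper that keeps its own context stack."""

    def __init__(self, driver, restore_on_failure=False):
        self.driver = driver
        self.stack = []
        self.restore_on_failure = restore_on_failure

    def _prepare_switch(self):
        # Assumed analogue of prepare_context_switch: pop whatever is current.
        if self.stack:
            self.driver.pop_current()

    def make_context(self, name):
        self._prepare_switch()
        try:
            self.driver.create(name)
        except Exception:
            # Without this re-push, the driver has lost the previous context
            # while the wrapper's stack still lists it.
            if self.restore_on_failure and self.stack:
                self.driver.push_current(self.stack[-1])
            raise
        self.stack.append(name)

    def pop(self):
        self._prepare_switch()
        name = self.stack.pop()
        if self.stack:
            self.driver.push_current(self.stack[-1])
        self.driver.destroy(name)  # simplification: popping also releases memory


def run(restore_on_failure):
    driver = FakeDriver()
    wrapper = FakeWrapper(driver, restore_on_failure)
    wrapper.make_context("A")
    driver.fail_next_create = True
    try:
        wrapper.make_context("B")
    except MemoryError:
        pass
    try:
        wrapper.pop()
        pop_error = None
    except RuntimeError as exc:
        pop_error = str(exc)
    return pop_error, sorted(driver.live)


print(run(restore_on_failure=False))
print(run(restore_on_failure=True))
```

In the model, re-pushing the previous context when creation fails keeps both stacks consistent, so the later pop succeeds and the context is released. Whether the same repair applies to the real code path is unverified here.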
I suspect that
It'll be a while before I have time to look into this. PRs welcome in the meantime!
Describe the bug
I want to initialize as many CUDA contexts as possible in a multi-threaded environment, but when cuda.Device(0).make_context() throws an exception, the GPU memory it has already allocated is not released.
To Reproduce
Environment (please complete the following information):