Potential for a memory leak when using Primary Context #305
What you describe is a real problem, and given the current CUDA API, it is difficult to do better than what PyCUDA currently implements. The following are the constraints:
The only "bulletproof" solution that comes to mind is to maintain a per-context queue of "memory-yet-to-be-deallocated". I'd be open to considering patches implementing this if they are thoroughly tested. If you are able to migrate to a less stateful API (OpenCL?), this problem would disappear on its own. If you can ensure that no GPU memory gets freed outside your
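The per-context "memory-yet-to-be-deallocated" queue mentioned above might look roughly like the following. This is a mock sketch, not PyCUDA code: `DeferredDeallocator`, `free_fn`, and the string context handles are all hypothetical stand-ins (a real implementation would call `cuMemFree` with the owning context made current).

```python
import threading
from collections import defaultdict

class DeferredDeallocator:
    """Sketch of a per-context queue of memory-yet-to-be-deallocated.

    free() requests made while the owning context is not current are
    queued; the queue is flushed the next time that context becomes
    active. `free_fn` stands in for the actual deallocation call.
    """

    def __init__(self, free_fn):
        self._free_fn = free_fn            # actually releases the memory
        self._pending = defaultdict(list)  # context -> [allocations]
        self._lock = threading.Lock()

    def free(self, context, allocation, context_is_current):
        if context_is_current:
            self._free_fn(allocation)      # safe to free immediately
        else:
            with self._lock:
                self._pending[context].append(allocation)

    def on_context_activated(self, context):
        # Called whenever `context` becomes current: drain its queue.
        with self._lock:
            pending, self._pending[context] = self._pending[context], []
        for allocation in pending:
            self._free_fn(allocation)
```

The point of the design is that the actual free call only ever happens while the right context is active; everything else is bookkeeping.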
Thanks for the feedback @inducer. I will look at ways to provide a PR, and I share your frustration with CUDA's context management. In the meantime, @elliottslaughter's solution above works for our needs (for now). I wonder whether this would be simpler to fix in situations where there is only ever one context per GPU (i.e. the way the CUDA runtime API does it with primary contexts): GPU memory could then always be freed via that one-and-only context. Essentially, this simpler model would clean up after users if they are only using the primary context, and leave the work of freeing memory up to them if they have fancier uses of multiple contexts.
But if only one thread can have that context as its active one, how do you propose dealing with multiple threads?
OK, this is a good limitation to know about. Our system does use multiple threads, but we typically limit it so that at most one thread is running at a time. (There isn't really a point to allowing more, as the GIL will kill any parallel speedup anyway. When we really need parallelism, we use multiple processes.) If we can promise that exactly one thread will use the context at a time, is it OK to leave the context active in multiple threads? Or is that still bad?
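For what it's worth, the "exactly one thread at a time" promise above can be enforced mechanically with a lock. This is a mock sketch (`SerializedContext` is hypothetical and `ctx` is a plain stand-in object, not a PyCUDA context); it also records the maximum concurrency observed, so the promise can be checked rather than assumed:

```python
import threading

class SerializedContext:
    """Serialize access to a context that is left current in every thread.

    A lock guarantees that at most one thread actually uses the context
    at a time; `max_concurrent` records the worst concurrency observed.
    """

    def __init__(self, ctx):
        self._ctx = ctx
        self._lock = threading.Lock()
        self._active = 0
        self.max_concurrent = 0

    def __enter__(self):
        self._lock.acquire()
        self._active += 1
        self.max_concurrent = max(self.max_concurrent, self._active)
        return self._ctx

    def __exit__(self, *exc):
        self._active -= 1
        self._lock.release()
```

Usage: each thread wraps its GPU work in `with serialized_ctx as ctx: ...`; after a run, `max_concurrent == 1` confirms the single-user promise held.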
This post seems to indicate that, as of CUDA 4 (i.e. after PyCUDA's context management code was written), it became possible for one context to be current in multiple threads. The docs at https://docs.nvidia.com/cuda/cuda-driver-api/index.html still don't clearly reflect that AFAICT (or maybe I didn't look hard enough). I'd be happy to consider patches (if needed) that allow making use of that in PyCUDA.
For what it's worth, asking around my NVIDIA contacts I was able to dig up a few more links:
#306 (recently filed) is related.
I am working with a group that is using PyCUDA in a task-based parallel application. We therefore don't have a "main" thread that can "own" the PyCUDA primary context. Instead, we found a solution where we push/pop the primary context on the context stack.

The problem with this approach is that any variables referring to GPU memory are still in scope when `ctx.pop()` is called. This appears to result in their not being freed on the device, leaking memory. Explicitly calling `del` on these variables does not stop the memory leak.

Our solution has been to create an execution guard, ensuring that all GPU variables are out of scope before `ctx.pop()` is called.

@elliottslaughter gets credit for making this work.
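The original code snippets did not survive here, but the execution-guard pattern described can be sketched as follows. This is a mock illustration under assumptions, not the actual fix: `with_context`, `MockContext`, and `MockGPUArray` are hypothetical stand-ins (with PyCUDA, the context would come from `Device.retain_primary_context()` and the arrays would be `pycuda.gpuarray` objects).

```python
class MockContext:
    """Stand-in for a CUDA context; records allocations live at pop()."""
    def __init__(self):
        self.live = 0        # allocations not yet freed
        self.leaked = None   # live count observed when popped
    def push(self):
        pass
    def pop(self):
        self.leaked = self.live

class MockGPUArray:
    """Stand-in for a device allocation, freed when garbage-collected."""
    def __init__(self, ctx):
        self._ctx = ctx
        ctx.live += 1
    def __del__(self):
        self._ctx.live -= 1

def with_context(ctx, body):
    """Execution guard: run `body` in its own scope, so every device
    allocation it makes goes out of scope (and is freed) before
    ctx.pop() runs. `body` must only return host-side results."""
    ctx.push()
    try:
        result = body(ctx)   # allocations die when body's frame exits
    finally:
        ctx.pop()
    return result
```

The contrast with the leaky pattern is that here no reference to a device allocation outlives `body`, so (in CPython, via refcounting) every allocation is freed before the context is popped; without the guard, any allocation still referenced at `pop()` time is counted as leaked.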