Option to precompile cuRand and gpu array functions #311
Comments
I suspect the most fruitful approach would be to modify the kernel caching layer to support this. Maybe allow setting a mode where all used kernels can be "recorded". (This would have to happen at context creation time; otherwise some kernels may already be loaded and might get missed.) This recording would then generate the appropriate fat binary files that can be shipped with an application, likely stored as a cache (which would have to be free of collisions) keyed on the provided source code. IMO, this would allow for minimal interface changes on the application side while avoiding a hard dependency on the compiler at runtime. You could also revive this NVRTC patch set and base your work on that; then you'd only need to save PTX. (Though, to be fair, the generated PTX might/will still vary by architecture.)
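A minimal sketch of the "recording" idea described above, in plain Python. Everything here is hypothetical and not part of pycuda: `RecordingKernelCache`, `cache_key`, and the `compile_fn` callback are illustrative names. The key hashes the source together with the target architecture and compiler options, which is one way to keep the cache collision-free across builds. In a real build, `compile_fn` could presumably wrap `pycuda.compiler.compile`, and the returned bytes could be handed to `module_from_buffer`.

```python
import hashlib
import json
from pathlib import Path


def cache_key(source, arch, options=()):
    """Hash every input that influences the compiled binary."""
    payload = json.dumps(
        {"source": source, "arch": arch, "options": list(options)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RecordingKernelCache:
    """Hypothetical cache: records binaries when a compiler callback is
    supplied, and only reads them back from disk when it is not."""

    def __init__(self, cache_dir, compile_fn=None):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # compile_fn(source, arch, options) -> bytes; None on the end user's machine.
        self.compile_fn = compile_fn

    def get_binary(self, source, arch, options=()):
        path = self.cache_dir / (cache_key(source, arch, options) + ".fatbin")
        if path.exists():
            # Recorded earlier: no compiler needed.
            return path.read_bytes()
        if self.compile_fn is None:
            raise LookupError("kernel was not recorded at build time: " + path.name)
        # Recording mode: compile once and store the result for shipping.
        binary = self.compile_fn(source, arch, options)
        path.write_bytes(binary)
        return binary
```

On the developer's machine the cache would be created with a compiler callback and exercised by running the application once; the resulting directory would then ship with the application, where the cache is created without a callback.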
Thanks for the quick response! I will look into setting up a kernel caching layer. Ideally it would also work the same for custom kernels compiled with [edit: okay, never mind about the cache 'busting' logic. The important functionality is less of a cache and more just being able to record and store binaries for all kernels compiled by a given application.] Is this something you'd be interested in accepting as a PR? (NVRTC looks interesting, but it still requires users to have nvcc installed, which makes it less than ideal for me.)
Hi,
I want to remove the requirement to have MSVC and NVCC compilers available in the runtime environment so I can distribute a program I'm writing in pycuda. I've managed to compile my custom kernels into `.fatbin` files and import them using `module_from_buffer`.

However, it looks like some other pycuda functions still rely on generating and compiling cuda kernels at runtime. Specifically I'm having trouble with the `cu_rand` integration, as well as `gpu_array.fill(x)` function. Presumably a lot more of the gpu_array helper functions will have the same problem.

Is there a way to package the kernels used by these functions into .fatbin files, and to rely on those files rather than runtime compilation? and/or what code changes would be required to pycuda to support this?
Thanks!
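For reference, the `.fatbin` loading step mentioned in the question might look roughly like the sketch below. `load_precompiled_module` and its `loader` parameter are hypothetical helpers introduced for illustration; only `pycuda.driver.module_from_buffer` comes from pycuda. The pycuda import is deferred so the helper can be exercised with a stand-in loader on a machine without a GPU.

```python
from pathlib import Path


def load_precompiled_module(fatbin_path, loader=None):
    """Load a prebuilt .fatbin image without invoking nvcc at runtime."""
    image = Path(fatbin_path).read_bytes()
    if loader is None:
        # Imported lazily. A CUDA context must already exist at this point,
        # for example one created by `import pycuda.autoinit`.
        import pycuda.driver as cuda
        loader = cuda.module_from_buffer
    return loader(image)
```

With a context active, usage would be along the lines of `mod = load_precompiled_module("kernels.fatbin")` followed by `mod.get_function("my_kernel")`, where both the file name and the kernel name are placeholders.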