The CUDA implementation is amazing, but for systems that don't have Nvidia GPUs, are there any plans to make a more general (and presumably much slower) implementation?
To do that well would be a massive undertaking. The shortcut approach is to replace the CPU matrix operations with OpenCL-accelerated versions, which in theory might require minimal effort. I plan to do something along these lines at some point to take advantage of Apple's Neural Engine, and I can look into doing it for OpenCL as well, but this isn't a high priority. Contributions are always welcome.
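For anyone thinking about contributing, here is a rough sketch of what the shortcut approach could look like structurally. It is written in Python for brevity and is not this project's actual code: `matmul_cpu`, `BACKENDS`, `register_backend`, and `matmul` are all hypothetical names. The idea is that every matrix multiply goes through one dispatch point, so an accelerated version (OpenCL or otherwise) could be registered without touching the callers. No real OpenCL calls are shown.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]
MatMul = Callable[[Matrix, Matrix], Matrix]


def matmul_cpu(a: Matrix, b: Matrix) -> Matrix:
    """Reference CPU matrix multiply (naive triple loop)."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("incompatible shapes")
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            aik = a[i][k]
            for j in range(cols):
                out[i][j] += aik * b[k][j]
    return out


# Hypothetical registry: backend name -> matmul implementation.
# Only the CPU reference is registered by default.
BACKENDS: Dict[str, MatMul] = {"cpu": matmul_cpu}


def register_backend(name: str, fn: MatMul) -> None:
    """Register an alternative (for example, accelerated) matmul."""
    BACKENDS[name] = fn


def matmul(a: Matrix, b: Matrix, backend: str = "cpu") -> Matrix:
    """Dispatch to the requested backend, falling back to the CPU version."""
    fn = BACKENDS.get(backend, matmul_cpu)
    return fn(a, b)
```

With this shape, an accelerated backend only has to match the `matmul_cpu` signature and be passed to `register_backend`. Asking for a backend that is not registered falls back to the CPU path, so systems without a supported GPU keep working.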