Suggestion: infer plan type from x type #100
Comments
No, your proposal is clever. One argument against it would be that in practice the nodes will not necessarily need to be copied to the GPU. Our current implementation in CuNFFT actually precomputes the (sparse) convolution matrix on the CPU and then copies it to the GPU.
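As a rough illustration of that approach (a minimal sketch, not the actual CuNFFT code; the matrix B, its size, and the sparsity level are placeholders), the convolution matrix can be assembled with SparseArrays on the CPU and then moved once to the device via CUDA.jl's CUSPARSE wrappers:

```julia
using SparseArrays
using CUDA, CUDA.CUSPARSE   # requires a CUDA-capable GPU

# Placeholder: stand-in for the precomputed interpolation/convolution matrix
# that would be built from the nodes on the CPU.
B = ComplexF32.(sprand(64, 256, 0.05))

# Copy the sparse matrix (and the dense input) to the GPU once, up front.
B_gpu = CuSparseMatrixCSC(B)
f_gpu = CuArray(rand(ComplexF32, 256))

# The convolution step is then a sparse matrix-vector product on the device.
g_gpu = B_gpu * f_gpu
```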
Thanks for the trigger, will do later. The real issue is that we don't have CI for that and I therefore need to manually run the tests on a dedicated computer.
That is fixable by having a method that …

In general: this GPU/CPU dispatching has not seen any real-world usage. For instance, in MRIReco.jl it is not yet possible to use GPUs without hacking the source code. So any real-world testing (e.g. in MIRT) would be appreciated.
ping @migrosser
Bummer. I would offer to do the test on my GPU machine if this were a single package, but I don't really know how to do such a test properly for a repo with multiple nested packages. I used …
Yep, I am working on it. In fact, this suggestion originated from a user-reported issue where the user clearly thought that having the nodes on the GPU would suffice to invoke CUDA (and so did I initially, until I looked into it more): JeffFessler/mirt-demo#5
One needs to dev NFFT and CuNFFT (and probably also AbstractNFFTs) and then do the testing. But those tests were commented out; I re-enabled them.
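For reference, one possible way to set this up locally (a sketch only; the directory layout below assumes the NFFT.jl monorepo with AbstractNFFTs and CuNFFT as subdirectories, and the paths may differ on your machine):

```julia
using Pkg

# Assumed paths to a local clone of the NFFT.jl monorepo and its subpackages.
Pkg.develop(path = "NFFT.jl")
Pkg.develop(path = "NFFT.jl/AbstractNFFTs")
Pkg.develop(path = "NFFT.jl/CuNFFT")

# Run the GPU package tests (requires a CUDA-capable GPU).
Pkg.test("CuNFFT")
```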
By the way, CuNFFT is not of the same quality as its CPU implementation. We use the sparse-matrix trick since it is so simple to bring everything onto the GPU. As far as I have seen, this is the fastest GPU implementation: …
Hello! We've recently made some progress on the points mentioned in this issue.

First off, we've moved away from a dedicated CUDA package and instead implemented a GPU-vendor-agnostic NFFT plan. This plan is realized as a package extension and does not depend on CUDA. Additionally, this now also (partially) works on AMD GPUs and in general on any AbstractGPUArray that implements an FFT. This means upstream packages don't need to specifically load CuNFFT anymore.

In order to implement that, I used Adapt.jl to move the data to the given GPU type. That package essentially implements …

I did not touch the …

This is a problem we also ran into in an upstream package, where I solved it by stripping the parameters from the type: https://github.com/JuliaImageRecon/LinearOperatorCollection.jl/blob/08c3ff7566da68268592f25184337d1001c1e2be/ext/LinearOperatorNFFTExt/NFFTOp.jl#L44

So while there is no public API to strip these parameters atm, we could use this workaround to implement …
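For context, here is a minimal sketch of the kind of parameter-stripping workaround being referred to (an illustration, not the exact code behind the link; the helper name arraytype is hypothetical, and Base.typename(...).wrapper is a Julia internal rather than a public API):

```julia
# Strip the type parameters from an array type, keeping only the wrapper,
# e.g. Array{ComplexF64, 1} -> Array, CuArray{Float32, 2, ...} -> CuArray.
arraytype(x::AbstractArray) = Base.typename(typeof(x)).wrapper

arraytype(rand(ComplexF64, 8))    # -> Array
# With CUDA.jl loaded and a GPU available:
# arraytype(CUDA.zeros(Float32, 8, 8))   # -> CuArray
```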
Oh, and we now also have CI for both CUDA and AMD GPUs.
@nHackel: yes, it would be good to rethink the interface based on all the experience you have gained recently.
If I understand correctly, the current approach is that the user decides between the CPU and GPU versions by calling either

p = plan_nfft(x, N)

or (currently equivalently, I think)

p = plan_nfft(Array, x, N)

for the CPU, versus

p = plan_nfft(CuArray, x, N)

for the GPU.
I'd prefer that the default plan type come from the type of x, by adding a method something like

plan_nfft(x, N) = plan_nfft(typeof(x), x, N)

So if x is a CuArray, then by default the plan will be a GPU plan. The reason is that then I can embed plan_nfft into downstream packages without forcing them to depend on CUDA.jl. This might also "future proof" it so that someday, when we have an OpenCLArray version, it can again inherit from x. But I might be missing something?
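To make the dispatch pattern concrete, here is a minimal, self-contained sketch (toy_plan, ToyCPUPlan, and ToyGPUPlan are hypothetical stand-ins, not the AbstractNFFTs API; the fallback also strips the type parameters via the internal typename trick, combining this suggestion with the workaround mentioned earlier in the thread):

```julia
# Toy "plan" types standing in for CPU and GPU plans.
struct ToyCPUPlan end
struct ToyGPUPlan end

# Backends keyed on the bare array wrapper type.
toy_plan(::Type{Array}, x, N) = ToyCPUPlan()
# A GPU backend would add e.g. `toy_plan(::Type{CuArray}, x, N) = ToyGPUPlan()`.

# The suggested fallback: infer the backend from the type of x, stripping the
# parameters (eltype, dims) so dispatch happens on the wrapper (Array, CuArray, ...).
toy_plan(x::AbstractArray, N) = toy_plan(Base.typename(typeof(x)).wrapper, x, N)

x = rand(ComplexF64, 2, 100)   # nodes stored in a CPU array
p = toy_plan(x, (32, 32))      # -> ToyCPUPlan(); a CuArray would select a GPU method
```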
And I have not really been able to test it out yet because I think the recent CUDA fixes are waiting for release updates.
I realize that if x is an Adjoint type (or a Range), then this would need additional work to get at the "base" of such types.
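One possible way to handle such wrapper types (a sketch only; the helper name base_array is hypothetical, parent-based unwrapping is a common pattern but the packages may handle this differently, and materializing ranges is an assumption made here for illustration):

```julia
using LinearAlgebra   # for Adjoint / Transpose

# Recursively unwrap common wrapper types so that the underlying storage type
# (Array, CuArray, ...) can drive the plan selection.
base_array(x::AbstractArray) = x
base_array(x::Union{Adjoint, Transpose, SubArray}) = base_array(parent(x))

# Ranges have no backing array; one choice is to materialize them on the CPU.
base_array(x::AbstractRange) = collect(x)

base_array(rand(3, 4)')              # -> the underlying 3x4 Matrix
base_array(range(0, 1; length = 5))  # -> a Vector{Float64}
```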