Suggestion: infer plan type from x type #100

Open
JeffFessler opened this issue Jun 22, 2022 · 8 comments · May be fixed by #142

Comments

@JeffFessler
Collaborator

If I understand correctly, the current approach is that the user decides between the CPU and GPU versions via either

  • CPU: p = plan_nfft(x, N) or (currently equivalent, I think) p = plan_nfft(Array, x, N)
  • GPU: p = plan_nfft(CuArray, x, N)

I'd prefer that the default plan type come from the type of x by adding a method something like
plan_nfft(x, N) = plan_nfft(typeof(x), x, N)

So if x is a CuArray then by default the plan will be a GPU plan.

The reason is that then I can embed plan_nfft into downstream packages without forcing them to depend on CUDA.jl.
This might also "future proof" it, so that someday, when we have an OpenCLArray version, it can again inherit from x.
But I might be missing something?
And I have not really been able to test it out yet because I think the recent CUDA fixes are waiting for release updates.

I realize that if x is an Adjoint type (or a Range) then this would need additional work to get at the "base" of such types.
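For illustration, the unwrapping could be done with hypothetical helpers like these (none of this exists in the package; it is just a sketch of the idea):

```julia
using LinearAlgebra

# Hypothetical helpers (not in NFFT.jl) to get at the "base" array
# behind wrapper and lazy types before dispatching on its type.
unwrap(x::AbstractArray) = x
unwrap(x::Adjoint)       = parent(x)    # underlying matrix of x'
unwrap(x::AbstractRange) = collect(x)   # materialize the range on the CPU

# The proposed default could then dispatch on typeof(unwrap(x)).
```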

@tknopp
Member

tknopp commented Jun 22, 2022

But I might be missing something?

No, your proposal is clever. One argument against it would be that in practice the nodes will not necessarily need to be copied to the GPU. Our current implementation in CuNFFT actually precomputes the (sparse) convolution matrix on the CPU and then copies it to the GPU.

And I have not really been able to test it out yet because I think the recent CUDA fixes are waiting for release updates.

Thanks for the trigger, will do later. The real issue is that we don't have CI for that, so I need to run the tests manually on a dedicated computer.

I realize that if x is an Adjoint type (or a Range) then this would need additional work to get at the "base" of such types.

That is fixable by having a method that collects the nodes first and then does the GPU/CPU dispatch. We currently do this here: https://github.com/JuliaMath/NFFT.jl/blob/master/AbstractNFFTs/src/derived.jl#L28. But the order of the methods would need to change. Will have a look at that.
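The reordering might look roughly like this (a hypothetical sketch; the real methods in derived.jl carry additional arguments and keywords):

```julia
# Hypothetical ordering (not the actual package code): materialize
# lazy node containers first, then dispatch on the concrete array type.
plan_nfft(x::AbstractRange, N; kwargs...) =
    plan_nfft(collect(x), N; kwargs...)          # ranges -> plain Vector
plan_nfft(x::AbstractArray, N; kwargs...) =
    plan_nfft(typeof(x), x, N; kwargs...)        # GPU/CPU dispatch on the type
```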

In general: this GPU/CPU dispatching has not seen any real-world usage. For instance, in MRIReco.jl it is not yet possible to use GPUs without hacking the source code. So any real-world testing (e.g. in MIRT) would be appreciated.

@tknopp
Member

tknopp commented Jun 22, 2022

ping @migrosser

@JeffFessler
Collaborator Author

The real issue is that we don't have CI for that

Bummer. I would offer to run the test on my GPU machine if this were a single package, but I don't really know how to test a repo with multiple nested packages properly. I used ]add with the #master branch of CuNFFT and NFFT and then ran test NFFT. I got one package-dependency error, but the tests still ran, and everything seemed to pass except for one ERROR: LoadError: UndefVarError: libfinufft not defined, which I suspect is unrelated to #97. So it seems OK, but given the package nesting I don't have full confidence in my test...

any real-world testing (e.g. in MIRT) would be appreciated.

Yep, I am working on it. In fact this suggestion originated from a user reported issue where the user clearly thought that having the nodes on the GPU would suffice to invoke CUDA (and so did I initially until I looked into it more): JeffFessler/mirt-demo#5

@tknopp
Member

tknopp commented Jun 24, 2022

One needs to dev NFFT and CuNFFT (and probably also AbstractNFFTs) and then do the testing. But the GPU tests were commented out; I re-enabled them:
https://github.com/JuliaMath/NFFT.jl/blob/master/test/runtests.jl#L15
And I also tested on my GPU system (works!) and made a release. So right now CuNFFT should work.

@tknopp
Member

tknopp commented Jun 24, 2022

By the way, CuNFFT is not of the same quality as the CPU implementation. We use the sparse-matrix trick since it makes it so simple to bring everything onto the GPU. As far as I have seen, this is the fastest GPU implementation:
https://github.com/flatironinstitute/cufinufft
It would be very interesting to see how far one can get with pure Julia code. I have no experience in that direction so far. If there is interest in a competitive CuNFFT.jl implementation, we should probably create a dedicated issue and discuss it there. Another idea would be to just create a wrapper around cufinufft for the moment.

@nHackel
Collaborator

nHackel commented Jul 4, 2024

Hello! We've recently made some progress with the points mentioned in this issue.

First off, we've moved away from a dedicated CUDA package and instead implemented a GPU-vendor-agnostic NFFT plan. This plan is realized as a package extension and does not depend on CUDA. Additionally, it now also (partially) works on AMD GPUs and, in general, on any AbstractGPUArray that implements an FFT. This means upstream packages no longer need to load CuNFFT specifically.

In order to implement that, I used Adapt.jl to move the data to the given GPU type. That package essentially implements convert for "wrapped" types such as an adjoint or a sparse array.
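For illustration, Adapt.jl converts through wrappers rather than around them (a minimal sketch, assuming Adapt.jl and CUDA.jl are installed and a GPU is available):

```julia
using Adapt, CUDA, LinearAlgebra

A = rand(Float32, 4, 4)
# adapt recurses into the Adjoint wrapper: the parent array is moved
# to the GPU while the wrapper itself is preserved, so B behaves like
# an Adjoint over a CuArray rather than a copied dense matrix.
B = adapt(CuArray, A')
```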

I did not touch the plan_nfft interface yet, so one still has to write either plan_nfft(CuArray, ...) or plan_nfft(ROCArray, ...). One issue is that we can't directly use the result of typeof(x) in adapt: the function wants the base type, so instead of CuArray{Float32, 2, ...} it wants CuArray.

This is a problem we also ran into in an upstream package, where I solved this issue by stripping the parameters from the type: https://github.com/JuliaImageRecon/LinearOperatorCollection.jl/blob/08c3ff7566da68268592f25184337d1001c1e2be/ext/LinearOperatorNFFTExt/NFFTOp.jl#L44

So while there is no public API to strip these parameters at the moment, we could use this workaround to implement plan_nfft(x, ...).
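The workaround linked above boils down to something like this (a sketch; `Base.typename(T).wrapper` is the internal, non-API way to strip type parameters, so it could break in a future Julia release):

```julia
# Strip the type parameters: CuArray{Float32, 2, ...} -> CuArray,
# Matrix{Float64} -> Array, etc. Uses Julia internals, not public API.
strip_type_parameters(T::Type) = Base.typename(T).wrapper

# This would allow the fallback proposed at the top of this issue:
# plan_nfft(x, N; kwargs...) =
#     plan_nfft(strip_type_parameters(typeof(x)), x, N; kwargs...)
```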

@nHackel
Collaborator

nHackel commented Jul 4, 2024

Oh, and we also now have CI for both CUDA and AMD GPUs.

@tknopp
Member

tknopp commented Jul 4, 2024

@nHackel: yes, it would be good to rethink the interface based on all the experience you have gained recently.
