Atomic attempts #282
Conversation
Two comments: I think the approach is the right one. Define the atomic primitives in KA and then expand them in each backend. We will need to copy the macro implementation from CUDA.jl (similar to …). I guess the big question is what to do on 1.6, but let me worry about that. For now, just add a version check and an error for 1.6.
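To make the "define in KA, expand in each backend" idea concrete, the pattern is roughly as follows (a sketch only; the function names and module layout are illustrative, not the PR's actual code):

```julia
# KernelAbstractions declares the primitive without a generic method...
function atomic_add! end

# ...the CPU backend adds its own method, and the CUDA backend
# (in CUDAKernels) would forward to CUDA.atomic_add! instead, e.g.:
# atomic_add!(ptr::CUDA.DevicePtr{T}, val::T) where {T} = CUDA.atomic_add!(ptr, val)
```

Each backend then only needs to supply methods for the pointer types it owns, and kernels can call the same `atomic_add!` everywhere.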
Alright, I'll get to it with the other atomic functions. My goal is to have them done by the end of the week.
Ok, the last ones are the atomic inc/dec operations: these just increment up to, or decrement down to, a specified value. I guess they are mainly for modular arithmetic (otherwise, why would they reset the old value to 0 or to `val`?). Anyway, I think I made some good progress today. Will start on the tests tomorrow!
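For reference, the wrapping semantics of CUDA's `atomicInc`/`atomicDec` can be modeled sequentially like this (a sketch of the semantics only; `inc_step`/`dec_step` are illustrative names, and the real operations perform the whole read-modify-write as one atomic step):

```julia
# atomicInc: wrap to 0 once the old value reaches the limit.
inc_step(old, limit) = old >= limit ? zero(old) : old + one(old)

# atomicDec: wrap to the limit at 0 (or when out of range).
dec_step(old, limit) = (old == zero(old) || old > limit) ? limit : old - one(old)
```

This is what makes them useful for modular arithmetic: repeatedly applying `inc_step` cycles through `0:limit`.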
CUDA:
KA
I am an idiot. Fixed this.
All GPU tests pass, but the compare-and-swap on the CPU is still failing with …
Ok, all tests on the CPU and GPU now pass. They are mostly direct copies of the tests from CUDA.jl. I need to: …
Rebase on master?
Pushed and also changed the histogram test. I think the big problem is that atomics are broken on anything < 1.7. Is there a way to exclude tests/examples and also prevent the exporting of atomics for < 1.7?
I added an error for running atomics on the CPU for Julia versions below 1.7.0 and got most of the tests to pass. For some reason, some of the CI is failing after the printing tests, but I don't think that is related to this PR. One caveat: I needed to remove some architecture-based precision tests, because the only way I could specify them in the test file was by relying on CUDA, which caused some CPU tests to fail. For example, certain cards cannot do the Float64 atomic add; these are tested in CUDA.jl, so I think it's probably fine since we are calling the same functions.
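A version gate along these lines would produce the error described above (a sketch; the actual check in the PR may differ in name and placement):

```julia
# Hedged sketch: refuse CPU atomics below Julia 1.7, where the
# pointer-atomic intrinsics they rely on are unavailable.
@static if VERSION < v"1.7.0"
    atomic_add!(args...) = error("CPU atomics require Julia >= 1.7")
end
```

Because `@static` resolves at load time, the error stub only exists on old Julia versions and costs nothing on 1.7+.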
Thanks for pointing me to this PR @vchuravy! We would have an immediate application for this: we're implementing monotonic splines for normalizing flows based on KernelAbstractions. We have a running prototype, but defining the …
CC @VasylHafych, @Micki-D
For the record, I am currently using this branch to do atomic operations for one of my own projects, so feel free to work off of it in the short term. In the long term, a few other people are working on better atomic support in CUDA, namely JuliaLLVM/LLVM.jl#308. The current plan is to create a separate package (UnsafeAtomicsLLVM.jl), which we can load directly into KernelAbstractions for atomic support. This PR will hopefully reflect these changes when they happen.
With JuliaConcurrent/UnsafeAtomicsLLVM.jl#3 we can use atomics on the CPU and GPU through the same interface. Once UnsafeAtomicsLLVM.jl is released (requires LLVM.jl 4.12), the ecosystem surrounding KernelAbstractions.jl would look something like:

```mermaid
graph TD;
    UnsafeAtomics.jl --> Atomix.jl;
    UnsafeAtomics.jl --> UnsafeAtomicsLLVM.jl;
    Atomix.jl --> KernelAbstractions.jl;
    KernelAbstractions.jl --> CUDAKernels.jl;
    LLVM.jl --> UnsafeAtomicsLLVM.jl;
    LLVM.jl --> CUDA.jl;
    CUDA.jl --> CUDAKernels.jl;
    UnsafeAtomicsLLVM.jl --> CUDAKernels.jl;
    KernelAbstractions.jl --> user[User code];
    CUDAKernels.jl --> user[User code];
```

where …
We'll give it a try!
That'll be awesome ...
@tfk, I saw UnsafeAtomicsLLVM 0.1.0 is out now. Can that be used (on Julia v1.8) in a KernelAbstractions kernel already?
@leios I think we can close this now?
Yeah, even if we rework it, I think it needs to be in a different PR.
Does that mean UnsafeAtomicsLLVM is "ready", basically?
Yeah, basically. There might still be some stuff to sort out, but #299 added the dependency on UnsafeAtomicsLLVM and is working on master (though we are missing a few tests and docs).
Neat, thanks @leios! Is there an example or so to get started?
Right now, it's just the histogram example: https://github.com/JuliaGPU/KernelAbstractions.jl/blob/master/examples/histogram.jl; however, #299 has someone commenting about using it for an …
We need better examples and docs...
Thanks! So it's basically just …
Ah, no. That was just a micro-optimization (and a proof that it works on shared memory). Any …
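A hedged sketch of what this looks like in practice, i.e. `@atomic` applied to an ordinary array access inside a kernel (this assumes the `@atomic` exported by KernelAbstractions on master, backed by UnsafeAtomicsLLVM; the kernel name and arguments are illustrative):

```julia
using KernelAbstractions

# Sketch: scattered accumulation with @atomic on a plain array access.
# Concurrent threads writing to the same out[idx[i]] are serialized safely.
@kernel function scatter_add!(out, @Const(idx), @Const(vals))
    i = @index(Global, Linear)
    @atomic out[idx[i]] += vals[i]
end
```

The kernel is launched with the usual `kernel = scatter_add!(device, workgroupsize)` followed by `kernel(out, idx, vals; ndrange = length(idx))` pattern, and the same code runs on CPU and GPU backends.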
Thanks a lot @leios! @VasylHafych and I will give it a go for our scattered gradient accumulation use case.
This is a draft of an atomic update to KernelAbstractions.

I plan to put everything we need in the atomics.jl file (and a corresponding CUDAKernels file); however, I cannot really test ROCm, so I might need to leave that to someone else.

Current roadmap (to be worked on throughout the week):

- `CUDA.atomic_*` calls. Docs here: https://cuda.juliagpu.org/stable/api/kernel/#Atomics. Right now, I am directly calling the CUDA atomics and then using atomics on pointers for the CPU. I think this will work for all primitives, but might be wrong. Info about CPU atomics: https://gist.github.com/vtjnash/11b0031f2e2a66c9c24d33e810b34ec0#new-intrinsics-for-ptrt
- `@atomic` macro (along with docs, etc). This is a separate point because I might not do it for this PR. On the GPU, I think we can pull the `@atomic` macro in directly, but no such feature exists on the CPU (so far as I am aware). We can look for inspiration for the CPU implementation in the CUDA `@atomic` macro definition: https://github.com/JuliaGPU/CUDA.jl/blob/master/src/device/intrinsics/atomics.jl, but that is tagged as "experimental" for now.

I am actually currently struggling with the final point because, for some reason, the macro I created (`KernelAbstractions.@atomic`) is only grabbing the first symbol of an expression and not the full expression. If everyone is happy enough with the atomic primitives, I might decide to leave the macro to future work (tm).

This is a step towards finalizing #7 and #276; however, I am not sure if it fixes them completely without the `@atomic` macro.
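As a sketch of the CPU path described above (atomics on raw pointers), something along these lines should work on Julia 1.7+; the function name and the memory ordering are illustrative, not the exact code in the PR:

```julia
# Hedged sketch of a CPU atomic add on an Array via the Julia 1.7+
# pointer intrinsics referenced in the linked gist.
function cpu_atomic_add!(A::Array{T}, i::Integer, val::T) where {T}
    GC.@preserve A begin  # keep A rooted while we hold a raw pointer into it
        p = pointer(A, i)
        Core.Intrinsics.atomic_pointermodify(p, +, val, :monotonic)
    end
    return A
end
```

The `GC.@preserve` is essential here: without it, the array could be collected or moved while the raw pointer is in use, which is presumably what the "adding GC checks" commit in this PR addresses.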