From b85d1c77609e38e9594b14220e3e9b3c1a78e313 Mon Sep 17 00:00:00 2001
From: Tim Besard
Date: Mon, 6 Jan 2025 15:19:48 +0100
Subject: [PATCH] Add blog post on GPUArrays.jl v11.

---
 post/2025-01-07-gpuarrays-11.md | 82 +++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)
 create mode 100644 post/2025-01-07-gpuarrays-11.md

diff --git a/post/2025-01-07-gpuarrays-11.md b/post/2025-01-07-gpuarrays-11.md
new file mode 100644
index 0000000..f7fb4f3
--- /dev/null
+++ b/post/2025-01-07-gpuarrays-11.md
@@ -0,0 +1,82 @@
++++
+title = "GPUArrays v11: Port to KernelAbstractions.jl"
+author = "Tim Besard"
+abstract = """
+  The latest version of GPUArrays.jl involved a port of all vendor-neutral
+  kernels to KernelAbstractions.jl. This should make it easier to add new
+  functionality and improve the performance of existing kernels."""
++++
+
+{{abstract}}
+
+
+## Vendor-neutral kernel DSL
+
+Back in the day, we created GPUArrays.jl to avoid having to write separate
+kernels for each GPU back-end, by relying on a very simple vendor-neutral
+domain-specific language (DSL) that could be translated very easily to the
+back-end's native kernel language. As a simple example, the following kernel was
+used to compute the adjoint of a vector:
+
+```julia
+function LinearAlgebra.adjoint!(B::AbstractGPUMatrix, A::AbstractGPUVector)
+    gpu_call(B, A) do ctx, B, A
+        idx = @linearidx A
+        @inbounds B[1, idx] = adjoint(A[idx])
+        return
+    end
+    return B
+end
+```
+
+This DSL was designed almost a decade ago, by [Simon
+Danisch](https://github.com/SimonDanisch), and has served us well! Since then,
+KernelAbstractions.jl has been developed by [Valentin
+Churavy](https://github.com/vchuravy/), providing a more principled and powerful
+DSL. With many application developers switching to KernelAbstractions.jl, it was
+time to port GPUArrays.jl to this new DSL as well.
+
+Thanks to the tireless work by [James Schloss](https://github.com/leios),
+**GPUArrays.jl v11 now uses KernelAbstractions.jl for all vendor-neutral
+kernels**. The aforementioned `adjoint!` kernel now looks like this:
+
+```julia
+function LinearAlgebra.adjoint!(B::AbstractGPUMatrix, A::AbstractGPUVector)
+    @kernel function adjoint_kernel!(B, A)
+        idx = @index(Global, Linear)
+        @inbounds B[1, idx] = adjoint(A[idx])
+    end
+    adjoint_kernel!(get_backend(A))(B, A; ndrange=size(A))
+    return B
+end
+```
+
+As shown above, the KernelAbstractions.jl DSL is very similar to the old DSL,
+but it provides more flexibility and power (e.g., support for atomics through
+Atomix.jl). In addition, many more users are familiar with
+KernelAbstractions.jl, making it easier for them to contribute to GPUArrays.jl.
+A good first step here would be to port some of the vendor-specific kernels from
+CUDA.jl to GPUArrays.jl, making them available to all GPU back-ends. If you are
+interested in contributing, please reach out!
+
+That said, the change is not without its challenges. The added flexibility
+offered by KernelAbstractions.jl with respect to indexing currently results in
+**certain kernels being slower than before**, specifically when there is not
+much computational complexity to amortise the cost of indexing (e.g., when doing
+very simple broadcasts). [We are working on improving
+this](https://github.com/JuliaGPU/GPUArrays.jl/issues/565), but it will take
+some time. To avoid holding back the rest of the JuliaGPU ecosystem, we are releasing
+despite these performance issues. It's recommended to carefully benchmark your
+application after upgrading to v11, and to report any performance regressions.
+
+
+## Back-end package versions
+
+As GPUArrays.jl is not a direct dependency of most applications, the update
+will be pulled in by the following back-end package versions (some of which
+may not be released yet):
+
+- CUDA.jl v5.6
+- Metal.jl v1.5
+- oneAPI.jl v2.0
+- AMDGPU.jl v1.1