Add blog post on GPUArrays.jl v11. #48

Merged 1 commit on Jan 6, 2025
post/2025-01-07-gpuarrays-11.md (82 additions, 0 deletions)
+++
title = "GPUArrays v11: Port to KernelAbstractions.jl"
author = "Tim Besard"
abstract = """
The latest version of GPUArrays.jl involved a port of all vendor-neutral
kernels to KernelAbstractions.jl. This should make it easier to add new
functionality and improve the performance of existing kernels."""
+++

{{abstract}}


## Vendor-neutral kernel DSL

Back in the day, we created GPUArrays.jl to avoid having to write separate
kernels for each GPU back-end: it relied on a simple vendor-neutral
domain-specific language (DSL) that could easily be translated to each
back-end's native kernel language. For example, the following kernel was
used to compute the adjoint of a vector:

```julia
function LinearAlgebra.adjoint!(B::AbstractGPUMatrix, A::AbstractGPUVector)
    gpu_call(B, A) do ctx, B, A
        # @linearidx yields this invocation's linear index into A,
        # returning early when the index is out of bounds
        idx = @linearidx A
        @inbounds B[1, idx] = adjoint(A[idx])
        return
    end
    return B
end
```

This DSL was designed almost a decade ago, by [Simon
Danisch](https://github.com/SimonDanisch), and has served us well! Since then,
KernelAbstractions.jl has been developed by [Valentin
Churavy](https://github.com/vchuravy/), providing a more principled and powerful
DSL. With many application developers switching to KernelAbstractions.jl, it was
time to port GPUArrays.jl to this new DSL as well.

Thanks to the tireless work by [James Schloss](https://github.com/leios),
**GPUArrays.jl v11 now uses KernelAbstractions.jl for all vendor-neutral
kernels**. The aforementioned `adjoint!` kernel now looks like this:

```julia
function LinearAlgebra.adjoint!(B::AbstractGPUMatrix, A::AbstractGPUVector)
    @kernel function adjoint_kernel!(B, A)
        # @index(Global, Linear) yields this work-item's global linear index
        idx = @index(Global, Linear)
        @inbounds B[1, idx] = adjoint(A[idx])
    end
    adjoint_kernel!(get_backend(A))(B, A; ndrange=size(A))
    return B
end
```
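
As a quick sanity check, the new method can be exercised without GPU hardware
via JLArrays.jl, the CPU reference back-end that ships alongside GPUArrays.jl.
A minimal sketch; the array values are just for illustration:

```julia
using LinearAlgebra, JLArrays

A = JLArray(Float32[1, 2, 3])      # a "GPU" vector on the CPU reference back-end
B = JLArray(zeros(Float32, 1, 3))  # preallocated 1×3 destination
adjoint!(B, A)                     # B now holds [1 2 3]
```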

As shown above, the KernelAbstractions.jl DSL is very similar to the old DSL,
but it provides more flexibility and power (e.g., support for atomics through
Atomix.jl). In addition, many more users are familiar with
KernelAbstractions.jl, making it easier for them to contribute to GPUArrays.jl.
A good first step here would be to port some of the vendor-specific kernels from
CUDA.jl to GPUArrays.jl, making them available to all GPU back-ends. If you are
interested in contributing, please reach out!
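
To illustrate the kind of flexibility this brings, here is a minimal sketch of
a histogram kernel that uses Atomix.jl for atomic updates. The
`histogram_kernel!` and `histogram` names are hypothetical, not part of
GPUArrays.jl:

```julia
using KernelAbstractions
import Atomix

# Hypothetical example: bin values from [0, 1) into a histogram,
# using an atomic increment to handle colliding writes.
@kernel function histogram_kernel!(hist, data, nbins)
    idx = @index(Global, Linear)
    bin = clamp(1 + floor(Int, data[idx] * nbins), 1, nbins)
    Atomix.@atomic hist[bin] += Int32(1)
end

function histogram(data, nbins)
    backend = get_backend(data)
    hist = KernelAbstractions.zeros(backend, Int32, nbins)
    histogram_kernel!(backend)(hist, data, nbins; ndrange=length(data))
    return hist
end
```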

That said, the change is not without its challenges. The added flexibility
offered by KernelAbstractions.jl with respect to indexing currently results in
**certain kernels being slower than before**, specifically when there is not
much computational complexity to amortise the cost of indexing (e.g., when doing
very simple broadcasts). [We are working on improving
this](https://github.com/JuliaGPU/GPUArrays.jl/issues/565), but it will take
some time. To avoid holding back the rest of the JuliaGPU ecosystem, we are
releasing despite these performance issues. We recommend carefully
benchmarking your application after upgrading to v11, and reporting any
performance regressions you encounter.
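
For example, a minimal way to benchmark a simple broadcast, here using CUDA.jl
and BenchmarkTools.jl (adapt to your back-end of choice; the array size is
arbitrary):

```julia
using CUDA, BenchmarkTools

x = CUDA.rand(Float32, 2^20)
# CUDA.@sync waits for the asynchronous kernel to finish,
# so the timing covers the actual GPU work
@btime CUDA.@sync $x .+ 1f0;
```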


## Back-end package versions

As GPUArrays.jl is not a direct dependency of most applications, the update
will be pulled in by the following back-end package versions (some of which
may not be released yet):

- CUDA.jl v5.6
- Metal.jl v1.5
- oneAPI.jl v2.0
- AMDGPU.jl v1.1
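
To verify which GPUArrays.jl version your environment actually resolved after
upgrading a back-end, one option is to query the package manager:

```julia
using Pkg
Pkg.status("GPUArrays")  # prints the resolved GPUArrays.jl version
```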