Add blog post on GPUArrays.jl v11. #48

Merged 1 commit on Jan 6, 2025
post/2025-01-07-gpuarrays-11.md (82 additions, 0 deletions)
+++
title = "GPUArrays v11: Port to KernelAbstractions.jl"
author = "Tim Besard"
abstract = """
The latest version of GPUArrays.jl involved a port of all vendor-neutral
kernels to KernelAbstractions.jl. This should make it easier to add new
functionality and improve the performance of existing kernels."""
+++

{{abstract}}


## Vendor-neutral kernel DSL

Back in the day, we created GPUArrays.jl to avoid having to write separate
kernels for each GPU back-end: it relied on a simple vendor-neutral
domain-specific language (DSL) that could easily be translated to each
back-end's native kernel language. For example, the following kernel was
used to compute the adjoint of a vector:

```julia
function LinearAlgebra.adjoint!(B::AbstractGPUMatrix, A::AbstractGPUVector)
    gpu_call(B, A) do ctx, B, A
        # @linearidx yields this invocation's linear index into A,
        # returning early when the index is out of bounds
        idx = @linearidx A
        @inbounds B[1, idx] = adjoint(A[idx])
        return
    end
    return B
end
```

This DSL was designed almost a decade ago, by [Simon
Danisch](https://github.com/SimonDanisch), and has served us well! Since then,
KernelAbstractions.jl has been developed by [Valentin
Churavy](https://github.com/vchuravy/), providing a more principled and powerful
DSL. With many application developers switching to KernelAbstractions.jl, it was
time to port GPUArrays.jl to this new DSL as well.

Thanks to the tireless work by [James Schloss](https://github.com/leios),
**GPUArrays.jl v11 now uses KernelAbstractions.jl for all vendor-neutral
kernels**. The aforementioned `adjoint!` kernel now looks like this:

```julia
function LinearAlgebra.adjoint!(B::AbstractGPUMatrix, A::AbstractGPUVector)
    @kernel function adjoint_kernel!(B, A)
        # @index(Global, Linear) yields this work-item's global linear index
        idx = @index(Global, Linear)
        @inbounds B[1, idx] = adjoint(A[idx])
    end
    adjoint_kernel!(get_backend(A))(B, A; ndrange=size(A))
    return B
end
```
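
As a quick sanity check, the new method can be exercised without GPU hardware
via JLArrays.jl, the CPU reference back-end that ships alongside GPUArrays.jl.
A minimal sketch; the array values are just for illustration:

```julia
using LinearAlgebra, JLArrays

A = JLArray(Float32[1, 2, 3])      # a "GPU" vector on the CPU reference back-end
B = JLArray(zeros(Float32, 1, 3))  # preallocated 1×3 destination
adjoint!(B, A)                     # B now holds [1 2 3]
```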

As shown above, the KernelAbstractions.jl DSL is very similar to the old DSL,
but it provides more flexibility and power (e.g., support for atomics through
Atomix.jl). In addition, many more users are familiar with
KernelAbstractions.jl, making it easier for them to contribute to GPUArrays.jl.
A good first step here would be to port some of the vendor-specific kernels from
CUDA.jl to GPUArrays.jl, making them available to all GPU back-ends. If you are
interested in contributing, please reach out!
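
To illustrate the kind of flexibility this brings, here is a minimal sketch of
a histogram kernel that uses Atomix.jl for atomic updates. The
`histogram_kernel!` and `histogram` names are hypothetical, not part of
GPUArrays.jl:

```julia
using KernelAbstractions
import Atomix

# Hypothetical example: bin values from [0, 1) into a histogram,
# using an atomic increment to handle colliding writes.
@kernel function histogram_kernel!(hist, data, nbins)
    idx = @index(Global, Linear)
    bin = clamp(1 + floor(Int, data[idx] * nbins), 1, nbins)
    Atomix.@atomic hist[bin] += Int32(1)
end

function histogram(data, nbins)
    backend = get_backend(data)
    hist = KernelAbstractions.zeros(backend, Int32, nbins)
    histogram_kernel!(backend)(hist, data, nbins; ndrange=length(data))
    return hist
end
```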

That said, the change is not without its challenges. The added flexibility
offered by KernelAbstractions.jl with respect to indexing currently results in
**certain kernels being slower than before**, specifically when there is not
much computational complexity to amortise the cost of indexing (e.g., when doing
very simple broadcasts). [We are working on improving
this](https://github.com/JuliaGPU/GPUArrays.jl/issues/565), but it will take
some time. To avoid holding back the rest of the JuliaGPU ecosystem, we are
releasing despite these performance issues. We recommend carefully
benchmarking your application after upgrading to v11, and reporting any
performance regressions you encounter.
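
For example, a minimal way to benchmark a simple broadcast, here using CUDA.jl
and BenchmarkTools.jl (adapt to your back-end of choice; the array size is
arbitrary):

```julia
using CUDA, BenchmarkTools

x = CUDA.rand(Float32, 2^20)
# CUDA.@sync waits for the asynchronous kernel to finish,
# so the timing covers the actual GPU work
@btime CUDA.@sync $x .+ 1f0;
```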


## Back-end package versions

As GPUArrays.jl is not a direct dependency of most applications, the update
will be pulled in by the following back-end package versions (some of which
may not be released yet):

- CUDA.jl v5.6
- Metal.jl v1.5
- oneAPI.jl v2.0
- AMDGPU.jl v1.1
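
To verify which GPUArrays.jl version your environment actually resolved after
upgrading a back-end, one option is to query the package manager:

```julia
using Pkg
Pkg.status("GPUArrays")  # prints the resolved GPUArrays.jl version
```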