Implement a CPU backend using POCL #556

Status: Draft. Proposes merging 12 commits into the vc/barriers base branch.

vchuravy (Member) commented Jan 15, 2025

TODO:

  • Assert that POCL supports system SVM (shared virtual memory)
  • Support gpu_report_exception

github-actions bot (Contributor) commented Jan 15, 2025

Benchmark Results

| Benchmark | main | 1961a4b... | main/1961a4bbb920db... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.739 ± 0.0074 μs | 0.749 ± 0.0078 μs | 0.986 |
| saxpy/default/Float16/1048576 | 0.173 ± 0.0066 ms | 0.178 ± 0.01 ms | 0.97 |
| saxpy/default/Float16/16384 | 3.34 ± 0.021 μs | 3.35 ± 0.034 μs | 0.998 |
| saxpy/default/Float16/2048 | 0.914 ± 0.0096 μs | 0.929 ± 0.012 μs | 0.984 |
| saxpy/default/Float16/256 | 0.593 ± 0.0055 μs | 0.601 ± 0.0063 μs | 0.986 |
| saxpy/default/Float16/262144 | 0.0441 ± 0.00064 ms | 0.0448 ± 0.0017 ms | 0.984 |
| saxpy/default/Float16/32768 | 6.01 ± 0.044 μs | 6.02 ± 0.059 μs | 0.997 |
| saxpy/default/Float16/4096 | 1.31 ± 0.027 μs | 1.31 ± 0.025 μs | 0.999 |
| saxpy/default/Float16/512 | 0.651 ± 0.0062 μs | 0.662 ± 0.0068 μs | 0.983 |
| saxpy/default/Float16/64 | 0.558 ± 0.0046 μs | 0.57 ± 0.0064 μs | 0.979 |
| saxpy/default/Float16/65536 | 11.6 ± 0.09 μs | 11.7 ± 0.2 μs | 0.992 |
| saxpy/default/Float32/1024 | 0.638 ± 0.011 μs | 0.637 ± 0.0096 μs | 1 |
| saxpy/default/Float32/1048576 | 0.232 ± 0.018 ms | 0.215 ± 0.033 ms | 1.08 |
| saxpy/default/Float32/16384 | 2.79 ± 0.15 μs | 2.82 ± 0.24 μs | 0.99 |
| saxpy/default/Float32/2048 | 0.761 ± 0.061 μs | 0.76 ± 0.058 μs | 1 |
| saxpy/default/Float32/256 | 0.569 ± 0.0058 μs | 0.572 ± 0.008 μs | 0.996 |
| saxpy/default/Float32/262144 | 0.0568 ± 0.0037 ms | 0.0457 ± 0.005 ms | 1.24 |
| saxpy/default/Float32/32768 | 5.32 ± 0.33 μs | 5.39 ± 0.65 μs | 0.987 |
| saxpy/default/Float32/4096 | 1.14 ± 0.089 μs | 1.14 ± 0.084 μs | 1 |
| saxpy/default/Float32/512 | 0.604 ± 0.0074 μs | 0.602 ± 0.0073 μs | 1 |
| saxpy/default/Float32/64 | 0.559 ± 0.005 μs | 0.562 ± 0.0058 μs | 0.995 |
| saxpy/default/Float32/65536 | 13 ± 1.3 μs | 12.2 ± 1.4 μs | 1.07 |
| saxpy/default/Float64/1024 | 0.759 ± 0.063 μs | 0.768 ± 0.044 μs | 0.989 |
| saxpy/default/Float64/1048576 | 0.484 ± 0.045 ms | 0.522 ± 0.051 ms | 0.927 |
| saxpy/default/Float64/16384 | 5.27 ± 0.28 μs | 5.31 ± 0.54 μs | 0.993 |
| saxpy/default/Float64/2048 | 1.14 ± 0.084 μs | 1.14 ± 0.086 μs | 0.997 |
| saxpy/default/Float64/256 | 0.588 ± 0.008 μs | 0.588 ± 0.0071 μs | 0.999 |
| saxpy/default/Float64/262144 | 0.0958 ± 0.015 ms | 0.0986 ± 0.015 ms | 0.972 |
| saxpy/default/Float64/32768 | 11.9 ± 0.96 μs | 12.3 ± 1.5 μs | 0.963 |
| saxpy/default/Float64/4096 | 1.7 ± 0.2 μs | 1.7 ± 0.12 μs | 1 |
| saxpy/default/Float64/512 | 0.642 ± 0.012 μs | 0.636 ± 0.01 μs | 1.01 |
| saxpy/default/Float64/64 | 0.561 ± 0.0068 μs | 0.565 ± 0.0058 μs | 0.992 |
| saxpy/default/Float64/65536 | 27.1 ± 4.8 μs | 24.7 ± 3.2 μs | 1.1 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.21 ± 0.028 μs | 2.22 ± 0.033 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.158 ± 0.0078 ms | 0.165 ± 0.012 ms | 0.957 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.41 ± 0.086 μs | 4.47 ± 0.13 μs | 0.987 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.37 ± 0.029 μs | 2.39 ± 0.031 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.83 ± 0.036 μs | 2.87 ± 0.048 μs | 0.988 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.042 ± 0.0012 ms | 0.0427 ± 0.0019 ms | 0.985 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.84 ± 0.18 μs | 6.93 ± 0.25 μs | 0.988 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.67 ± 0.037 μs | 2.71 ± 0.041 μs | 0.985 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.28 ± 0.04 μs | 3.33 ± 0.24 μs | 0.986 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.53 ± 0.22 μs | 2.56 ± 0.22 μs | 0.99 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.5 ± 0.29 μs | 12.6 ± 0.52 μs | 0.987 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 2.21 ± 0.033 μs | 2.19 ± 0.033 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.244 ± 0.021 ms | 0.255 ± 0.021 ms | 0.957 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.33 ± 0.23 μs | 4.35 ± 0.24 μs | 0.995 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.36 ± 0.052 μs | 2.36 ± 0.059 μs | 0.999 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.66 ± 0.04 μs | 2.65 ± 0.058 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0605 ± 0.004 ms | 0.0626 ± 0.0047 ms | 0.966 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.4 ± 0.41 μs | 7.48 ± 0.53 μs | 0.99 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.65 ± 0.078 μs | 2.66 ± 0.089 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.71 ± 0.087 μs | 2.7 ± 0.079 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.69 ± 5.3 μs | 2.68 ± 4.9 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 15.2 ± 0.94 μs | 16 ± 1.5 μs | 0.946 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.31 ± 0.056 μs | 2.34 ± 0.052 μs | 0.984 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.536 ± 0.052 ms | 0.577 ± 0.058 ms | 0.929 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.28 ± 0.4 μs | 7.45 ± 0.62 μs | 0.977 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.6 ± 0.082 μs | 2.63 ± 0.087 μs | 0.991 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.65 ± 0.076 μs | 2.69 ± 0.073 μs | 0.983 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.117 ± 0.011 ms | 0.124 ± 0.011 ms | 0.944 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 15.4 ± 1.3 μs | 15.9 ± 1.4 μs | 0.97 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.14 ± 0.19 μs | 3.21 ± 0.24 μs | 0.978 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.63 ± 0.054 μs | 2.67 ± 0.062 μs | 0.985 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.58 ± 0.069 μs | 2.65 ± 18 μs | 0.971 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 31.2 ± 1.9 μs | 0.033 ± 0.003 ms | 0.947 |
| time_to_load | 0.322 ± 0.003 s | 1.16 ± 0.0044 s | 0.277 |
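
The last column is labelled main/1961a4bbb920db..., which reads as the time on main divided by the time at the PR commit; on that reading, values below 1 mean the PR run was slower. A worked check on the time_to_load row, using the rounded values shown (the tool presumably divides unrounded times, which would explain the reported 0.277):

$$
\text{ratio} = \frac{t_{\text{main}}}{t_{\text{PR}}} = \frac{0.322\ \text{s}}{1.16\ \text{s}} \approx 0.278,
\qquad
\frac{t_{\text{PR}}}{t_{\text{main}}} = \frac{1.16\ \text{s}}{0.322\ \text{s}} \approx 3.6
$$

So package load time grew roughly 3.6-fold on this branch, while most saxpy ratios stay close to 1.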

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 0% with 855 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (8a87f77) to head (2121d5c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pocl/nanoOpenCL.jl | 0.00% | 497 Missing ⚠️ |
| src/pocl/device/array.jl | 0.00% | 101 Missing ⚠️ |
| src/pocl/backend.jl | 0.00% | 92 Missing ⚠️ |
| src/pocl/compiler/execution.jl | 0.00% | 43 Missing ⚠️ |
| src/pocl/compiler/compilation.jl | 0.00% | 32 Missing ⚠️ |
| src/pocl/device/quirks.jl | 0.00% | 31 Missing ⚠️ |
| src/pocl/compiler/reflection.jl | 0.00% | 23 Missing ⚠️ |
| src/pocl/pocl.jl | 0.00% | 20 Missing ⚠️ |
| src/macros.jl | 0.00% | 8 Missing ⚠️ |
| src/pocl/device/runtime.jl | 0.00% | 6 Missing ⚠️ |

... and 1 more
Additional details and impacted files
@@          Coverage Diff           @@
##            main    #556    +/-   ##
======================================
  Coverage   0.00%   0.00%            
======================================
  Files         12      21     +9     
  Lines        777    1575   +798     
======================================
- Misses       777    1575   +798     


vchuravy changed the base branch from main to vc/barriers on February 4, 2025, 15:18.