Add dynamic support for Arm(R) Neoverse(TM) V2 processor #4444

Merged: martin-frbg merged 1 commit into OpenMathLib:develop on Jan 20, 2024

Mousius
Contributor

@Mousius Mousius commented Jan 19, 2024

Whilst I figure out how best to map the L2 parameters without duplicating all of ARMV8SVE, let's just map this to NEOVERSEV1.
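
For readers unfamiliar with dynamic core detection, the mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `midr_implementer`, `midr_part` and `select_core` are hypothetical, and the `ARMV8SVE` fallback is a simplification (real detection would also check for SVE support). It assumes the CPU is identified from the `MIDR_EL1` register, where Arm's implementer code is `0x41` and the part numbers are `0xd40` for Neoverse V1 and `0xd4f` for Neoverse V2.

```c
#include <stdint.h>

/* MIDR_EL1 layout: implementer in bits [31:24], part number in bits [15:4]. */
static unsigned midr_implementer(uint64_t midr) { return (unsigned)((midr >> 24) & 0xff); }
static unsigned midr_part(uint64_t midr)        { return (unsigned)((midr >> 4) & 0xfff); }

/* Hypothetical helper: returns the name of the kernel set to use for a CPU. */
static const char *select_core(uint64_t midr) {
    if (midr_implementer(midr) != 0x41) {
        return "ARMV8";          /* not an Arm Ltd. design: generic fallback */
    }
    switch (midr_part(midr)) {
    case 0xd40:                  /* Neoverse V1 */
        return "NEOVERSEV1";
    case 0xd4f:                  /* Neoverse V2: reuse the V1 kernels for now */
        return "NEOVERSEV1";
    default:
        /* Simplification: assume an unrecognised Arm core supports SVE. */
        return "ARMV8SVE";
    }
}
```

With this sketch, a Neoverse V2 `MIDR_EL1` value such as `0x410fd4f0` selects `NEOVERSEV1` instead of falling through to the generic `ARMV8SVE` kernels.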

@martin-frbg
Collaborator

my thought as well, thanks

@giordano
Contributor

$ OPENBLAS_NUM_THREADS=72 OPENBLAS_VERBOSE=2 ./julia -q
Core: armv8sve
julia> peakflops(20_000)
2.0024654326873347e12

julia> using LinearAlgebra

julia> BLAS.lbt_forward("/home/cceamgi/repo/OpenBLAS/libopenblas64_.so"; clear=true)
Core: neoversev1
4860

julia> peakflops(20_000)
2.2361743603730957e12

The first peakflops result is with OpenBLAS v0.3.26, which automatically chooses the armv8sve kernels. The second is after switching to a native build of this branch (make DYNAMIC_ARCH=1 LIBPREFIX=libopenblas64_ INTERFACE64=1 SYMBOLSUFFIX=64_ -j), which automatically chooses the neoversev1 kernels. The result is consistent with #4440 (comment), where I manually chose the kernels with OPENBLAS_CORETYPE=NEOVERSEV1.
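
To put the two peakflops figures side by side, the relative gain can be worked out as a simple ratio. The helper name `speedup` below is hypothetical and used only for illustration; the inputs are the two values printed in the session above.

```c
/* Ratio of the new throughput to the old one; values above 1.0 mean faster. */
static double speedup(double before_flops, double after_flops) {
    return after_flops / before_flops;
}
```

Calling `speedup(2.0024654326873347e12, 2.2361743603730957e12)` gives a ratio of roughly 1.117, so on this single run the neoversev1 kernels were about 11.7% faster than the armv8sve kernels.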

@martin-frbg martin-frbg marked this pull request as ready for review January 20, 2024 14:54
@martin-frbg martin-frbg added this to the 0.3.27 milestone Jan 20, 2024
@martin-frbg martin-frbg merged commit f5de4fa into OpenMathLib:develop Jan 20, 2024
64 checks passed