Merge risc-v branch to develop #4472
`make NOFORTRAN=1 CC=gcc`
…cv_x280 into HellerZheng-develop
…non-isa/rvv-intrinsic-doc (in particular, `__riscv_` prefixes for RVV intrinsics)
* fix multiple numerical stability and corner case issues
* add a script to generate arbitrary gemm kernel shapes
* add a generic zvl256b target to demonstrate large gemm kernel unrolls
…vv-intrinsics update riscv intrinsics for latest spec
add riscv level3 C,Z kernel functions.
Add prefix (`__riscv_`) for all RISC-V intrinsics. Update the parameters of some intrinsics, such as vfredxxxx and vmerge.
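For readers less familiar with the reduction intrinsics touched here: as we read the RVV 1.0 spec, `vfredusum.vs vd, vs2, vs1` writes `vs1[0]` plus the sum of the first `vl` elements of `vs2` into `vd[0]`, so the scalar operand acts as the initial accumulator. The scalar C model below is a hypothetical illustration of those semantics only (`model_vfredusum` is our name, not an OpenBLAS or intrinsics API). Real hardware may add the elements in any order, whereas the model sums sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the RVV 1.0 unordered float sum reduction
 * (vfredusum.vs vd, vs2, vs1). Element 0 of vs1 seeds the sum and only the
 * first vl elements of vs2 are added; the result is what lands in vd[0].
 * The spec says nothing is written when vl == 0, so we exclude that case. */
static float model_vfredusum(const float *vs2, const float *vs1, size_t vl)
{
    assert(vl > 0);
    float acc = vs1[0];               /* seed from element 0 of vs1 */
    for (size_t i = 0; i < vl; i++)   /* elements at index >= vl are ignored */
        acc += vs2[i];
    return acc;                       /* value written to vd[0] */
}
```

Passing a smaller `vl` simply drops the trailing elements from the sum, which is why kernels must be careful about which `vl` the final reduction uses.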
RISC-V for new intrinsic API changes
fix wrong vr = VFMVVF_FLOAT(0, vl);
fix wrong vr = VFMVVF_FLOAT(0, vl); in symv_L_rvv.c and symv_U_rvv.c
Add RVV support for zsymv and activate RVV support for zhemv
Change masked intrinsics from `_m` to `_mu` and reintroduce the maskedoff argument.
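A rough illustration of the mask-policy change, assuming the v0.12 intrinsics naming in which `_m` is mask-agnostic and `_mu` is mask-undisturbed with a `maskedoff` operand. The plain scalar C model below (hypothetical names, no RVV intrinsics) shows that only the undisturbed variant guarantees inactive elements carry the `maskedoff` values; the agnostic case writes a sentinel as a stand-in for the unspecified value hardware is allowed to leave there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VLMAX 8 /* hypothetical lane count for this model */

/* Policy applied to elements whose mask bit is 0. */
typedef enum { MASK_AGNOSTIC, MASK_UNDISTURBED } mask_policy;

/* Stand-in for the unspecified value a mask-agnostic op may leave behind. */
#define AGNOSTIC_SENTINEL (-1.0f)

/* Model of a masked add vd = vs2 + vs1 over the first vl elements.
 * Tail elements (index >= vl) are not modelled and are left untouched. */
static void model_masked_add(float *vd, const float *maskedoff,
                             const float *vs2, const float *vs1,
                             const bool *mask, size_t vl, mask_policy mp)
{
    assert(vl <= VLMAX);
    for (size_t i = 0; i < vl; i++) {
        if (mask[i])
            vd[i] = vs2[i] + vs1[i];   /* active element */
        else if (mp == MASK_UNDISTURBED)
            vd[i] = maskedoff[i];      /* _mu: keep the maskedoff value */
        else
            vd[i] = AGNOSTIC_SENTINEL; /* _m: value not guaranteed */
    }
}
```

Kernels that rely on inactive lanes keeping a previous accumulator value need the `_mu` form, with the accumulator itself passed as `maskedoff`.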
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and the tail policy is agnostic. This commit changes the intrinsics' tail policy to undisturbed.
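The accumulator problem described above can be reproduced with a scalar model, under the assumption that a kernel keeps lane-wise partial sums in one vector register and reduces all VLMAX lanes once after a strip-mined loop. In the sketch below the names are hypothetical, VLMAX is fixed at 4, `vl` is chosen by a simplified `vsetvl` (min of remaining length and VLMAX), and the agnostic tail is modelled as zeroing lanes past `vl`. Real hardware would instead leave either the old value or all-ones bits; zeroing is used here only to make the lost partial sums visible.

```c
#include <assert.h>
#include <stddef.h>

#define VLMAX 4 /* hypothetical lane count for this model */

typedef enum { TAIL_AGNOSTIC, TAIL_UNDISTURBED } tail_policy;

/* acc[i] += a[i] * b[i] for i < vl; lanes >= vl follow the tail policy. */
static void model_vfmacc(float acc[VLMAX], const float *a, const float *b,
                         size_t vl, tail_policy tp)
{
    assert(vl <= VLMAX);
    for (size_t i = 0; i < vl; i++)
        acc[i] += a[i] * b[i];
    if (tp == TAIL_AGNOSTIC)
        for (size_t i = vl; i < VLMAX; i++)
            acc[i] = 0.0f; /* stand-in for a clobbered partial sum */
}

/* Strip-mined dot product: accumulate lane-wise, reduce all lanes at the end. */
static float model_dot(const float *x, const float *y, size_t n, tail_policy tp)
{
    float acc[VLMAX] = {0};
    for (size_t i = 0; i < n; ) {
        size_t vl = (n - i < VLMAX) ? n - i : VLMAX; /* simplified vsetvl */
        model_vfmacc(acc, x + i, y + i, vl, tp);
        i += vl;
    }
    float sum = 0.0f;
    for (size_t k = 0; k < VLMAX; k++) /* reduction over all VLMAX lanes */
        sum += acc[k];
    return sum;
}
```

With n = 6 the second iteration runs with vl = 2, so lanes 2 and 3 still hold partial sums from the first iteration; only the tail-undisturbed run keeps them and returns 6, while the agnostic run loses them. When n is a multiple of VLMAX there is no short final iteration and both policies agree.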
The current RVV x280 target depends on VLEN=512 bits for Level 3 operations. This commit adds a generic target that supports VLEN=128 bits. The new target uses the same scalable kernels as x280 for Level 1 and 2 operations, and autogenerated kernels for Level 3 operations. Functional correctness of Level 3 operations was tested with VLEN=128 bits using QEMU v8.1.1 for ctests and BLAS-Tester.
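One way to see why Level 3 kernels end up tied to a particular VLEN is the lane count. Per the RVV 1.0 spec, VLMAX = LMUL * VLEN / SEW, so a single register of doubles (SEW=64, LMUL=1) holds 8 elements at VLEN=512 but only 2 at VLEN=128. The helper below is a hypothetical illustration of that formula, with LMUL passed as a fraction so values like 1/2 work; it is not part of OpenBLAS.

```c
#include <assert.h>

/* VLMAX = LMUL * VLEN / SEW, with LMUL = lmul_num / lmul_den. */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits,
                      unsigned lmul_num, unsigned lmul_den)
{
    assert(sew_bits > 0 && lmul_den > 0);
    return (vlen_bits * lmul_num) / (sew_bits * lmul_den);
}
```

For example, `vlmax(512, 64, 1, 1)` is 8 while `vlmax(128, 64, 1, 1)` is 2, so a kernel unrolled around 8-wide double vectors needs a different shape on a 128-bit implementation.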
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
…on accumulation, zscal NaN handling
hey, I know people are getting impatient, but my plan was to merge #4457 first (which was opened against the branch) and also make sure that its new tests do not blow up all the other platforms... would you prefer to defer that?
Makes perfect sense; completely happy to wait with this PR until everything is in place and update it as necessary. I've marked the PR as draft and will unmark it once everyone is happy with what is happening (or close it if we decide to go for a different strategy). I've tested this locally on x86, RISCV64_GENERIC, RISCV64_ZVL128B and RISCV64_ZVL256B. A big part of my reason for opening the PR now is that, given the size of the changeset and the fact that it includes many changes outside riscv64-specific modules, I am super keen to kick off discussion early and also take advantage of OpenBLAS CI to exercise all the other configurations and get visibility on any potential problems.
(branch updated from a3b0ef6 to 452741b)
Must admit that the size and range of this changeset is worrying me a bit as well; it used to look a lot more harmless when it was just a trivial update of kernel/riscv64... maybe you're right and rebasing the new utests after the merge makes more sense...
An alternative strategy might be to rebase risc-v on current develop, deal with the fallout on the branch, then kick a new PR based on that. It might lead to a smaller changeset, and/or make it feasible to split the merge into several smaller PRs rather than one giant one; but it's no magic bullet, and it's not immediately clear that things would be better that way instead of worse.
Yes, let's go with this one unless anyone complains in the next few hours. The utest extensions are going to open their own can of worms that is totally unrelated to risc-v, and merging develop into risc-v would appear to create at least as many merge conflicts and chances for fallout as doing it here. Add to that the fact that we do not have any CI job that runs on actual RISC-V hardware yet, so the chances of detecting fallout might be smaller.
Looking good so far, only two timeouts due to slow CI hardware
LGTM
I’m happy if you’re happy!
Hmm... now that I have merged this, the hang in the C910V qemu job appears to be persistent :( @sergei-lewis can you check if you see it on actual hardware too?
Apologies, was travelling. Will take a look ASAP.
I've switched AXPY back to generic kernels to get around the hang, but I suspect this may be just another thing that only happens in qemu. |
PR 4497 fixes the hang and re-enables the vectorised AXPY kernels on C910V. Any reason we shouldn't add CI testing for the new ZVL128B / ZVL256B targets, btw? I'll put together a PR if not.
thanks. no reason not to include them in CI; it isn't that long since the branch got merged, is it? :)