Optimize zgemm_tcopy_4_rvv to support vector lengths (vlen) of 128 and 256 #5030

Open: wants to merge 2 commits into base: develop

tingboliao

With a vector length (vlen) of 128 or 256, the original zgemm_tcopy_4_rvv implementation causes some cases in the cgemmt series to fail when openblas_utest_ext is run for functional testing.
The optimized version passes the functional tests at vector lengths of 128, 256, 512, and 1024.
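
For context, the RISC-V vector specification gives the maximum number of elements in a register group as VLMAX = LMUL * VLEN / SEW, so a copy routine sees a different capacity at each vlen. The sketch below is illustrative arithmetic only: it does not reproduce the kernel, and this description does not state the root cause of the failures. The helper names `vlmax` and `complex_per_group` are hypothetical, and only integer LMUL values are handled.

```python
def vlmax(vlen_bits: int, sew_bits: int, lmul: int = 1) -> int:
    """Maximum element count per register group: LMUL * VLEN / SEW."""
    return vlen_bits * lmul // sew_bits


def complex_per_group(vlen_bits: int, sew_bits: int, lmul: int = 1) -> int:
    """Complex values per register group (each one is a real/imaginary pair)."""
    return vlmax(vlen_bits, sew_bits, lmul) // 2


if __name__ == "__main__":
    # The four vector lengths named in this pull request.
    for vlen in (128, 256, 512, 1024):
        single = complex_per_group(vlen, 32)  # single-precision complex
        double = complex_per_group(vlen, 64)  # double-precision complex
        print(f"vlen={vlen}: {single} single / {double} double complex per register")
```

At vlen = 128 and LMUL = 1, one register holds only two single-precision complex values, so code that assumes room for a full group of four in one register would need a larger LMUL or more than one pass.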

In addition, for the relevant benchmark cases, the optimized version outperforms the original RVV version on two pieces of hardware: K230 [C908, vlen = 128] and K1 [C908, vlen = 256].
The performance data is shown below:

Parameter setting: OPENBLAS_LOOPS = 10000.

1. K230 [C908, vlen = 128]:

| Cases | Original RVV (MFlops) | Optimized RVV (MFlops) |
| --- | --- | --- |
| cher2k.goto | 4619.25 | 4753.04 |
| cherk.goto | 4117.78 | 4182.16 |
| csyr2k.goto | 4581.21 | 4701.76 |
| csyrk.goto | 4033.85 | 4126.95 |

2. K1 [C908, vlen = 256]:

| Cases | Original RVV (MFlops) | Optimized RVV (MFlops) |
| --- | --- | --- |
| cher2k.goto | 6697.40 | 7298.92 |
| cherk.goto | 5701.16 | 6224.16 |
| csyr2k.goto | 6558.31 | 7195.55 |
| csyrk.goto | 5599.63 | 6136.10 |

In the data above, higher values indicate better performance.
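
To make the size of the improvement easier to read, the relative gain can be computed from the figures reported above. This is a minimal sketch: the `gain_percent` helper and the `RESULTS` layout are hypothetical, and the numbers are copied from the two result sets in this description.

```python
# (original MFlops, optimized MFlops) per benchmark case, as reported above.
RESULTS = {
    "K230 [C908, vlen = 128]": {
        "cher2k.goto": (4619.25, 4753.04),
        "cherk.goto": (4117.78, 4182.16),
        "csyr2k.goto": (4581.21, 4701.76),
        "csyrk.goto": (4033.85, 4126.95),
    },
    "K1 [C908, vlen = 256]": {
        "cher2k.goto": (6697.40, 7298.92),
        "cherk.goto": (5701.16, 6224.16),
        "csyr2k.goto": (6558.31, 7195.55),
        "csyrk.goto": (5599.63, 6136.10),
    },
}


def gain_percent(original: float, optimized: float) -> float:
    """Relative throughput gain of the optimized kernel, in percent."""
    return (optimized - original) / original * 100.0


if __name__ == "__main__":
    for board, cases in RESULTS.items():
        print(board)
        for case, (original, optimized) in cases.items():
            print(f"  {case}: {gain_percent(original, optimized):+.2f}%")
```

On these figures the gain is roughly 1.6% to 2.9% on the K230 and roughly 9.0% to 9.7% on the K1.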
