half2 Intrinsic #2

Zhiwei35 · 2023-11-13T14:48:59Z

Hello, thanks for the GEMV code. I see that gemv_fp16 casts FP16 to FP32 and does the FMA in FP32. I think it might be better to use half2 intrinsics here. Have you tried that? Thanks~
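To make the two options concrete, here is a minimal sketch, not the repository's actual gemv_fp16. The kernel names, the helper `run_demo`, and the one-warp-per-row layout are all assumptions made for illustration. It assumes `n` is even, a launch with 32 threads per block, and (for the half2 variant) compute capability 5.3 or newer.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernels for illustration only. One warp (32 threads) computes
// one output row: y[row] = dot(A[row, :], x). Assumes n is even.

// Variant A: load half2 pairs, convert to float2, accumulate in FP32.
__global__ void gemv_row_fp32_acc(const half* A, const half* x, half* y, int n) {
    int row = blockIdx.x;
    const half2* a2 = reinterpret_cast<const half2*>(A + (size_t)row * n);
    const half2* x2 = reinterpret_cast<const half2*>(x);
    float acc = 0.0f;
    for (int i = threadIdx.x; i < n / 2; i += 32) {
        float2 a = __half22float2(a2[i]);
        float2 b = __half22float2(x2[i]);
        acc = fmaf(a.x, b.x, acc);
        acc = fmaf(a.y, b.y, acc);
    }
    // Warp-level tree reduction; lane 0 ends up with the full sum.
    for (int offset = 16; offset > 0; offset >>= 1)
        acc += __shfl_down_sync(0xffffffffu, acc, offset);
    if (threadIdx.x == 0) y[row] = __float2half(acc);
}

// Variant B: accumulate with the half2 FMA intrinsic, then finish in FP32.
// FP16 accumulation can lose precision for long rows.
__global__ void gemv_row_half2_acc(const half* A, const half* x, half* y, int n) {
    int row = blockIdx.x;
    const half2* a2 = reinterpret_cast<const half2*>(A + (size_t)row * n);
    const half2* x2 = reinterpret_cast<const half2*>(x);
    half2 acc = __float2half2_rn(0.0f);
    for (int i = threadIdx.x; i < n / 2; i += 32)
        acc = __hfma2(a2[i], x2[i], acc);  // two FP16 FMAs per instruction
    float sum = __low2float(acc) + __high2float(acc);
    for (int offset = 16; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffffu, sum, offset);
    if (threadIdx.x == 0) y[row] = __float2half(sum);
}

// Hypothetical host helper: fills A with 0.5 and x with 2.0, runs one
// variant, and returns y[0] as a float.
float run_demo(bool use_half2, int rows = 4, int n = 256) {
    half *A, *x, *y;
    cudaMallocManaged(&A, sizeof(half) * rows * n);
    cudaMallocManaged(&x, sizeof(half) * n);
    cudaMallocManaged(&y, sizeof(half) * rows);
    for (int i = 0; i < rows * n; ++i) A[i] = __float2half(0.5f);
    for (int i = 0; i < n; ++i) x[i] = __float2half(2.0f);
    if (use_half2) gemv_row_half2_acc<<<rows, 32>>>(A, x, y, n);
    else           gemv_row_fp32_acc<<<rows, 32>>>(A, x, y, n);
    cudaDeviceSynchronize();
    float out = __half2float(y[0]);
    cudaFree(A); cudaFree(x); cudaFree(y);
    return out;
}
```

Both variants issue the same global loads; only the arithmetic differs, so timing `run_demo(false)` against `run_demo(true)` on larger shapes is one way to check whether the FMA path matters on a given GPU.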

TylunasLi · 2023-12-05T15:56:36Z

I tried several kinds of half / half2 intrinsics on CUDA > 11.0 with a specified compute capability, and there was no significant difference, because the GEMV kernel is memory-bandwidth bound.
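A rough back-of-the-envelope estimate of why the kernel is bandwidth bound, assuming an $m \times n$ FP16 matrix (2 bytes per element) that is read once from global memory, with the input and output vectors also in FP16:

$$
\text{FLOPs} = 2mn, \qquad
\text{bytes} \approx 2mn + 2n + 2m, \qquad
\text{intensity} = \frac{2mn}{2mn + 2n + 2m} \approx 1\ \text{FLOP/byte}.
$$

At about one FLOP per byte, the runtime is set by how fast the matrix can be streamed from memory, so switching the FMA from FP32 to half2 would not be expected to change much.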
