Hello, thanks for the GEMV code. I see that gemv_fp16 performs its FMAs in fp32 by casting fp16 to fp32. I think it might be better to use half2 intrinsics here. Have you tried that? Thanks~
I tried several kinds of half / half2 intrinsics on CUDA > 11.0 with the compute capability specified, and there was no significant difference, because the GEMV kernel is memory-bandwidth bound.
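For readers who want to compare the two approaches themselves, here is a minimal sketch, not the repository's actual kernels. The kernel names (`gemv_fp32_acc`, `gemv_half2_acc`), the one-warp-per-row layout, and the tiny test sizes are all assumptions made for illustration. Both variants load the same `half2` values from memory and differ only in how they accumulate, which fits the explanation that memory traffic, not arithmetic, limits the kernel.

```cuda
#include <cuda_fp16.h>
#include <cstdio>
#include <vector>

// Sum a float across the 32 lanes of a warp; lane 0 ends up with the total.
__device__ float warp_sum(float v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;
}

// Variant A (hypothetical name): widen each fp16 pair to fp32, accumulate in fp32.
__global__ void gemv_fp32_acc(const half2* mat, const half2* vec, half* out,
                              int pairs_per_row) {
    const half2* row = mat + (size_t)blockIdx.x * pairs_per_row;
    float sum = 0.0f;
    for (int i = threadIdx.x; i < pairs_per_row; i += 32) {
        float2 m = __half22float2(row[i]);
        float2 v = __half22float2(vec[i]);
        sum = fmaf(m.x, v.x, sum);
        sum = fmaf(m.y, v.y, sum);
    }
    sum = warp_sum(sum);
    if (threadIdx.x == 0) out[blockIdx.x] = __float2half(sum);
}

// Variant B (hypothetical name): accumulate with the half2 FMA intrinsic.
// The per-thread accumulator stays in fp16, so long rows can lose precision.
__global__ void gemv_half2_acc(const half2* mat, const half2* vec, half* out,
                               int pairs_per_row) {
    const half2* row = mat + (size_t)blockIdx.x * pairs_per_row;
    half2 acc = __float2half2_rn(0.0f);
    for (int i = threadIdx.x; i < pairs_per_row; i += 32)
        acc = __hfma2(row[i], vec[i], acc);
    // Fold the two fp16 lanes together in fp32, then reduce across the warp.
    float sum = warp_sum(__low2float(acc) + __high2float(acc));
    if (threadIdx.x == 0) out[blockIdx.x] = __float2half(sum);
}

int main() {
    const int rows = 4, cols = 64;  // illustrative sizes; cols must be even
    std::vector<half> h_mat(rows * cols, __float2half(0.5f));
    std::vector<half> h_vec(cols, __float2half(2.0f));
    std::vector<half> h_out(rows);

    half *d_mat, *d_vec, *d_out;
    cudaMalloc(&d_mat, h_mat.size() * sizeof(half));
    cudaMalloc(&d_vec, h_vec.size() * sizeof(half));
    cudaMalloc(&d_out, h_out.size() * sizeof(half));
    cudaMemcpy(d_mat, h_mat.data(), h_mat.size() * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vec, h_vec.data(), h_vec.size() * sizeof(half), cudaMemcpyHostToDevice);

    // cudaMalloc returns pointers aligned well beyond the 4 bytes half2 needs.
    const half2* mat2 = reinterpret_cast<const half2*>(d_mat);
    const half2* vec2 = reinterpret_cast<const half2*>(d_vec);

    for (int variant = 0; variant < 2; ++variant) {
        if (variant == 0) gemv_fp32_acc<<<rows, 32>>>(mat2, vec2, d_out, cols / 2);
        else              gemv_half2_acc<<<rows, 32>>>(mat2, vec2, d_out, cols / 2);
        if (cudaDeviceSynchronize() != cudaSuccess) { printf("kernel failed\n"); return 1; }
        cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(half), cudaMemcpyDeviceToHost);
        printf("%s: out[0] = %f\n", variant == 0 ? "fp32 acc" : "half2 acc",
               __half2float(h_out[0]));
    }
    cudaFree(d_mat); cudaFree(d_vec); cudaFree(d_out);
    return 0;
}
```

Half-precision arithmetic intrinsics such as `__hfma2` need compute capability 5.3 or higher, so build with an explicit architecture, for example `nvcc -arch=sm_70 gemv_sketch.cu` (the `sm_70` target and file name are placeholders). To compare speed, time both kernels on realistic matrix sizes with a profiler or CUDA events.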