HFMA2 and LDS in cutlass efficient fp16 tensorcore kernel? #716

Answered by hwu36
LeiWang1999 asked this question in Q&A
The epilogue needs to use LDS/STS and HFMA2 to do the alpha and beta scaling. If you want to use Nsight to check for bank conflicts, change your problem size so that it launches just one threadblock.
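To make the answer concrete, here is a minimal CUDA sketch of the kind of work an epilogue does. It is not the CUTLASS epilogue: the kernel name `epilogue_sketch`, the helper `run_epilogue_demo`, the 128-thread block, and the alpha/beta values are all made up for illustration. Each thread writes a packed fp16 pair to shared memory (a shared-memory store), reads back a different thread's pair (a shared-memory load), and computes D = alpha * acc + beta * C with `__hfma2`.

```cuda
#include <cuda_fp16.h>
#include <cmath>
#include <cstdio>
#include <vector>

// Illustrative only, NOT CUTLASS code. kThreads is a hypothetical block size.
constexpr int kThreads = 128;

__global__ void epilogue_sketch(const __half2* acc, const __half2* c, __half2* d,
                                __half alpha, __half beta) {
  __shared__ __half2 staging[kThreads];
  const int t = threadIdx.x;
  const int base = blockIdx.x * kThreads;

  staging[t] = acc[base + t];          // shared-memory store (STS)
  __syncthreads();

  // Read another thread's element to mimic the data rearrangement an
  // epilogue performs between the accumulator layout and the output layout.
  const int src = kThreads - 1 - t;
  const __half2 a = staging[src];      // shared-memory load (LDS)

  const __half2 alpha2 = __half2half2(alpha);
  const __half2 beta2 = __half2half2(beta);
  const __half2 scaled_c = __hmul2(beta2, c[base + src]);
  d[base + src] = __hfma2(alpha2, a, scaled_c);  // packed fp16 FMA (HFMA2)
}

// Runs one threadblock and returns the max abs error against a float
// reference, or -1.0f if a CUDA call fails.
float run_epilogue_demo() {
  const int n = 2 * kThreads;          // n halves == kThreads half2 values
  const float alpha = 2.0f, beta = 0.5f;  // hypothetical scaling factors
  std::vector<__half> h_acc(n), h_c(n), h_d(n);
  for (int i = 0; i < n; ++i) {
    h_acc[i] = __float2half(0.25f * (i % 8));
    h_c[i] = __float2half(1.0f);
  }

  __half *d_acc = nullptr, *d_c = nullptr, *d_d = nullptr;
  const size_t bytes = n * sizeof(__half);
  if (cudaMalloc(&d_acc, bytes) != cudaSuccess) return -1.0f;
  if (cudaMalloc(&d_c, bytes) != cudaSuccess) return -1.0f;
  if (cudaMalloc(&d_d, bytes) != cudaSuccess) return -1.0f;
  cudaMemcpy(d_acc, h_acc.data(), bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(d_c, h_c.data(), bytes, cudaMemcpyHostToDevice);

  // A single threadblock, as suggested above for bank-conflict profiling.
  epilogue_sketch<<<1, kThreads>>>(reinterpret_cast<const __half2*>(d_acc),
                                   reinterpret_cast<const __half2*>(d_c),
                                   reinterpret_cast<__half2*>(d_d),
                                   __float2half(alpha), __float2half(beta));
  const cudaError_t err = cudaMemcpy(h_d.data(), d_d, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_acc);
  cudaFree(d_c);
  cudaFree(d_d);
  if (err != cudaSuccess) return -1.0f;

  float max_err = 0.0f;
  for (int i = 0; i < n; ++i) {
    const float ref = alpha * __half2float(h_acc[i]) + beta * __half2float(h_c[i]);
    max_err = fmaxf(max_err, fabsf(__half2float(h_d[i]) - ref));
  }
  return max_err;
}

int main() {
  printf("max abs error: %f\n", run_epilogue_demo());
  return 0;
}
```

Building this for an architecture with fp16 arithmetic (for example `nvcc -arch=sm_70`) and dumping the SASS with `cuobjdump -sass` should show shared-memory load/store and packed half-precision instructions, though the exact instructions depend on the compiler version. The same single-threadblock idea applies to a real GEMM: picking a problem size no larger than one threadblock tile keeps the launch to one block, so the profiler's shared-memory counters are not mixed across blocks.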

Answer selected by LeiWang1999