2 min reduction over both rows and columns. #881

Answered by S-o-T
S-o-T asked this question in Q&A
So I figured out how to modify epilogue_with_reduction.h (to use element indices) and gemm_with_fused_epilogue.h (to skip transferring the GEMM result to global memory) in order to achieve such a reduction. It only works in a single direction, though: reducing over columns still requires computing the transposed problem. Also, the examples that use these files (with fp16) require compute capability >= 7.5, while I am still interested in fp32 at compute capability 6.1. Any advice on which direction to look in?
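To pin down what is being asked for, here is a minimal CPU reference sketch of the target result: the minimum of each row and of each column of C = A * B, together with the index where each minimum occurs, computed in one pass without storing C. It is plain C++ and not CUTLASS code. The names `MinReduction` and `gemm_min_reduce` are hypothetical, and it assumes row-major fp32 inputs and that "element indices" means tracking the position of each minimum.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result type (not part of CUTLASS): minima of C = A * B
// along both directions, plus the index of each minimum.
struct MinReduction {
    std::vector<float> row_min;        // size M: min over the columns of each row
    std::vector<std::size_t> row_arg;  // column index of each row minimum
    std::vector<float> col_min;        // size N: min over the rows of each column
    std::vector<std::size_t> col_arg;  // row index of each column minimum
};

// A is M x K, B is K x N, both row-major (an assumption of this sketch).
// C is never materialized: each element is reduced as soon as it is computed,
// which mirrors skipping the write of the GEMM result to global memory.
MinReduction gemm_min_reduce(const std::vector<float>& A,
                             const std::vector<float>& B,
                             std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    const float inf = std::numeric_limits<float>::infinity();
    MinReduction r{std::vector<float>(M, inf), std::vector<std::size_t>(M, 0),
                   std::vector<float>(N, inf), std::vector<std::size_t>(N, 0)};
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            // Row-direction reduction: keep the first (lowest-index) minimum.
            if (acc < r.row_min[i]) { r.row_min[i] = acc; r.row_arg[i] = j; }
            // Column-direction reduction over the same element.
            if (acc < r.col_min[j]) { r.col_min[j] = acc; r.col_arg[j] = i; }
        }
    }
    return r;
}
```

A reference like this can be used to check a fused GPU kernel: run both on the same inputs and compare `row_min`/`col_min` (and the indices, if ties are broken the same way).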

Replies: 1 comment 4 replies

@S-o-T
@S-o-T
Answer selected by S-o-T
@hwu36
@S-o-T
Category
Q&A
3 participants