Hey Folks,

I'm a big fan of the new EVT support in CUTLASS and have made great use of it to enable some cool new applications. I'm now trying to enable epilogue block scaling and am not sure how to proceed, so I'd appreciate any tips or pointers.
Given a GEMM with inputs $A$ of shape $(M, K)$ and $B$ of shape $(K, N)$, I'd like to multiply by scales $S_a$ of shape $(M, k)$ in the epilogue, where $k < K$, i.e. one scale per row of $A$ for each of $k$ blocks along $K$. This would be particularly useful for FP8 workloads, where scaling improves accuracy considerably. Because outputs are computed in tiles anyway, it seems to me that we should be able to do this sort of tile scaling efficiently as long as each K-tile of the reduction falls inside a single scale block, i.e. as long as the number of tiles used in the reduction across $K$ is a multiple of $k$.
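To pin down the semantics being asked for, here is a minimal host-side C++ reference. It assumes row-major storage and that $K$ is divisible by $k$, so every scale block covers $K/k$ consecutive elements along $K$; `block_scaled_gemm_ref` is a hypothetical name, not a CUTLASS API. Because $S_a$ varies along $K$, each block's partial sum has to be scaled before the partials are added:

$$D_{mn} = \sum_{b=0}^{k-1} S_a[m,b] \sum_{j=bK/k}^{(b+1)K/k-1} A_{mj} B_{jn}$$

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference (not a CUTLASS API): D = blockwise-scaled A * B.
// A is M x K, B is K x N, Sa is M x k, all row-major. Assumes K % k == 0,
// so each scale block covers K / k consecutive elements along K.
std::vector<float> block_scaled_gemm_ref(const std::vector<float>& A,
                                         const std::vector<float>& B,
                                         const std::vector<float>& Sa,
                                         std::size_t M, std::size_t N,
                                         std::size_t K, std::size_t k) {
  assert(k > 0 && K % k == 0);
  const std::size_t block = K / k;  // elements of K covered by one scale
  std::vector<float> D(M * N, 0.0f);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (std::size_t b = 0; b < k; ++b) {
        // Partial sum over one scale block of K ...
        float partial = 0.0f;
        for (std::size_t j = b * block; j < (b + 1) * block; ++j) {
          partial += A[m * K + j] * B[j * N + n];
        }
        // ... scaled by that block's scale before it is accumulated.
        acc += Sa[m * k + b] * partial;
      }
      D[m * N + n] = acc;
    }
  }
  return D;
}
```

In a tiled kernel, the inner `partial` loop corresponds to the accumulation for one scale block, which is why the condition that each K-tile lies inside one scale block matters: the scale can then be applied to that tile's partial accumulator.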
EVT's broadcasting semantics are close to enabling something like this, and I see there is a `MatrixBroadcast` operation as well, though it appears to be unused. Is something like this possible to express in CUTLASS?
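For intuition on why broadcasting gets close: when $k = 1$, each row has a single scale that is constant along $K$, so it factors out of the reduction, $\sum_j S_a[m]\,A_{mj}B_{jn} = S_a[m]\sum_j A_{mj}B_{jn}$, and can be applied to the finished output as an $(M, 1)$ vector broadcast along $N$. For $k > 1$ the scale changes inside the sum, so it no longer factors out of the fully reduced value and would have to be applied to each K-block's partial accumulator instead. The minimal sketch below is plain host C++ with a hypothetical function name (not CUTLASS code) and shows the $k = 1$ case as a GEMM followed by a per-row epilogue scale.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CUTLASS API): plain GEMM whose fully reduced
// output is multiplied by a per-row scale s, i.e. an (M, 1) vector
// broadcast along N. A is M x K, B is K x N, both row-major.
std::vector<float> gemm_then_row_scale(const std::vector<float>& A,
                                       const std::vector<float>& B,
                                       const std::vector<float>& s,
                                       std::size_t M, std::size_t N,
                                       std::size_t K) {
  assert(s.size() == M);  // one scale per output row
  std::vector<float> D(M * N, 0.0f);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (std::size_t j = 0; j < K; ++j) {
        acc += A[m * K + j] * B[j * N + n];
      }
      D[m * N + n] = s[m] * acc;  // "epilogue": scale the reduced value
    }
  }
  return D;
}
```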