-
Hi, in deep learning models there is often a need to cast the fp16 output of a fully connected (dense) layer to fp32 when the next op requires higher precision. For example, BERT has many `dense` followed by cast-to-fp32 patterns.
My goal is to fuse that cast (together with the bias add) into a CUTLASS fp16 GEMM. I thought I could achieve that by letting the epilogue output element type be `float` while keeping the inputs and accumulation in fp16. But that approach doesn't compile for me because, in CUTLASS, the C (bias) tensor and the D (output) tensor must use the same element type. I believe this is a big problem for DL use cases, because the bias parameters are usually not of the desired output type.
For example, when I want fp16 accumulation, the bias is always in fp16; the int8 case, where I want to fuse a similar cast, has the same issue. I'm wondering if there is an easier way to achieve my goal than adding new code that duplicates everything in the existing code base... cc @Laurawly
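To make the mismatch concrete, below is a minimal sketch of how such a kernel might be instantiated with the CUTLASS 2.x device API (a sketch only, not verified to compile; the layouts and tile shapes are illustrative choices, not anything from the post). The single `ElementOutput` parameter of `LinearCombination` and the fifth template argument of `device::Gemm` cover both the C (bias) and D (result) tensors, so there is no way to ask for an fp16 bias with an fp32 output:

```cpp
#include "cutlass/gemm/device/gemm.h"
#include "cutlass/epilogue/thread/linear_combination.h"
#include "cutlass/layout/matrix.h"

using ElementA           = cutlass::half_t;
using ElementB           = cutlass::half_t;
using ElementAccumulator = cutlass::half_t;   // fp16 accumulation
using ElementOutput      = float;             // desired fp32 D tensor
using ElementCompute     = float;             // epilogue scaling in fp32

// ElementOutput here is used for the source fragment (C, i.e. the bias)
// as well as for the destination fragment (D).
using EpilogueOp = cutlass::epilogue::thread::LinearCombination<
    ElementOutput,
    128 / cutlass::sizeof_bits<ElementOutput>::value,
    ElementAccumulator,
    ElementCompute>;

// The fifth template argument is the element type of *both* C and D,
// so an fp16 bias tensor cannot be passed to this kernel.
using Gemm = cutlass::gemm::device::Gemm<
    ElementA, cutlass::layout::RowMajor,
    ElementB, cutlass::layout::ColumnMajor,
    ElementOutput, cutlass::layout::RowMajor,  // C and D share this type
    ElementAccumulator,
    cutlass::arch::OpClassTensorOp,
    cutlass::arch::Sm80,
    cutlass::gemm::GemmShape<128, 128, 32>,    // threadblock tile
    cutlass::gemm::GemmShape<64, 64, 32>,      // warp tile
    cutlass::gemm::GemmShape<16, 8, 16>,       // tensor core instruction
    EpilogueOp>;
```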
-
I am aware of this problem. I think the right solution is to separate the data types of `C` and `D`. Maybe we can call the new one `ElementBias` and set its default to be `ElementOutput`. If you create a PR, we can work together on it. cc @jwang323
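If it helps to picture the suggestion, here is a hypothetical sketch of what an epilogue with a separate bias type could look like. Only the `ElementBias` name comes from the comment above; the class name, the simplified math (no alpha/beta scaling), and everything else are illustrative and not existing CUTLASS code:

```cpp
// Hypothetical sketch only -- not existing CUTLASS code. In a real PR this
// would extend cutlass::epilogue::thread::LinearCombination rather than
// live as a separate class.
#include "cutlass/cutlass.h"
#include "cutlass/array.h"
#include "cutlass/numeric_conversion.h"

template <
  typename ElementOutput_,                        // element type of the D tensor
  int Count,                                      // elements per vectorized access
  typename ElementAccumulator_ = ElementOutput_,  // accumulator element type
  typename ElementCompute_     = ElementOutput_,  // type used for the epilogue math
  typename ElementBias_        = ElementOutput_   // NEW: element type of the C (bias) tensor
>
struct LinearCombinationSeparateBias {

  using FragmentOutput      = cutlass::Array<ElementOutput_, Count>;
  using FragmentBias        = cutlass::Array<ElementBias_, Count>;  // no longer tied to FragmentOutput
  using FragmentAccumulator = cutlass::Array<ElementAccumulator_, Count>;

  // Simplified D = accum + bias (alpha/beta scaling omitted for brevity).
  CUTLASS_HOST_DEVICE
  FragmentOutput operator()(FragmentAccumulator const &accum,
                            FragmentBias const &bias) const {

    cutlass::NumericArrayConverter<ElementCompute_, ElementAccumulator_, Count> accum_to_compute;
    cutlass::NumericArrayConverter<ElementCompute_, ElementBias_, Count>        bias_to_compute;
    cutlass::NumericArrayConverter<ElementOutput_, ElementCompute_, Count>      compute_to_output;

    cutlass::Array<ElementCompute_, Count> accum_c = accum_to_compute(accum);
    cutlass::Array<ElementCompute_, Count> bias_c  = bias_to_compute(bias);

    cutlass::Array<ElementCompute_, Count> result;
    CUTLASS_PRAGMA_UNROLL
    for (int i = 0; i < Count; ++i) {
      result[i] = accum_c[i] + bias_c[i];
    }
    return compute_to_output(result);
  }
};
```

With something like this, the original use case would instantiate e.g. `LinearCombinationSeparateBias<float, 4, cutlass::half_t, float, cutlass::half_t>` to produce an fp32 `D` from an fp16 accumulator and an fp16 bias. The bulk of a real PR would presumably be threading the new `ElementBias` through the epilogue iterators that load the C tensor.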
-
@hwu36 Is there a solution to this problem in CUTLASS v3?