-
Hi, in deep learning models there is often a need to cast the fp16 output of a fully connected (dense) layer to fp32 when the next op requires higher precision. For example, BERT has many `dense` followed by cast-to-fp32 patterns.
My goal is to fuse that cast (together with the bias add) into a CUTLASS fp16 GEMM. I thought I could achieve that by letting the epilogue output element type be `float` while keeping the inputs and accumulation in fp16. But that approach doesn't compile for me because, in CUTLASS, the C (bias) tensor and the D (output) tensor must use the same element type. I believe this is a big problem for DL use cases, because the bias parameters are usually not of the desired output type.
For example, when I want fp16 accumulation, the bias is always in fp16; the int8 case, where I want to fuse a similar cast, has the same issue. I'm wondering if there is an easier way to achieve my goal than adding new code that duplicates everything in the existing code base... cc @Laurawly
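To make the mismatch concrete, below is a minimal sketch of how such a kernel might be instantiated with the CUTLASS 2.x device API (a sketch only, not verified to compile; the layouts and tile shapes are illustrative choices, not anything from the post). The single `ElementOutput` parameter of `LinearCombination` and the fifth template argument of `device::Gemm` cover both the C (bias) and D (result) tensors, so there is no way to ask for an fp16 bias with an fp32 output:

```cpp
#include "cutlass/gemm/device/gemm.h"
#include "cutlass/epilogue/thread/linear_combination.h"
#include "cutlass/layout/matrix.h"

using ElementA           = cutlass::half_t;
using ElementB           = cutlass::half_t;
using ElementAccumulator = cutlass::half_t;   // fp16 accumulation
using ElementOutput      = float;             // desired fp32 D tensor
using ElementCompute     = float;             // epilogue scaling in fp32

// ElementOutput here is used for the source fragment (C, i.e. the bias)
// as well as for the destination fragment (D).
using EpilogueOp = cutlass::epilogue::thread::LinearCombination<
    ElementOutput,
    128 / cutlass::sizeof_bits<ElementOutput>::value,
    ElementAccumulator,
    ElementCompute>;

// The fifth template argument is the element type of *both* C and D,
// so an fp16 bias tensor cannot be passed to this kernel.
using Gemm = cutlass::gemm::device::Gemm<
    ElementA, cutlass::layout::RowMajor,
    ElementB, cutlass::layout::ColumnMajor,
    ElementOutput, cutlass::layout::RowMajor,  // C and D share this type
    ElementAccumulator,
    cutlass::arch::OpClassTensorOp,
    cutlass::arch::Sm80,
    cutlass::gemm::GemmShape<128, 128, 32>,    // threadblock tile
    cutlass::gemm::GemmShape<64, 64, 32>,      // warp tile
    cutlass::gemm::GemmShape<16, 8, 16>,       // tensor core instruction
    EpilogueOp>;
```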
-
I am aware of this problem. I think the right solution is to separate the data types of `C` and `D`. Maybe we can call the new one `ElementBias` and set its default to be `ElementOutput`. If you create a PR, we can work together on it. cc @jwang323
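If it helps to picture the suggestion, here is a hypothetical sketch of what an epilogue with a separate bias type could look like. Only the `ElementBias` name comes from the comment above; the class name, the simplified math (no alpha/beta scaling), and everything else are illustrative and not existing CUTLASS code:

```cpp
// Hypothetical sketch only -- not existing CUTLASS code. In a real PR this
// would extend cutlass::epilogue::thread::LinearCombination rather than
// live as a separate class.
#include "cutlass/cutlass.h"
#include "cutlass/array.h"
#include "cutlass/numeric_conversion.h"

template <
  typename ElementOutput_,                        // element type of the D tensor
  int Count,                                      // elements per vectorized access
  typename ElementAccumulator_ = ElementOutput_,  // accumulator element type
  typename ElementCompute_     = ElementOutput_,  // type used for the epilogue math
  typename ElementBias_        = ElementOutput_   // NEW: element type of the C (bias) tensor
>
struct LinearCombinationSeparateBias {

  using FragmentOutput      = cutlass::Array<ElementOutput_, Count>;
  using FragmentBias        = cutlass::Array<ElementBias_, Count>;  // no longer tied to FragmentOutput
  using FragmentAccumulator = cutlass::Array<ElementAccumulator_, Count>;

  // Simplified D = accum + bias (alpha/beta scaling omitted for brevity).
  CUTLASS_HOST_DEVICE
  FragmentOutput operator()(FragmentAccumulator const &accum,
                            FragmentBias const &bias) const {

    cutlass::NumericArrayConverter<ElementCompute_, ElementAccumulator_, Count> accum_to_compute;
    cutlass::NumericArrayConverter<ElementCompute_, ElementBias_, Count>        bias_to_compute;
    cutlass::NumericArrayConverter<ElementOutput_, ElementCompute_, Count>      compute_to_output;

    cutlass::Array<ElementCompute_, Count> accum_c = accum_to_compute(accum);
    cutlass::Array<ElementCompute_, Count> bias_c  = bias_to_compute(bias);

    cutlass::Array<ElementCompute_, Count> result;
    CUTLASS_PRAGMA_UNROLL
    for (int i = 0; i < Count; ++i) {
      result[i] = accum_c[i] + bias_c[i];
    }
    return compute_to_output(result);
  }
};
```

With something like this, the original use case would instantiate e.g. `LinearCombinationSeparateBias<float, 4, cutlass::half_t, float, cutlass::half_t>` to produce an fp32 `D` from an fp16 accumulator and an fp16 bias. The bulk of a real PR would presumably be threading the new `ElementBias` through the epilogue iterators that load the C tensor.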
-
@hwu36 Is there a solution to this problem in CUTLASS v3?