
Brgemm register tiling for bf16 type #1005

Open · wants to merge 4 commits into base: main

Conversation

arun-thmn
Contributor

@arun-thmn arun-thmn commented Feb 3, 2025

This PR extends the brgemm register tiling pass to support the bf16 type. The changes:

  1. Template the existing pass so it executes on linalg.batch_reduce_matmul for fp32 and on linalg.generic for VNNI-packed bf16.
  2. Add test cases for the bf16 type.

@arun-thmn arun-thmn added the benchmark-all Benchmark all targets label Feb 3, 2025
@arun-thmn arun-thmn marked this pull request as ready for review February 3, 2025 03:38
@arun-thmn
Contributor Author

@rengolin Requesting your review of this PR for bf16 register tile support. I have rewritten the tiling pass with new logic (a template and more checks) to tile both fp32 and bf16 (VNNI). If you have time, please review it as a new pass (I wrote the existing fp32 tiling right after joining Intel, with a weaker grasp of the concepts).

@arun-thmn arun-thmn added benchmark-all Benchmark all targets and removed benchmark-all Benchmark all targets labels Feb 3, 2025
Comment on lines +1 to +2
//===- BrgemmLinalgTiling.cpp -----------------------------------------*-
//C++-*-===//
Contributor

nit: tweak to fit in single line
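For example, the header could be reflowed onto one line in the usual LLVM style (dash count adjusted to fit 80 columns):

```cpp
//===- BrgemmLinalgTiling.cpp ---------------------------------*- C++ -*-===//
```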

// Check whether the tile sizes are valid
if (options.registerTileShape.size() != 3 &&
options.registerTileShape.size() != 2)
return failure();
Contributor

nit: it's easier to debug in the future with rewriter.notifyMatchFailure, which provides some feedback, instead of a bare failure()
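A minimal sketch of the suggested change on the quoted check (the diagnostic message is illustrative):

```cpp
// Check whether the tile sizes are valid.
if (options.registerTileShape.size() != 3 &&
    options.registerTileShape.size() != 2)
  return rewriter.notifyMatchFailure(
      brgemmOp, "expected 2 or 3 register tile sizes");
```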

Comment on lines +81 to +83
// Set the K tile to 1 if the user did not provide one (fp32 target)
if (options.registerTileShape.size() == 2)
mxnxkTile[2] = 1;
Contributor

I'd just force the user to always provide m,n,k tiles. It'll simplify the verification logic and make usage more explicit.

if (options.registerTileShape.size() == 2)
mxnxkTile[2] = 1;

// k-tile size adjusted based on the vnni layout for bf16 type
Contributor

This has baked-in assumptions that are not verified.
As the pass now operates on generic, we need to strictly filter ops that are accepted. I think you need to at least ensure it is a VNNI contraction first - there should be some suitable helpers in VnniUtils.
If f32 generic should be supported as well, it might need some extra checks there too.

Comment on lines +99 to +100
for (auto itrShapeMNK = mxnxkTile.begin(); itrShapeMNK != mxnxkTile.end();
itrShapeMNK++, i++) {
Contributor

nit: you could use llvm::enumerate for this, like

for (auto [idx, itrShape] : llvm::enumerate(mxnxkTile)) {

}
}

// DS to assist while creating new subviews with correct indices and shapes
SmallVector<int64_t> mxkTile(2);
Contributor

nit: you could directly brace initialize it as mxkTile = {val1, val2};

}
}

auto subview = rewriter.create<memref::SubViewOp>(
    brgemmOp.getLoc(), MemRefType(), input, offsets, shape, strides);
Contributor

nit: you can skip the result MemRefType and use this builder:
static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, Value source, ArrayRef<OpFoldResult> offsets, ArrayRef<OpFoldResult> sizes, ArrayRef<OpFoldResult> strides, ArrayRef<NamedAttribute> attrs = {});
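A sketch of the call using that builder (assuming offsets, shape, and strides are OpFoldResult ranges, so the result type can be inferred from the source):

```cpp
auto subview = rewriter.create<memref::SubViewOp>(
    brgemmOp.getLoc(), input, offsets, shape, strides);
```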

@@ -0,0 +1,30 @@
// RUN: tpp-run -e register_tile_bf16 --entry-point-result=void -print %s > %t.1
// RUN: tpp-opt %s --tile-brgemm-linalg="registerBlocking=32,32,32" -convert-linalg-to-xsmm | tpp-run -e register_tile_bf16 --entry-point-result=void -print > %t.2
// RUN: diff %t.1 %t.2
Contributor

It might be more robust to use fpcmp
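For instance, the diff line could be replaced with fpcmp from the LLVM test-suite, which compares numeric output within a tolerance (the tolerance value here is illustrative):

```
// RUN: fpcmp -r 0.001 %t.1 %t.2
```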

// RUN: tpp-run -e register_tile_bf16 --entry-point-result=void -print %s > %t.1
// RUN: tpp-opt %s --tile-brgemm-linalg="registerBlocking=32,32,32" -convert-linalg-to-xsmm | tpp-run -e register_tile_bf16 --entry-point-result=void -print > %t.2
// RUN: diff %t.1 %t.2
// RUN: rm %t.1 %t.2
Contributor

nit: lit already creates these as temporary files; there should be no need to explicitly delete them

// CONF1-LABEL: memref.global "private" constant @__constant_48x32x32xf32 : memref<48x32x32xf32> = dense<1.000000e+00> {alignment = 64 : i64}
// CONF1-LABEL: func.func @chainned_gemm_do_register_tiling(
// CONF1-SAME: %[[VAL_0:.*]]: memref<8x48x32x32xf32>) -> memref<8x48x32x32xf32> {
// CONF1: %[[VAL_1:.*]] = arith.constant 1 : index
Contributor

Could you use more descriptive names for the captured values?

Also, these checks feel too explicit; maybe you could omit some details

// Creates M, N, and K tile loops
scf::ForOp loopOp = rewriter.create<scf::ForOp>(
    brgemmOp.getLoc(), zeroCst, ubCstTiledLoop, stepCstTiledLoop);
Contributor

If I am understanding right, this transform is meant to operate on linalg ops. As I expect all the ops you want to support will implement TilingInterface, would it be possible to just use the TileUsingFor transform instead of manually implementing tiling?
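A rough sketch of what that could look like with the upstream tiling helper (assuming the op implements TilingInterface; names follow recent MLIR's scf::tileUsingSCF, and the exact API varies by version):

```cpp
scf::SCFTilingOptions tilingOptions;
tilingOptions.setTileSizes(
    getAsIndexOpFoldResult(rewriter.getContext(), mxnxkTile));
FailureOr<scf::SCFTilingResult> tiled = scf::tileUsingSCF(
    rewriter, cast<TilingInterface>(brgemmOp.getOperation()), tilingOptions);
if (failed(tiled))
  return failure();
rewriter.replaceOp(brgemmOp, tiled->replacements);
```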

%0 = memref.get_global @__constant_32x16x32x2xbf16 : memref<32x16x32x2xbf16>
%alloc = memref.alloc() {alignment = 64 : i64} : memref<8x32x32x32xbf16>
%expand_shape = memref.expand_shape %arg0 [[0], [1], [2], [3, 4]] output_shape [8, 32, 32, 16, 2] : memref<8x32x32x32xbf16> into memref<8x32x32x16x2xbf16>
scf.forall (%arg1, %arg2) in (8, 32) {
Contributor

If these scf.forall are not needed by the (matcher of the) transform, can we please get rid of them? Same goes for all the unit tests in this file and any other surrounding IR that does not influence the code under test.

Labels
benchmark-all Benchmark all targets
3 participants