
add pass for linalg matmultransposeb op #377

Merged
merged 10 commits into buddy-compiler:main on Oct 22, 2024

Conversation

zhxzh-2001
Contributor

Add a Linalg MatMulTransposeB pass that performs vectorization to speed up LLaMA.
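For context on the op this pass targets: linalg.matmul_transpose_b computes C = A * B^T, i.e. both operands are read along their rows during the reduction. A minimal plain-C++ sketch of the semantics (the function and type names here are mine for illustration; this is not code from the PR):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Reference semantics of linalg.matmul_transpose_b:
// C[i][j] = sum_k A[i][k] * B[j][k], i.e. C = A * B^T.
Matrix matmulTransposeB(const Matrix &a, const Matrix &b) {
  std::size_t m = a.size(), k = a[0].size(), n = b.size();
  Matrix c(m, std::vector<float>(n, 0.0f));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      assert(b[j].size() == k && "B must be n x k");
      for (std::size_t t = 0; t < k; ++t)
        c[i][j] += a[i][t] * b[j][t]; // B indexed by row, not column
    }
  return c;
}
```

Because the reduction walks both A and B row-major, the inner product reads contiguous memory from both operands, which is what makes this op layout attractive to vectorize.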

Member

@zhanghb97 zhanghb97 left a comment

Giving quick feedback. Additionally, please add a test and an example for your vectorization pass.

#include <mlir/Pass/Pass.h>

#include "Utils/Utils.h"
#include <iostream>
Member

Is the iostream library necessary here?

//===----------------------------------------------------------------------===//

namespace {
class MatMul_TransposeB_VecPattern : public ConversionPattern{
Member

MatMul_TransposeB_VecPattern -> MatMulTransposeBVecPattern

SmallVector<Value, 8> lowerBounds(2, c0);
SmallVector<Value, 8> uperBounds{aRow, bRow /*bCol*/};
SmallVector<int64_t, 8> steps{1, 1};
// TODO
Member

Please provide more detailed information about the TODO.

affine::buildAffineLoopNest(
    rewriter, loc, lowerBounds, uperBounds, steps,
    [&](OpBuilder &builder, Location loc, ValueRange ivs) {
      // Value sum_0 = builder.create<mlir::arith::ConstantOp>(
Member

If we do not need this line, please remove it.

builder.create<affine::AffineStoreOp>(loc,sum.getResult(0),C,ValueRange{ivs[0],ivs[1]});
}
);
// clang-format on
Member

I did not find a matching clang-format off.

return success();
}
private:
int64_t vecsize;
Member

vecsize -> vecSize
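As background on the vecSize member discussed here: a vectorizing pass for this op typically strip-mines the k-reduction into vecSize-wide chunks (standing in for vector loads and FMAs) with a scalar loop for the k % vecSize tail. A rough sketch of that shape in plain C++ — the helper name and chunking details are illustrative assumptions, not code quoted from the pass:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// The two outer loops mirror the affine loop nest over rows of A and rows
// of B; the reduction is consumed vecSize elements at a time, with a
// scalar tail for leftovers. (Illustrative sketch only.)
Matrix matmulTransposeBStripMined(const Matrix &a, const Matrix &b,
                                  std::size_t vecSize = 8) {
  std::size_t m = a.size(), k = a[0].size(), n = b.size();
  Matrix c(m, std::vector<float>(n, 0.0f));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      assert(b[j].size() == k && "B must be n x k");
      float acc = 0.0f;
      std::size_t t = 0;
      for (; t + vecSize <= k; t += vecSize) // one "vector" step per chunk
        for (std::size_t v = 0; v < vecSize; ++v)
          acc += a[i][t + v] * b[j][t + v];
      for (; t < k; ++t) // scalar tail
        acc += a[i][t] * b[j][t];
      c[i][j] = acc;
    }
  return c;
}
```

In the real pass the chunk step would lower to vector.load / vector.fma-style ops inside the affine loop nest, with vecSize supplied as the pass option.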

:public PassWrapper<MatMul_TransposeB_VecPass,OperationPass<ModuleOp>>{
public:
MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(MatMul_TransposeB_VecPass)
StringRef getArgument() const final{ return "matul-vectorization2"; }
Member

matul-vectorization2 is not a good name for the pass.

PassRegistration<MatMul_TransposeB_VecPass>();
}
} // namespace buddy
} // namespace mlir
Member

Please add an empty line here.

Member

@zhanghb97 zhanghb97 left a comment

Put the Python/Graph part in a separate PR. This PR only adds matmul transpose vectorization.

@@ -35,3 +35,22 @@ linalg-batchmatmul-f32-run:
-reconcile-unrealized-casts | \
${MLIR_CPU_RUNNER} ${OPT_FLAG} -e main -entry-point-result=void \
-shared-libs=${MLIR_RUNNER_UTILS} -shared-libs=${MLIR_C_RUNNER_UTILS}

linalg-transposematmulb-f32-run:
Member

Split transpose, matmul, and b. (transposematmulb -> matmul-transpose-b)


linalg-transposematmulb-f32-run:
@${BUDDY_OPT} ./linalg-transposematmulb-f32.mlir \
-transpose_matmul_bvectorization \
Member

-transpose_matmul_bvectorization -> -matmul_transpose_b_vectorization

public:
MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(MatMulTransposeBVecPass)
StringRef getArgument() const final{ return "transpose_matmul_bvectorization"; }
StringRef getDescription() const final { return "MatMul Vectorization second version.MatMul receive tensortype oprands."; }
Member

"MatMul Vectorization second version.MatMul receive tensortype oprands." This is not a good description for this pass.

:public PassWrapper<MatMulTransposeBVecPass,OperationPass<ModuleOp>>{
public:
MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(MatMulTransposeBVecPass)
StringRef getArgument() const final{ return "transpose_matmul_bvectorization"; }
Member

transpose_matmul_bvectorization -> matmul_transpose_b_vectorization

@@ -0,0 +1,79 @@
// RUN: buddy-opt %s \
// RUN: -matmul_transpose_b_vectorization \
Collaborator

Recommend changing -matmul_transpose_b_vectorization to -matmul-transpose-b-vectorization.

@xlinsist
Collaborator

I encountered this error after running ninja && ninja check-buddy in the build directory. Could you reproduce it on your machine?

[screenshot of the build error]

call @printMemrefF32(%printed_m5) : (memref<*xf32>) -> ()

return
}
Collaborator

Add a new line at the end of the file.

@@ -80,6 +80,10 @@ void registerLowerSchePass();
void registerFuncBufferizeDynamicOffsetPass();
void registerConvertMemcpyToGPUPass();
void registerLegalizeShmemOutliningPass();
void registerMatMul_TransposeB_VecPass();
Collaborator

What is the difference between registerMatMul_TransposeB_VecPass and registerMatMulTransposeBVecPass? Does this code have any particular meaning?

Contributor Author

Sorry, I typed an extra function; there is no pass named "registerMatMul_TransposeB_VecPass".

Contributor Author

I'll delete it.

Value B = op->getOperand(1);
Value C = op->getOperand(2);

// Get shape of input and output
Collaborator

typo: add period

@@ -1968,6 +1986,7 @@ def gt_op(node: GtOp, symbol_table):

ops_registry = {
"MatmulOp": matmul_op,
"transpose_Matmul_fusedOp": transpose_matmul_fused_op,
Collaborator

Recommend naming it "TransposeMatmulFusedOp".

Contributor Author

I'll put some changes to the graph in the next commit, so I'll delete it for now.

Collaborator

Is there any way I can verify the changes here, or will this part of the changes be verified in another related PR?

@zhxzh-2001
Contributor Author

Yes, it can be verified in my next PR. This pass is for an op that will be used after the next modification of the graph, and it will help with LLaMA.

@xlinsist xlinsist merged commit 5d9d52b into buddy-compiler:main Oct 22, 2024
1 check passed