-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LinalgFunctionOutlining] Create a pass to outline linalg compute ops #862
base: main
Are you sure you want to change the base?
Conversation
} else if (isElementwise(computeOp)) { | ||
computeName = "_elementwise"; | ||
} else { | ||
return failure(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For matmul transpose (etc): It might be nice if this pass, rather than not failing, failed with a message about needing to be implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed failure()
altogether and am now simply skipping those compute ops from the walk
before invoking this function to outline. Let me know if that sounds okay to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok. I think it'd still be good to improve the possible situation in N months time when someone is trying to understand why matmul transpose is running out of PM, whereas matmul is not -- if rather than silently not outlining matmul transpose, you made the pass fail for unconsidered ops, the person might be able to avoid the situation.
I guess it's a trade off between failing too often when you shouldn't and letting issues silently slip through.
|
||
/// Utility to outline the linalg compute op. | ||
static FailureOr<func::FuncOp> outlinedToAFunction( | ||
IRRewriter &rewriter, ModuleOp &moduleOp, linalg::LinalgOp &computeOp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's unusual to pass ModuleOp/LinalgOp by reference, they're just glamorous pointers. Is there a reason not to pass them by value? Maybe IRRewriter too (although I'm not sure about this one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the references from these ops, except rewriter
itself because that uses context
and I've seen it being referenced in the codebase. If there's a stronger opinion from anyone to have it removed, please let me know and I shall do that.
std::string outlinedFuncName = | ||
computeOp->getName().stripDialect().str() + computeName + "_outlined"; | ||
if (auto outlinedFuncOp = dyn_cast_if_present<func::FuncOp>( | ||
moduleOp.lookupSymbol(outlinedFuncName))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're assuming that if there are 2+ matmuls in the function, then they are identical. Likewise for elementwise operations -- the assumption here is that the unary operation is the same and the input types are the same. While this is true with the current pipeline, it isn't true in general. I thought one reason for using the function outlining approach as opposed to something like #843 was that it would be robust? So I'm wondering if this can be generalized.
Options 1: what happens in MLIR if 2 functions have the exact same signature and body, is there an MLIR pass to detect this and "CSE" them? If so, then we can call the functions '_outlined_0', '_outlined_1' and then let MLIR merge them where possible.
Option 2 (a bit hacky) give the function names more detail -- include M, N, K in function names for matmuls, include unary operation type in function name.
Note: this is not a request for change, I'm ok with this being left for future work :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @newling - thank you so much for the nudge in this case!
I initially thought that adding just a clear demarcation using isMatmul
and isElementwise
would be general enough for a particular dispatch, but missed this point in the generalisation.
I've therefore addressed this comment by indeed going ahead with Option 1
as you suggested - involved playing around with IREE's isStructurallyEquivalentTo
but I couldn't make that to work, therefore went ahead with upstream's isEquivalentTo
.
Please take a look. I've also added corresponding comments in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I didn't know about isEquivalentTo
. This seems to do exactly what we want.
I probably would have tried a slightly different design -- I would have created a func.func
for every compute operation, and then compared func.funcs for equivalence (using isReqionEquivalentTo
) before returning from outlinedToAFunc. This would require keeping a DenseSet<FuncOp>
instead of the DenseMap, which would mean we could erase compute ops immediately rather than waiting till the end of the pass (smaller mental stack while reading code!)
This is just a minor design preference though -- I will make a PR with this change after this lands.
@@ -243,6 +243,12 @@ def AMDAIEFlattenLogicalObjectFifo : | |||
let constructor = "mlir::iree_compiler::AMDAIE::createAMDAIEFlattenLogicalObjectFifoPass()"; | |||
} | |||
|
|||
def AMDAIELinalgFunctionOutlining : | |||
Pass<"iree-amdaie-linalg-function-outlining", "ModuleOp"> { | |||
let summary = "Outlining of linalg compute ops"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some additional information in a description = [{ }] block about which linalg ops specifically are outlined (i.e. linalg.fill is not) and what the motivation for this pass is, and what some assumptions are? Sorry, I know that most of our passes in this file don't contain descriptions, but I think it'd be a good thing if we all started adding high-level information here for people without context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added that. Please check if the wordings are correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Is
One assumption this pass currently makes: All elementwise/matmul linalg ops within a
dispatch have same body content.
still true? Otherwise looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question. Doesnt operation equivalence check the region as well?
compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/test/linalg_function_outlining.mlir
Outdated
Show resolved
Hide resolved
compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/test/linalg_function_outlining.mlir
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of my comments are still in the other PR that need to be addressed.
|
||
namespace { | ||
|
||
unsigned uniqueOutlinedMatmul = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Design nit: I would make these (private) members of the Pass, and make outlinedToAFunction a member function of the Pass too. Just because I'm a bit unsure about these non-static global variables, not seen this C++ design before.
IRRewriter &rewriter, ModuleOp moduleOp, linalg::LinalgOp computeOp, | ||
std::string outlineFuncName, | ||
DenseMap<Operation *, std::string> &computeOpToOutlinedFuncMap) { | ||
// Check if the compute op is equivalent to a previously outlined compute op. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point we expect moduleOp's symbol table to not contain outlineFuncName
. I think this is worth asserting, to avoid to unlikely case that there is already a function with the name you've just created. I suppose you could increment the uniqueOutlined index in a while loop until an unused name is found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally looks good to me. My only remaining comments are mostly minor design things which I'm happy to bikeshed, so accepting as is. But @Abhishek-Varma please ping me if you'd like me to take another look.
|
||
namespace { | ||
|
||
unsigned uniqueOutlinedMatmul = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global values like this arent a good idea. This can cause race conditions when run in multi-threaded context.
I think LLVM already has a way to deduplicate symbols. I dont know how to call it though.
@@ -243,6 +243,12 @@ def AMDAIEFlattenLogicalObjectFifo : | |||
let constructor = "mlir::iree_compiler::AMDAIE::createAMDAIEFlattenLogicalObjectFifoPass()"; | |||
} | |||
|
|||
def AMDAIELinalgFunctionOutlining : | |||
Pass<"iree-amdaie-linalg-function-outlining", "ModuleOp"> { | |||
let summary = "Outlining of linalg compute ops"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question. Doesnt operation equivalence check the region as well?
if (!OperationEquivalence::isEquivalentTo( | ||
computeOp.getOperation(), op, | ||
OperationEquivalence::ignoreValueEquivalence, /*flags=*/nullptr, | ||
OperationEquivalence::IgnoreLocations)) | ||
continue; | ||
outlineFuncName = funcName; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you directly check if the isEquivalentTo
condition holds, and then update outlineFuncName and break?
DenseMap<Operation *, std::string> computeOpToOutlinedFuncMap; | ||
SmallVector<Operation *> toBeErased; | ||
moduleOp.walk([&](linalg::LinalgOp computeOp) { | ||
// Form outlined FuncName. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add comments of what linalg ops are currently outlined.
let summary = "Outlining of linalg compute ops"; | ||
let description = [{ | ||
Outlines matmul/elementwise linalg compute ops only. This pass essentially minimises | ||
the code footprint overall as the unrolling in the later passes lead to lot repeated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: 1) "loop unrolling"; 2) "lead to lots of repeated codes".
%7 = amdaie.core(%tile, in : [], out : []) { | ||
linalg.generic { | ||
indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], | ||
iterator_types = ["parallel"] | ||
} ins(%A : memref<4xf32>) | ||
outs(%C : memref<4xbf16>) { | ||
^bb0(%in: f32, %out: bf16): | ||
%8 = arith.truncf %in : f32 to bf16 | ||
linalg.yield %8 : bf16 | ||
} | ||
amdaie.end | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the above three elementwise cores are good enough for test purpose. Why don't you delete this or change this to a matmul.
This PR creates a standalone
iree-amdaie-linalg-function-outlining
pass which will be used to later enable the larger Matmul + Truncf e2e.Signed-off-by: Abhishek Varma [email protected]