-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redundant linalg.generic copies + temporary buffers #798
Comments
Regarding Inference.2: What I've been able to find is that since we remove the Matmul argument due to canonicalizatio, the bufferization pass ends up creating the corresponding copy op before the Epilogue computation starts. This is not the case when canonicalization is switched off and the copy op is created after the computation which then gets canonicalized away and we get correct IR. |
Thanks @Abhishek-Varma! The CSE issue is introduced after A summary of what is observed by running CSE.
After CSE, it removes
So, after bufferization it creates new alloc and copies for
|
A summary of what is observed by running canonicalization pattern (ForallOpIterArgsFolder). Before canonicalization
After Canonicalization, it removes a
After bufferization it creates new alloc and copies for output
|
Let me leave the CSE issue aside for now, the canonicalization looks OK to me and I dont see the link from that to the bufferization allocation. Taking this
if
it is bound to create a buffer. Something needs to hold the value for |
As discussed offline I've tabled off the Now, with the new vectorization branch (along with the consumer fusion fix upstream) for Matmul + Relu, here's what I found :-
NOTE: The above branch linked switches off CANONICALIZE + CSE - but we still see temporary buffer issue for all cases above. Regarding the temporary buffer issue - I tried inspecting the IR and here's what I've inferred :-
Since we have 3 sections whose boundaries are ONLY dealing with TENSORS, we have this extra buffer issue. Therefore essentially we have the need of a temporary buffer for writing the output of each sections into.
(NOTE: Each I speculate that if we propagate So, we should therefore have the following :-
We clearly have quite a few set of issues. But to possibly make Relu work as a stop-gap solution by switching off CANONICALIZE + CSE (like in the new vectorization branch I linked above), I propose doing the following :-
@MaheshRavishankar please let me know your thoughts. |
GOAL: Triage and solve the issue of redundant linalg.generic copies AND temporary buffer issue.
EXPERIMENT :
The following experiment was done on
main
branch and for PURE MATMUL for which you may find the input IR here :-OBSERVATION :
linalg.generic
copies for LHS and RHS (experiment number 2).linalg.generic
copy for OUTPUT of Matmul - and this in turn gets us the temporary buffer issue (experiment number 3).linalg.generic
copies for LHS, RHS and OUTPUT of Matmul and temporary buffer issue (experiment number 4).INFERENCE (so far) : We have two orthogonal issues =>
cse
.NOTE : We now need to take a look at how bufferization (or some other pass is working in tandem with
CSE
/ForallOpIterArgsFolder
separately - might be that the issue is actually in bufferization OR vice-versa). Therefore needs further triaging (and an update in this same issue thread).GLOSSARY :
linalg.generic
copiesThe text was updated successfully, but these errors were encountered: