Using microkernels for single core code. #153
Comments
IREE CPU already uses microkernels; use that as the baseline to get this flow working. Here is a gist that explains how this is plumbed through on the CPU side: https://gist.github.com/bjacob/2c160c102cee33562826c730945b49f2 |
I've raised a PR to tackle the first step : |
So, I experimented with
I confirm that we are getting
It was failing in the
I then worked on triaging the same and have added a fix in the
With the fix added we are able to get the IR all the way until that stage where basically all of our e2e lit tests get to. I've added the fix as well as |
Short summary:
(As expected, each core will have a reference to the microkernel it is individually running.) We are able to reach this stage starting from the following input IR :-
Long summary:
|
Updates:
Following is the command which MLIR-AIE seems to be using for testing microkernels via RYZEN AI CHESS COMPILER:-
NOTE: The link attached is the same gist link (with updated content) that I previously used to provide an update to this tracker. |
Updates :
The
Note the changes I made above :- You may take a look at the e2e log for this stage.
Note: As for the next step - I'm going to chase preparing PRs to ensure the |
Hi @Abhishek-Varma, my understanding is that
Option 1 is probably the easiest way to get things working, as air/aie seems to have taken the 'reinterpret data type' approach to ensure multiples of 4-byte words in DMA operations. But I'm not sure what the longer-term plans are there. @erwei-xilinx Any thoughts? |
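To make the 4-byte-word constraint mentioned above concrete, here is a minimal sketch in C. It assumes that the 'reinterpret data type' approach means viewing the same bytes as 32-bit words, so a bf16 buffer (2 bytes per element) only qualifies when its element count is even. The function names and the `DMA_WORD_BYTES` constant are hypothetical and do not come from mlir-air or mlir-aie.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed DMA word size, taken from the "4 byte words" remark in the thread. */
enum { DMA_WORD_BYTES = 4 };

/* Hypothetical helper: true when a transfer of `num_elems` elements, each
 * `elem_bytes` wide, covers a whole number of 4-byte words. */
bool dma_is_word_aligned(size_t num_elems, size_t elem_bytes) {
  return (num_elems * elem_bytes) % DMA_WORD_BYTES == 0;
}

/* Hypothetical helper: number of 32-bit words the buffer occupies once it is
 * reinterpreted as 32-bit elements. Returns 0 when the byte size is not a
 * multiple of 4 (an empty buffer also yields 0). */
size_t dma_words_after_reinterpret(size_t num_elems, size_t elem_bytes) {
  if (!dma_is_word_aligned(num_elems, elem_bytes)) {
    return 0;
  }
  return (num_elems * elem_bytes) / DMA_WORD_BYTES;
}
```

Under these assumptions, 64 bf16 elements map to 32 words, while 63 bf16 elements cannot be expressed as whole words and would need padding or a different approach.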
Oh no you went down the same rabbit hole as I did 😄 The pass I added to fix it was Xilinx/mlir-air@e8a32b0 (it should be live in the latest iree-amd-aie now). @MaheshRavishankar also mentioned a pass in IREE for this, could be https://github.com/openxla/iree/blob/09deadfb8a58d17a4cf136ce916a661836eff2cf/compiler/src/iree/compiler/Codegen/Transforms/Transforms.cpp#L595 |
My understanding is that option 2 is already done |
https://github.com/openxla/iree/blob/main/compiler%2Fsrc%2Firee%2Fcompiler%2FCodegen%2FCommon%2FCleanupBufferAllocViewPass.cpp is the pass in IREE that removes hal.interface.binding ops that are dead (i.e. whose uses are only in alignment operations). I haven't looked at this PR itself to get all the details. |
So reinitializing the mlir-air submodules in iree-amd-aie helped. The error reported earlier is not there anymore. @newling @MaheshRavishankar For bf16 I'm now getting :-
I'm not sure why it complains within the same I'm attaching the IR e2e log for reference. |
Updates besides the bf16's case :-
|
Updates :
I'm attaching the IR e2e log - you'd see that now the
I'm happy because this further cleanly integrates with MLIR-AIR changes for which I raised draft PR. :) We need to discuss how to glue in
RFC - DISCUSS UKERNEL INTEGRATION.
So, we are using mm.c to put down pieces for the ukernel based lowering via IREE codegen. The idea is to first do the bare minimum required i.e. a non-performant matmul run via the microkernel.
Section A (walk-through):
Following is how it has been tested to work (taking example of
Section B (fiddling with mlir-aie):
Now, let's look at the changes required to make this happen :-
Section C (point-of-convergence):
We need to converge on how we would want to use microkernels (starting with the non-performant variant in
The following comment I added in my WIP branch would help reinforce why I'm suggesting Section C.2 :-
The above function I've added to the |
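For readers following the RFC, the "bare minimum" non-performant matmul microkernel it describes could look roughly like the sketch below. This is not the contents of mm.c: the function name, the signature, the row-major layout, the accumulate-into-C behaviour and the `int32_t` element type are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference microkernel: C += A * B with row-major operands.
 * A is m x k, B is k x n, C is m x n. Deliberately scalar and unoptimized,
 * in the spirit of a "non-performant" first version. */
void matmul_ref_i32(const int32_t *a, const int32_t *b, int32_t *c,
                    size_t m, size_t n, size_t k) {
  for (size_t i = 0; i < m; ++i) {
    for (size_t j = 0; j < n; ++j) {
      int32_t acc = c[i * n + j]; /* accumulate into the existing C tile */
      for (size_t p = 0; p < k; ++p) {
        acc += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = acc;
    }
  }
}
```

A plain scalar loop like this is easy to check against a compiler-generated matmul, which makes it a reasonable first target before any vectorized variant is wired in.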
Updates :-
|
Updates:
|
My updates :-
|
My updates for the day are as follows :-
|
My updates :-
I worked on narrowing it down by parallely executing
|
Adding an update for the current state of ukernel support of Pad-Pack :-
Input that is leading to the issue :-
Input that generates correct IR :-
What is happening here is that while figuring out
Pasting below the invalid
Current case (I'm intentionally adding the beautiful/assembly format of the invalid IR) :-
Expected case :-
I'll dig further on this and try adding a fix for the same as well. |
@Abhishek-Varma Thanks for looking into this. |