Vectorized matmul performance regression - function inlining #883
Can someone please remind me: at what granularity is the matmul outlined? Is it at the m=n=k=64 granularity or the m=n=4, k=8 granularity (assuming Phoenix bf16)?
It takes place at the latter granularity (m=n=4, k=8). Here's the outlined matmul:
Here's an e2e log (created earlier) for reference.
Not sure, but this might be the reason behind (and hopefully a fix for) this regression.
Maybe, but it isn't surprising to me that outlining a single AIE instruction (matmul on 4x8x4) can result in a slowdown.
Yeah, outlining functions would definitely add some regression because of function-invocation overhead. Since it was initially attempted to reduce the program-memory requirement, a performance cost is to be expected. Perhaps the way forward for now is "conditional" enabling of function outlining while the Peano loop-unrolling control is enabled?
We're seeing performance regression on vectorized matmul, likely caused by the following PR: #856, see table below:
- Matmul problem size (MxKxN): 512x512x4096
- Array configuration: 2x2
- Kernel type (vectorization / ukernel / scalar): Vectorization
@Abhishek-Varma
Note that there is another PR causing a performance regression, #882, which is likely orthogonal.