Replies: 3 comments 1 reply
-
Why can't you flatten it so that it is not nested in the 2nd case?
-
After more thinking, maybe it can. The first case is easily flattened, since batched LoRA and MoE are equivalent from a grouped GEMM perspective. In the second case, the outer and the inner grouped GEMMs are very different in nature, and it took me a while to realize that, at least conceptually, they can be framed as another instance of nested grouped GEMM. I'll see if I can get it actually working. If the conclusion turns out to be that any nested grouped GEMM problem can be flattened, that would be a very interesting learning for me.
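To make the flattening idea concrete, here is a minimal sketch (plain Python, not tied to any CUTLASS API; the nested-list representation and the (M, N, K) tuples are just assumptions for illustration) of collapsing an arbitrarily nested group structure into one flat grouped-GEMM problem list:

```python
# Illustrative only: a nested group is a list whose elements are either
# leaf GEMM problems, written as (M, N, K) tuples, or further nested groups.

def flatten_groups(group):
    """Recursively collect the leaf (M, N, K) problems of a nested group."""
    flat = []
    for item in group:
        if isinstance(item, tuple):           # leaf GEMM problem
            flat.append(item)
        else:                                 # nested sub-group
            flat.extend(flatten_groups(item))
    return flat

# Example: two outer groups, each containing its own inner grouped GEMM.
nested = [
    [(128, 64, 256), (96, 64, 256)],                # outer group 0
    [(32, 64, 256), (64, 64, 256), (48, 64, 256)],  # outer group 1
]
flat_problems = flatten_groups(nested)  # one flat grouped GEMM with 5 problems
```

Whether this is enough in practice depends on the case, of course, since the pointer and stride bookkeeping of the inner and outer problems also has to line up.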
-
Just remember that the ultimate goal of grouped GEMM is to saturate the GPU. If you put too many GEMMs in a grouped GEMM, it may not help. In this case, you may be able to run the inner grouped GEMM first, then the outer one.
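For what it's worth, a minimal NumPy sketch of that two-pass idea (the shapes and group sizes are made up for illustration; in practice each list of matmuls below would be a single grouped GEMM launch):

```python
# Illustrative only: run the inner grouped GEMM first, then feed its
# outputs into the outer grouped GEMM as a second, separate launch.
import numpy as np

rng = np.random.default_rng(0)

# Inner grouped GEMM: one problem per inner group (hypothetical shapes).
inner_inputs  = [rng.standard_normal((m, 256)) for m in (32, 64, 48)]
inner_weights = [rng.standard_normal((256, 128)) for _ in range(3)]
inner_outputs = [x @ w for x, w in zip(inner_inputs, inner_weights)]

# Outer grouped GEMM: one problem per outer group, consuming the inner results.
outer_weights = [rng.standard_normal((128, 64)) for _ in range(3)]
outer_outputs = [y @ w for y, w in zip(inner_outputs, outer_weights)]
```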
-
Hi, I have recently encountered a need for "nested" grouped GEMM twice, i.e., where each grouped-GEMM subproblem is itself another grouped GEMM instance.
A good example is composing grouped GEMM for batched LoRA on top of MoE grouped GEMM: the outer group corresponds to different LoRAs, and each LoRA weight has an inner group of experts. For this example specifically, I found a way to encode the nested grouped GEMM problem into a normal, but bigger, grouped GEMM problem.
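To make that encoding concrete, here is a minimal sketch (plain Python; the function name, token counts, and shapes are made-up placeholders rather than an existing API) of laying out the nested (LoRA, expert) groups as one flat grouped-GEMM problem list:

```python
# Illustrative only: batched LoRA over MoE grouped GEMM becomes one flat
# grouped GEMM with up to num_loras * num_experts problems.

def build_flat_problem_list(tokens_per_lora_expert, n, k):
    """tokens_per_lora_expert[l][e] is the number of tokens routed to expert e
    under LoRA l. Returns a flat list of (M, N, K) problem sizes, plus the
    (lora, expert) pair each flat problem came from."""
    problems, index = [], []
    for l, per_expert in enumerate(tokens_per_lora_expert):
        for e, m in enumerate(per_expert):
            if m == 0:
                continue  # skip empty (LoRA, expert) groups
            problems.append((m, n, k))
            index.append((l, e))
    return problems, index

# Example: 2 LoRAs, 3 experts each.
problems, index = build_flat_problem_list([[32, 0, 64], [16, 48, 8]], n=64, k=256)
# -> 5 problems in one grouped GEMM; `index` maps each back to its (LoRA, expert).
```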
But today I hit another case that I cannot work around with the existing API, so it made me wonder: has anyone encountered a similar situation? Is it feasible to implement such a "nested" grouped GEMM kernel?
cc @hwu36