This use case is a vectorized conv2d kernel. It's lowered from MLIR without using any target-specific dialect (like AIEVec), and it presents a couple of interesting challenges.
The first thing that trips up the compiler is %143:_(<32 x s32>) = G_ADD %142:_, %112:_, which looks like a 1024-bit vector addition. Those are supported (on accumulator registers), so it should be possible to select for them.
In any case, more interesting than that are a few shufflevector ops, some early on, that could cause problems. They are splats and broadcasts that should be possible to lower by using broadcast and insert ops.
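As a rough illustration of what "splat" means for a shufflevector mask, here is a minimal Python sketch. The helper names `splat_lane` and `apply_shuffle` and the `UNDEF` sentinel are hypothetical and are not part of LLVM or of the backend. The sketch only models the mask semantics, not the actual lowering to broadcast and insert ops.

```python
from typing import List, Optional, Sequence

UNDEF = -1  # hypothetical stand-in for an undef/poison mask element


def splat_lane(mask: Sequence[int]) -> Optional[int]:
    """Return the source lane if every defined mask element selects the same
    lane (a splat), otherwise None."""
    defined = {m for m in mask if m != UNDEF}
    if len(defined) == 1:
        return defined.pop()
    return None


def apply_shuffle(v1: Sequence[int], v2: Sequence[int],
                  mask: Sequence[int]) -> List[Optional[int]]:
    """Reference semantics of a shuffle: mask indices address the
    concatenation of v1 and v2; undef lanes are modelled as None."""
    src = list(v1) + list(v2)
    return [src[m] if m != UNDEF else None for m in mask]


if __name__ == "__main__":
    mask = [0] * 32                      # splat of lane 0 into 32 lanes
    print(splat_lane(mask))              # lane that would be broadcast
    print(apply_shuffle([7, 1, 2, 3], [0, 0, 0, 0], [0, 0, 0, 0]))
```

A mask that passes this check could, in principle, be rewritten as "extract one lane, then broadcast it", which is the shape of lowering suggested above.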
And the key is going to be this:
%70 = zext <32 x i8> %56 to <32 x i32>
%71 = sext <32 x i8> %69 to <32 x i32>
%72 = mul <32 x i32> %70, %71
%73 = add <32 x i32> %72, %48
Those four instructions are effectively a VMAC op. The sext/zext ops have to be selected together with at least the mul <32 x i32> (into a VMUL); otherwise the mul is invalid. Ideally they are also selected together with the add op into a VMAC, to maximize performance.
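To make the pattern concrete, the following is a lane-wise reference model of the zext/sext/mul/add sequence in Python. It is only a sketch of the LLVM IR semantics (8-bit inputs widened to 32 bits, with wrapping 32-bit arithmetic). It does not model the actual VMAC instruction, and the function names are hypothetical.

```python
from typing import List, Sequence


def zext8(x: int) -> int:
    """Zero-extend an 8-bit value."""
    return x & 0xFF


def sext8(x: int) -> int:
    """Sign-extend an 8-bit value."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def wrap32(x: int) -> int:
    """Wrap to a signed 32-bit integer, as i32 mul/add do."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mac_reference(a_u8: Sequence[int], b_s8: Sequence[int],
                  acc: Sequence[int]) -> List[int]:
    """acc[i] + zext(a[i]) * sext(b[i]) for every lane."""
    if not (len(a_u8) == len(b_s8) == len(acc)):
        raise ValueError("all operands must have the same number of lanes")
    return [wrap32(wrap32(zext8(a) * sext8(b)) + c)
            for a, b, c in zip(a_u8, b_s8, acc)]


if __name__ == "__main__":
    # Lane 0: 255 * (-1) + 10, lane 1: 2 * 3 + 1
    print(mac_reference([255, 2], [0xFF, 3], [10, 1]))
```

A model like this can serve as a test oracle when checking that a fused selection produces the same lanes as the unfused sequence.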
There are a few weird things going on with insertions and extractions of <8 x i32> (which should be fine) and <8 x i8> (which might be problematic). There are lowerings we can implement that would get rid of those and replace them with insert/extract on <32 x i32>/<32 x i8> (1024b/256b), but I thought it might be an interesting issue to have as a test, since this is the natural way in which MLIR lowers multi-rank vectors to LLVM IR.
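The rewrite of sub-vector insert/extract into operations on the flat 32-lane vector can be sketched as index arithmetic. The Python sketch below assumes a row-major 4x8 layout, so that row r of the multi-rank vector occupies lanes 8r to 8r+7 of the flat vector. That layout, and the names `extract_row` and `insert_row`, are assumptions made for illustration and are not taken from the kernel's IR.

```python
from typing import List, Sequence

ROWS, COLS = 4, 8  # assumed shape: 4 rows of 8 lanes, 32 lanes in total


def extract_row(flat: Sequence[int], row: int) -> List[int]:
    """Extract an 8-lane sub-vector from the flat 32-lane vector."""
    if len(flat) != ROWS * COLS or not 0 <= row < ROWS:
        raise ValueError("unexpected vector length or row index")
    return list(flat[row * COLS:(row + 1) * COLS])


def insert_row(flat: Sequence[int], row: int, sub: Sequence[int]) -> List[int]:
    """Return a copy of the flat vector with one 8-lane row replaced."""
    if len(flat) != ROWS * COLS or len(sub) != COLS or not 0 <= row < ROWS:
        raise ValueError("unexpected vector length or row index")
    out = list(flat)
    out[row * COLS:(row + 1) * COLS] = sub
    return out


if __name__ == "__main__":
    flat = list(range(32))
    print(extract_row(flat, 2))             # lanes 16..23
    print(insert_row(flat, 1, [0] * 8)[:16])
```

The same arithmetic applies to the <8 x i8> case with 8-bit lanes; only the element width changes, not the lane indices.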