Overhaul G_EXTRACT_VECTOR_ELT legalization strategy #328

konstantinschwarz · 2025-02-01T00:25:56Z

We now legalize all G_AIE_[ZS]EXT_EXTRACT_VECTOR_ELT/G_EXTRACT_VECTOR_ELT to have 512-bit source vectors.
This allows us to delete a lot of redundant instruction selection code and makes the code much more re-usable across AIE architectures.

Best to review commit by commit; each one should be self-contained.

llvm/lib/Target/AIE/aie2p/AIE2PInstructionSelector.cpp

llvm/lib/Target/AIE/AIELegalizerHelper.cpp

llvm/test/CodeGen/AIE/GlobalISel/legalize-vsel.mir

llvm/lib/Target/AIE/AIE2PreLegalizerCombiner.cpp

martien-de-jong · 2025-02-03T09:57:00Z

llvm/lib/Target/AIE/AIE2PreLegalizerCombiner.cpp

+  case AIE2::G_AIE_ZEXT_EXTRACT_VECTOR_ELT:
+  case AIE2::G_AIE_SEXT_EXTRACT_VECTOR_ELT: {
+    const LLT SrcVecTy = MRI.getType(MI.getOperand(1).getReg());
+    if (SrcVecTy.getSizeInBits() != 512) {


Would it make sense to have a BasicVectorBitSize in a base class?

Or maybe just a static attribute within AIE2PreLegalizerCombinerImpl

We have the magic number 512 in multiple files...

We recently introduced getExtractSubvecNativeSrcSize in AIEBaseInstrInfo.h.
I guess I could rename that function to getBasicVectorBitSize() and use it in existing places and the change here.
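The suggestion in this thread can be sketched roughly as follows. This is a standalone illustration, not the backend's code: `TargetInfoSketch` and `needsPreLegalization` are hypothetical names, and only `getBasicVectorBitSize` comes from the discussion (its real signature in AIEBaseInstrInfo.h may differ).

```cpp
#include <cassert>

// Hypothetical stand-in for the target's instruction info; this is NOT the
// real AIEBaseInstrInfo interface.
struct TargetInfoSketch {
  // Name taken from the discussion above. 512 is the source vector width
  // this PR legalizes extract-element instructions to.
  unsigned getBasicVectorBitSize() const { return 512; }
};

// Hypothetical helper: true if an extract whose source vector is SrcBits wide
// still has to be rewritten to use a basic-size source vector.
bool needsPreLegalization(const TargetInfoSketch &TII, unsigned SrcBits) {
  assert(SrcBits != 0 && "a vector type must have a non-zero size");
  return SrcBits != TII.getBasicVectorBitSize();
}
```

With a single query like this, the combiner, the legalizer helper and the verifier would all compare against the same value instead of repeating the literal.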

martien-de-jong · 2025-02-03T10:04:20Z

llvm/lib/Target/AIE/AIELegalizerHelper.cpp

+  assert(MI.getOpcode() == TargetOpcode::G_UNMERGE_VALUES);
+
+  const int StartIdx = Regs.size();
+  const int NumResults = MI.getNumOperands() - 1;


Isn't there an MI.defs() that we could iterate over?
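For illustration, iterating over the defs instead of computing an operand count by hand might look like the sketch below. It is a standalone mock: `UnmergeSketch` and `appendUnmergeResults` are hypothetical, and the mock only imitates the shape of a range of def operands such as the one `MachineInstr::defs()` provides in LLVM.

```cpp
#include <cassert>
#include <vector>

using Register = unsigned; // stand-in for llvm::Register

// Hypothetical model of an unmerge instruction: N results and one source.
struct UnmergeSketch {
  std::vector<Register> Defs;
  Register Src;
  const std::vector<Register> &defs() const { return Defs; }
};

// Append every result register of MI to Regs, without index arithmetic on
// the operand list.
void appendUnmergeResults(const UnmergeSketch &MI,
                          std::vector<Register> &Regs) {
  Regs.reserve(Regs.size() + MI.defs().size());
  for (Register Def : MI.defs())
    Regs.push_back(Def);
}
```

This form also avoids the separate resize and start-index bookkeeping, at the cost of using push_back rather than writing into preallocated slots.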

martien-de-jong · 2025-02-03T10:04:24Z

llvm/lib/Target/AIE/AIELegalizerHelper.cpp

+  const int StartIdx = Regs.size();
+  const int NumResults = MI.getNumOperands() - 1;
+  Regs.resize(Regs.size() + NumResults);
+  for (int I = 0; I != NumResults; ++I)


Why not < NumResults?

llvm/lib/Target/AIE/AIELegalizerHelper.cpp

llvm/lib/Target/AIE/AIE2LegalizerInfo.cpp

martien-de-jong

Oh, I have far too many comments for such a heroic effort. In general it looks good to me. I would mainly like to avoid uninitialized locals, and to have const on declarations whose scope is longer than just a few lines.

llvm/test/CodeGen/AIE/GlobalISel/legalize-extract-vector-elt.mir

llvm/test/CodeGen/AIE/aie2p/GlobalIsel/prelegalizercombiner-extract-vector-elt.mir

llvm/test/CodeGen/AIE/aie2p/GlobalIsel/inst-select-concat-vectors.mir

andcarminati · 2025-02-03T14:56:28Z

llvm/lib/Target/AIE/AIECombinerHelper.cpp

@@ -995,6 +995,10 @@ bool llvm::matchExtractVecEltAndExt(
  const LLT S8 = LLT::scalar(8);
  const LLT S16 = LLT::scalar(16);
  LLT SrcVecTy = MRI.getType(MI.getOperand(1).getReg());
+  // Extracts from vectors <= 64-bits are lowered to bit-arithmetic in
+  // legalization
+  if (SrcVecTy.getSizeInBits() <= 64)


Can we extend the test to cover size=64?

I added the tests for 64-bit vectors to llvm/test/CodeGen/AIE/GlobalISel/prelegalizercombiner-extract-vector-elt.mir in the first commit, they now show the new behavior in this commit.

Note however that we do not support 64-bit src operands in the legalizer yet. I don't want to add even more changes to this PR, handling all missing vector sizes should come in a separate PR.
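As a rough picture of what "lowered to bit-arithmetic" means for such small vectors, the sketch below extracts one element from a vector packed into a 64-bit integer using a shift and a mask. It is a scalar model only, not the legalizer's actual lowering; `extractEltBits` is a hypothetical name, and placing element 0 in the least significant bits is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Extract element Idx from a vector of EltBits-wide elements packed into a
// 64-bit integer. Assumption: element 0 occupies the least significant bits.
uint64_t extractEltBits(uint64_t Packed, unsigned EltBits, unsigned Idx) {
  assert(EltBits > 0 && EltBits <= 64 && 64 % EltBits == 0);
  assert(Idx < 64 / EltBits && "element index out of range");
  // Avoid shifting by 64, which is undefined for a 64-bit operand.
  const uint64_t Mask = EltBits == 64 ? ~0ULL : ((1ULL << EltBits) - 1);
  return (Packed >> (Idx * EltBits)) & Mask;
}
```

For example, `extractEltBits(0x0807060504030201ULL, 8, 2)` selects the third byte, counting from the least significant end.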

F-Stuckmann · 2025-02-03T15:36:54Z

llvm/lib/Target/AIE/AIELegalizerHelper.cpp


  const Register NewDstReg = MRI.createGenericVirtualRegister(NewVecTy);
  MIRBuilder.buildInstr(MI.getOpcode(), {NewDstReg},
                        {SrcReg0, NewSrcReg1, NewSrcReg2}, MI.getFlags());

+  const unsigned NumPadElts = (512 / DstVecSize) - 1;


Maybe add a comment describing the magic number 512

Nah, give it a sensible name and get it from TII.

I'll just use the already existing MaxBitSize; I missed it before.
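The padding arithmetic from the hunk above, with the literal replaced by a parameter, could be written as the following sketch. `numPadPieces` is a hypothetical name, and the reading that the padding consists of pieces as wide as the destination vector is an assumption based on the expression `(512 / DstVecSize) - 1`.

```cpp
#include <cassert>

// Number of additional DstVecBits-wide pieces needed so that the original
// vector plus the padding together fill a MaxBits-wide vector.
unsigned numPadPieces(unsigned MaxBits, unsigned DstVecBits) {
  assert(DstVecBits != 0 && MaxBits % DstVecBits == 0 &&
         "destination size must evenly divide the maximum size");
  return MaxBits / DstVecBits - 1;
}
```

With a maximum of 512 bits, a 128-bit vector needs three pad pieces, a 256-bit vector needs one, and a 512-bit vector needs none.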

llvm/lib/Target/AIE/AIELegalizerHelper.cpp

konstantinschwarz

Thanks a lot for the reviews!
I think I addressed most/all of the comments. In particular, I got rid of the hard-coded legal vector size.

llvm/lib/Target/AIE/aie2p/AIE2PInstructionSelector.cpp

llvm/test/CodeGen/AIE/aie2p/GlobalIsel/inst-select-concat-vectors.mir

konstantinschwarz · 2025-02-03T17:39:35Z

llvm/test/CodeGen/AIE/aie2p/GlobalIsel/inst-select-concat-vectors.mir

+    %1:vregbank(<4 x s64>) = G_IMPLICIT_DEF
+    %2:vregbank(<8 x s64>) = G_CONCAT_VECTORS %0(<4 x s64>), %1(<4 x s64>)
+    PseudoRET implicit $lr, implicit %2
+...


No, we have very little G_CONCAT_VECTORS legalization test coverage. There's also a nice opportunity to move instruction selection code to the legalizer and make it common across architectures.
I'll add the missing legalizer test to this commit, but the rework of the G_CONCAT_VECTORS legalization strategy should come in a separate PR.

konstantinschwarz · 2025-02-03T18:16:22Z

llvm/lib/Target/AIE/AIECombinerHelper.cpp

@@ -995,6 +995,10 @@ bool llvm::matchExtractVecEltAndExt(
  const LLT S8 = LLT::scalar(8);
  const LLT S16 = LLT::scalar(16);
  LLT SrcVecTy = MRI.getType(MI.getOperand(1).getReg());
+  // Extracts from vectors <= 64-bits are lowered to bit-arithmetic in
+  // legalization
+  if (SrcVecTy.getSizeInBits() <= 64)


I added the tests for 64-bit vectors to llvm/test/CodeGen/AIE/GlobalISel/prelegalizercombiner-extract-vector-elt.mir in the first commit, they now show the new behavior in this commit.

Note however that we do not support 64-bit src operands in the legalizer yet. I don't want to add even more changes to this PR, handling all missing vector sizes should come in a separate PR.

konstantinschwarz · 2025-02-03T18:26:24Z

llvm/lib/Target/AIE/AIECombinerHelper.cpp

-    auto Extr = B.buildInstr(ExtractOpc, {LLT::scalar(32)}, {SrcVecReg, Cst});
+    const LLT DstElemTy = MRI.getType(SrcVecReg).getElementType();
+    auto Extr =
+        B.buildExtractVectorElementConstant(DstElemTy, SrcVecReg, *UniqOpIdx);


llvm/lib/Target/AIE/AIELegalizerHelper.cpp

llvm/lib/Target/AIE/AIE2LegalizerInfo.cpp

llvm/lib/Target/AIE/aie2p/AIE2PLegalizerInfo.cpp

konstantinschwarz · 2025-02-03T22:15:28Z

llvm/lib/Target/AIE/AIE2InstrInfo.cpp

+    if (MRI.getType(MI.getOperand(1).getReg()).getSizeInBits() != 512) {
+      bool IsLegalized =
+          MI.getParent()->getParent()->getProperties().hasProperty(
+              MachineFunctionProperties::Property::Legalized);


Good idea, added
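The shape of the check in the hunk above, where the size restriction only applies once the function has been legalized, might be modelled like this. It is a standalone sketch: `FunctionStateSketch` and `isExtractSrcSizeValid` are hypothetical, and the default of 512 mirrors the literal in the diff rather than any real API.

```cpp
#include <cassert>

// Hypothetical stand-in for the machine function property queried in the
// hunk above (MachineFunctionProperties::Property::Legalized).
struct FunctionStateSketch {
  bool Legalized;
};

// Returns true if a source vector of SrcBits is acceptable for an
// extract-element instruction in the given function state.
bool isExtractSrcSizeValid(const FunctionStateSketch &MF, unsigned SrcBits,
                           unsigned NativeBits = 512) {
  assert(NativeBits != 0);
  if (SrcBits == NativeBits)
    return true;
  // Other sizes are tolerated only before the legalizer has run.
  return !MF.Legalized;
}
```

Gating on the legalized state keeps tests and combiners that build narrower extracts before legalization from tripping the verifier.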

llvm/lib/Target/AIE/AIEBaseInstrInfo.h

martien-de-jong

Nothing blocking

…4 x s64>

…64-bit Keeping these as generic G_EXTRACT_VECTOR_ELTs allows us to use the existing bitcast legalization, lowering such extracts into bit-arithmetic.

…eToExtractBroadcast This is in preparation of only building legal sized (512-bit) G_AIE_[SZ]EXT_EXTRACT_VECTOR_ELT.

We start to duplicate similar code across different legalization functions.

…Size

G_AIE_[ZS]EXT_EXTRACT_VECTOR_ELT are typically introduced in the pre-legalizer combiner. Since target specific opcodes are considered legal by the Legalizer pass, we need to pre-legalize these instructions in the combiner.

We already custom legalized every G_EXTRACT_VECTOR_ELT into G_AIE_SEXT_EXTRACT_VECTOR_ELT. This change ensures we only produce legal G_AIE_SEXT_EXTRACT_VECTOR_ELT by re-using the previously introduced legalization action.

… in G_FADD/FSUB legalization

This adds MachineVerifier support for G_AIE_[SZ]EXT_EXTRACT_VECTOR_ELT to enforce that only 512-bit source vectors are used after legalization.

…ableGen patterns For AIE2, this is NFC. For AIE2P, we now benefit from patterns supporting constant indices.

niwinanto

Really nice work. Just one comment, not really part of this, but found while reviewing this.

niwinanto · 2025-02-05T09:51:10Z

llvm/lib/Target/AIE/AIECombinerHelper.cpp

@@ -1863,7 +1863,8 @@ static std::optional<int> getUniqueIndex(ArrayRef<int> Mask) {
 ///         %1:_(<4 x s64>) = COPY $wl1
 ///         %2:_(<8 x s64>) = G_SHUFFLE_VECTOR %X(<4 x s64>), %1(<4 x s64>),
 ///         shufflemask(3, 3, 3, 3, 3, 3, 3, 3)
-/// To :    %2:_(<8 x s64>) = G_AIE_BROADCAST_VECTOR %X(<4 x s64>)
+/// To :    %3:_(s64) = G_EXTRACT_VECTOR_ELT %X, 3


@konstantinschwarz It seems we are not checking that the extracted element belongs to the first source vector. A canonicalization combine from upstream might solve this, but until then this combine produces wrong results, for example if the shuffle mask in the doc string is (4, 4, ...).
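A minimal sketch of the missing guard, assuming the usual shuffle-mask convention in which indices at or above the source element count refer to the second source vector. Both functions are hypothetical standalone models; the real `getUniqueIndex` in AIECombinerHelper.cpp may treat undef (negative) mask entries differently.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Returns the index shared by every mask entry, or nullopt if the mask is
// empty or mixes different indices.
std::optional<int> uniqueIndex(const std::vector<int> &Mask) {
  if (Mask.empty())
    return std::nullopt;
  for (int M : Mask)
    if (M != Mask.front())
      return std::nullopt;
  return Mask.front();
}

// Returns the broadcast index only if it selects an element of the FIRST
// source vector, i.e. lies in [0, NumSrcElts).
std::optional<int> broadcastIndexInFirstSrc(const std::vector<int> &Mask,
                                            int NumSrcElts) {
  const std::optional<int> Idx = uniqueIndex(Mask);
  if (!Idx || *Idx < 0 || *Idx >= NumSrcElts)
    return std::nullopt;
  return Idx;
}
```

With two <4 x s64> sources, a mask of all 3s is accepted, while a mask of all 4s is rejected because it selects element 0 of the second source.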

konstantinschwarz requested review from abhinay-anubola, abnikant, andcarminati, F-Stuckmann, gbossu, katerynamuts, khallouh, martien-de-jong, niwinanto, SagarMaheshwari99 and stephenneuendorffer as code owners February 1, 2025 00:25

konstantinschwarz force-pushed the kschwarz.rework.extractelem.handling branch 2 times, most recently from cb955c7 to 81a7627 Compare February 3, 2025 03:50