-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Reorganize AIELegalizerInfo #159
Conversation
virtual unsigned getGenericPadVectorOpcode() const { return -1; } | ||
/// Return the opcode to be used for extracting a smaller vector by ignoring | ||
/// the high bits | ||
virtual unsigned getGenericUnpadVectorOpcode() const { return -1; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I feel we are really overloading InstrInfo with all those getXXXXopcode()
function. At least for generic instrucitons, I'm wondering if we should not create e.g.
class AIEGenericInstructionsInterface {
virtual unsigned getInsertVectorEltOpcode() const {...}
virtual unsigned getExtractVectorEltOpcode() const {...}
...
}
I guess sometimes we will only really need to pass AIEGenericInstructionsInterface
, and not the whole AIEBaseInstrInfo
. This will also make unit testing easier, because we could just define a test interface for what is required to be tested.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see an issue with adding more of these getter functions to InstrInfo, its a single place where we collect all architecture specific opcode information.
Now for AIE specific generic opcodes we could also consider them to be always available on every AIE architecture, and requiring them to be defined as the first opcodes after GENERIC_OP_END
.
Then it doesn't matter which AIE namespace we use to access those opcodes (AIE::
or AIE2::
at this point).
We would have to be extremely careful not to introduce an opcode in between GENERIC_OP_END and the shared opcodes only for a single AIE version. Not sure that's any better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for making it easier, I know that I made that mistake a couple of times and it hard to catch. An alternative design would be to define getOpcode
that takes an opcode in the TargetOpcode
space and turns it into AIE1
, AIE2
or whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that these "target specific generic machine instructions" do not exist in the TargetOpcode
namespace, so we cannot translate them into the target namespace.
TargetOpcode::
only contains the instructions up to GENERIC_OP_END
, which also marks the beginning of the target specific (generic) machine instructions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but the idea could have a generic namespace that is an amalgamation of the different architectures. That has different drawbacks, like keeping it up to date and it becoming very large, but it is an alternative. Both are good to me and I happy with his as well
I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…eneric MIR instructions Also use more builder functions where available
17e9683
to
23842e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Konstantin,
Thanks for doing this work, it was something that was on my todo list that I didn't get time to do. Since I did spent a decent chunk of time spelunking in the legalizer, I thought it might be useful for me to chime in.
It is a good reorganization, I am happy to see the code being split up and the if statements being removed. There is some ground to be gained in terms of duplication, since this move will cause a decent amount of copying back and forth. As shown in #160, the legalizer isn't necessarily feature complete for each opcode yet, which means that both legalizers would need to be updated to deal with changes like this. So I am unsure if the DRY-complexity tradeoff here is optimal.
As to fix it, there are two ideas that I have toyed with and also put here as review comments:
-
Move the type definitions into a separate header that either imports the right subheader or does a few if guards. Basic types will likely be shared and this has the extra benefit of just being able to import
aieX_types.h
through the GISel stack. Now these definitions are given ad-hoc when they are needed. -
Reuse the same trick you use for the opcodes and use that for the legalizer statements themselves. This is the template pattern, where each opcode is a function call that the subclass can override when it is needed. This gets rid of both the
isAIE
calls, but allows you define "default" rules.
But, besides that, great work!
TL;DR This is a good refactor that solves a decent of the problems, but I think that a bit more can be won by separating the LLT variables and applying the template pattern
virtual unsigned getGenericPadVectorOpcode() const { return -1; } | ||
/// Return the opcode to be used for extracting a smaller vector by ignoring | ||
/// the high bits | ||
virtual unsigned getGenericUnpadVectorOpcode() const { return -1; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for making it easier, I know that I made that mistake a couple of times and it hard to catch. An alternative design would be to define getOpcode
that takes an opcode in the TargetOpcode
space and turns it into AIE1
, AIE2
or whatever.
const LLT S8 = LLT::scalar(8); | ||
const LLT S16 = LLT::scalar(16); | ||
const LLT S20 = LLT::scalar(20); | ||
const LLT S32 = LLT::scalar(32); | ||
const LLT S64 = LLT::scalar(64); | ||
const LLT P0 = LLT::pointer(0, 20); | ||
|
||
// 128-bit vectors | ||
const LLT V16S8 = LLT::fixed_vector(16, 8); | ||
const LLT V8S16 = LLT::fixed_vector(8, 16); | ||
const LLT V4S32 = LLT::fixed_vector(4, 32); | ||
|
||
// 256-bit vectors | ||
const LLT V8S32 = LLT::fixed_vector(8, 32); | ||
const LLT V16S16 = LLT::fixed_vector(16, 16); | ||
const LLT V32S8 = LLT::fixed_vector(32, 8); | ||
|
||
// 256-bit accumulators | ||
const LLT ACC256 = LLT::fixed_vector(4, 64); | ||
|
||
// 512-bit vectors | ||
const LLT V16S32 = LLT::fixed_vector(16, 32); | ||
const LLT V32S16 = LLT::fixed_vector(32, 16); | ||
const LLT V64S8 = LLT::fixed_vector(64, 8); | ||
|
||
// 512-bit accumulators | ||
const LLT ACC512 = LLT::fixed_vector(8, 64); | ||
|
||
// 1024-bit vectors | ||
const LLT V32S32 = LLT::fixed_vector(32, 32); | ||
const LLT V64S16 = LLT::fixed_vector(64, 16); | ||
const LLT V128S8 = LLT::fixed_vector(128, 8); | ||
|
||
// 1024-bit accumulators | ||
const LLT ACC1024 = LLT::fixed_vector(16, 64); | ||
|
||
const LLT S128 = LLT::scalar(128); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe put these in a separate header with if guards per generation? It would nice for these to be global in the file and these definitions will probably overlap quite a bit and you'd want to include it more parts of the instruction selection stack.
|
||
IMPLICIT.widenScalarToNextPow2(0).clampScalar(0, S32, S32); | ||
|
||
getActionDefinitionsBuilder(G_CONSTANT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These is a decent overlap of these definitions, which is a shame since it adds duplication. Maybe we can reuse the trick you used for the opcodes and define a base class with createXXXActionDefinitions
calls which is overloaded by the architecture when needed. Then you only get the divergence when you need it.
Disclaimer: didn't test this yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider this, e.g. through the common CRTP pattern.
I felt the added complexity isn't really worth it, much of the legalization is straight forward. Being able to share the custom legalization code seemed like a good compromise to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can see that. It is very much a DRY-code complexity trade off. My worry is mainly in having to deal with the duplication if the number of variants becomes very large. Then changes to the legalizer might require changes to 10 different files which need to be kept into sync. But that is a worry for the long term and something that can be dealt with then.
23842e9
to
0ff6c68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for cleaning things up and preparing the future 🚀
This factors out the custom legalization actions to be re-usable by AIE specific LegalizerInfo objects.
Moreover, this splits this AIELegalizerInfo object into AIE1LegalizerInfo and AIE2LegalizerInfo objects to get rid of numerous ST.isAIE2() checks, which make the code hard to read and difficult to port to a new architecture version.
There will be some duplication in the declarative legalization action specification for each architecture.