-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-Dimensional Buffer Descriptors in AIE2 #547
Conversation
lib/Targets/AIETargetXAIEV2.cpp
Outdated
@@ -938,14 +939,14 @@ mlir::LogicalResult AIETranslateToXAIEV2(ModuleOp module, raw_ostream &output) { | |||
if (!lock.hasName()) | |||
return; | |||
std::string lockName(lock.name().getValue()); | |||
output << "int mlir_aie_acquire_" << lockName << "(" << ctx_p | |||
output << "AieRC mlir_aie_acquire_" << lockName << "(" << ctx_p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd specifically like to avoid leaking libxaie types out of this interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, I'll put this back to int. The rationale was that, in the tutorials, this is checked against XAIE_OK. But since XAIE_OK is just a macro, hence untyped, it will work with ints too, though. (Although in my mind is an AieRC type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed all generated functions in aie_inc.cpp
to return int
, so we no longer return AieRC
anywhere. Note that this changes the signature of more functions than just this one. A number of them currently return AieRc. It should be consistently returning int everywhere after this PR.
lib/Dialect/AIE/IR/AIEDialect.cpp
Outdated
::mlir::MemRefType buffer = getBuffer().getType(); | ||
if(!buffer.getElementType().isInteger(32)) { | ||
// The AIE2 specification prescribes that multi-dimensional address | ||
// generation creates addresses to 32 bit words. Hence, stepSize and wrap | ||
// refer to 32 bit words. To avoid confusion, we disallow using multi- | ||
// dimensional BDs with other memrefs. | ||
return emitOpError() << "Multi-dimensional buffer descriptors are only " | ||
"supported for 32 bit integer elements."; | ||
} | ||
uint64_t base_addr = getOffset(); | ||
uint64_t memref_size = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some tests for these error checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes let me add some FileCheck tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phil and me also discussed making stepsize and wrap byte-sized arguments, and throwing an error if the user passes in values that are not divisible by four. That may be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think this is cleaner and more consistent with the other arguments of the dmaBD, which are also referenced to the size of the memref element type. From a generality perspective, it would be nice to remote the limitation on the element type here, but this is probably good enough for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see arguments for doing it both ways. As to my understanding, the limitation that the wrap and stepsize is in 4 byte increments is architectural. I think I will implement the following:
The stepsize and wrap are in increments of the element type. However, if the element type is smaller than 4 bytes, I will also verify that the stepsize/wrap are multiples of 4 bytes.
For example: i8 with step size of 2 --> error,
i8 with step size of 4 --> ok
i16 with step size of 2 --> ok
i16 with step size of 1 --> error
i32 with any step size --> ok
Does that seem reasonable?
One thing that may be confusing about this is that the DMA will still copy 32 bit words. So, for example, if we have an i8 array and a step size of 4, one may expect that only every fourth byte receives a value into it, but this won't be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to be able to use the multi-dimensional BDs with non-i32 element types and have the compiler assume I know what I'm doing (i.e. not disallow it, sanity error checking is OK). In my opinion the wrap and step should expose the hardware registers directly without trying to do any conversion (e.g. no bytes -> words conversion). This is the cleanest and doesn't require casting to use multi-dimensional BDs in the general case (non-i32 data). The feature is much much less useful if only i32 memrefs are supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that at the hardware level (what you pass into the registers), wrap and step size are i32/4 byte increments. You cannot do wrap/step size at byte granularity.
For example, one thing you cannot use this for is, e.g. extracting the R, G, B, A channels out of a matrix if each of the channels is encoded as one byte.
That being said, I could remove this check and just issue a warning instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the warning too. IMHO non-i32 data is the common case and shouldn't issue a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So let's say a user has two i8 arrays
dst = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
and
src = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
and they sets up BDs, with the goal of copying some elements from src
into dst
.
Let's say they set the stride of the BD to 4
bytes / 4
elements (value 1
will be stored to register).
They may expect
dst = [1, 0, 0, 0, 5, 0, 0, 0, 9, 0, 0, 0, 13, 0, 0, 0]
i.e. copy every fourth element, but they will get
dst = src
since four byte words are copied each time.
With data type i8
and stride 4
, isn't that highly unintuitive behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't that highly unintuitive behavior?
It doesn't matter if it's intuitive. That's what the hardware does and at this level of aie dialect we should expose the hardware to the user.
aff99a6
to
6812bda
Compare
It appears this broke some tests which I didn't notice running locally. I will address those tomorrow. |
aca1633
to
b5403e6
Compare
do not expose AieRC type in aie_inc.cpp functions do not print redundant error messages use C++ raw string for code block rename __mlir_aie_print to __mlir_aie_verbose
I implemented the changes discussed in the meeting, added a number of tests, and rebased on current main. @jgmelber |
…size, rank, and contiguity; reorder arguments to be highest dimension first
This PR adds syntax and proper lowerings, as well as a simulator-based test, to the
dmaBd
operation that allows users to specifystepsize
andwrap
for their buffer descriptors.Highlighted above is the new attribute added. It is an array of <stepsize, wrap> pairs. All of this can be omitted, which gives a regular 1D array behavior and hence retains backwards compatibility.
This gets lowered to the proper
XAie_DmaSetMultiDimAddr
calls to configure the DMAs.A simulator based test is included.
The PR also includes some minor refactoring/clean up of some duplicated code I stumbled upon while implementing this.
This PR also includes the changes in #542, since both modify the libXAIE lowering. If this is merged, #542 can be closed.