Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-Dimensional Buffer Descriptors in AIE2 #547

Merged
merged 12 commits into from
Jul 26, 2023
Merged

Conversation

andrej
Copy link
Collaborator

@andrej andrej commented Jul 19, 2023

This PR adds syntax and proper lowerings, as well as a simulator-based test, to the dmaBd operation that allows users to specify stepsize and wrap for their buffer descriptors.

 AIE.dmaBd(<%buf14 : memref<128xi32>, 0, 128>, 0, [<2, 8>, <1, 2>, <16, 8>])
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^

Highlighted above is the new attribute added. It is an array of <stepsize, wrap> pairs. All of this can be omitted, which gives a regular 1D array behavior and hence retains backwards compatibility.

This gets lowered to the proper XAie_DmaSetMultiDimAddr calls to configure the DMAs.

A simulator based test is included.

The PR also includes some minor refactoring/clean up of some duplicated code I stumbled upon while implementing this.

This PR also includes the changes in #542, since both modify the libXAIE lowering. If this is merged, #542 can be closed.

include/aie/Dialect/AIE/IR/AIE.td Outdated Show resolved Hide resolved
lib/Dialect/AIE/IR/AIEDialect.cpp Outdated Show resolved Hide resolved
@@ -938,14 +939,14 @@ mlir::LogicalResult AIETranslateToXAIEV2(ModuleOp module, raw_ostream &output) {
if (!lock.hasName())
return;
std::string lockName(lock.name().getValue());
output << "int mlir_aie_acquire_" << lockName << "(" << ctx_p
output << "AieRC mlir_aie_acquire_" << lockName << "(" << ctx_p
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd specifically like to avoid leaking libxaie types out of this interface.

Copy link
Collaborator Author

@andrej andrej Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, I'll put this back to int. The rationale was that, in the tutorials, this is checked against XAIE_OK. But since XAIE_OK is just a macro, hence untyped, it will work with ints too, though. (Although in my mind is an AieRC type)

Copy link
Collaborator Author

@andrej andrej Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed all generated functions in aie_inc.cpp to return int, so we no longer return AieRC anywhere. Note that this changes the signature of more functions than just this one. A number of them currently return AieRc. It should be consistently returning int everywhere after this PR.

Comment on lines 1003 to 1013
::mlir::MemRefType buffer = getBuffer().getType();
if(!buffer.getElementType().isInteger(32)) {
// The AIE2 specification prescribes that multi-dimensional address
// generation creates addresses to 32 bit words. Hence, stepSize and wrap
// refer to 32 bit words. To avoid confusion, we disallow using multi-
// dimensional BDs with other memrefs.
return emitOpError() << "Multi-dimensional buffer descriptors are only "
"supported for 32 bit integer elements.";
}
uint64_t base_addr = getOffset();
uint64_t memref_size = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some tests for these error checks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes let me add some FileCheck tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phil and me also discussed making stepsize and wrap byte-sized arguments, and throwing an error if the user passes in values that are not divisible by four. That may be cleaner.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think this is cleaner and more consistent with the other arguments of the dmaBD, which are also referenced to the size of the memref element type. From a generality perspective, it would be nice to remote the limitation on the element type here, but this is probably good enough for now.

Copy link
Collaborator Author

@andrej andrej Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see arguments for doing it both ways. As to my understanding, the limitation that the wrap and stepsize is in 4 byte increments is architectural. I think I will implement the following:

The stepsize and wrap are in increments of the element type. However, if the element type is smaller than 4 bytes, I will also verify that the stepsize/wrap are multiples of 4 bytes.

For example: i8 with step size of 2 --> error,
i8 with step size of 4 --> ok
i16 with step size of 2 --> ok
i16 with step size of 1 --> error
i32 with any step size --> ok

Does that seem reasonable?

One thing that may be confusing about this is that the DMA will still copy 32 bit words. So, for example, if we have an i8 array and a step size of 4, one may expect that only every fourth byte receives a value into it, but this won't be the case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be able to use the multi-dimensional BDs with non-i32 element types and have the compiler assume I know what I'm doing (i.e. not disallow it, sanity error checking is OK). In my opinion the wrap and step should expose the hardware registers directly without trying to do any conversion (e.g. no bytes -> words conversion). This is the cleanest and doesn't require casting to use multi-dimensional BDs in the general case (non-i32 data). The feature is much much less useful if only i32 memrefs are supported.

Copy link
Collaborator Author

@andrej andrej Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that at the hardware level (what you pass into the registers), wrap and step size are i32/4 byte increments. You cannot do wrap/step size at byte granularity.

For example, one thing you cannot use this for is, e.g. extracting the R, G, B, A channels out of a matrix if each of the channels is encoded as one byte.

That being said, I could remove this check and just issue a warning instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the warning too. IMHO non-i32 data is the common case and shouldn't issue a warning.

Copy link
Collaborator Author

@andrej andrej Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let's say a user has two i8 arrays

dst = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

and

src = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

and they sets up BDs, with the goal of copying some elements from src into dst.

Let's say they set the stride of the BD to 4 bytes / 4 elements (value 1 will be stored to register).

They may expect

dst = [1, 0, 0, 0, 5, 0, 0, 0, 9, 0, 0, 0, 13, 0, 0, 0]

i.e. copy every fourth element, but they will get

dst = src

since four byte words are copied each time.

With data type i8 and stride 4, isn't that highly unintuitive behavior?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that highly unintuitive behavior?

It doesn't matter if it's intuitive. That's what the hardware does and at this level of aie dialect we should expose the hardware to the user.

@andrej andrej force-pushed the nd-dma branch 4 times, most recently from aff99a6 to 6812bda Compare July 19, 2023 22:47
@andrej
Copy link
Collaborator Author

andrej commented Jul 20, 2023

It appears this broke some tests which I didn't notice running locally. I will address those tomorrow.

@andrej andrej force-pushed the nd-dma branch 2 times, most recently from aca1633 to b5403e6 Compare July 21, 2023 20:41
@andrej
Copy link
Collaborator Author

andrej commented Jul 26, 2023

I implemented the changes discussed in the meeting, added a number of tests, and rebased on current main. @jgmelber

…size, rank, and contiguity; reorder arguments to be highest dimension first
@fifield fifield self-requested a review July 26, 2023 20:23
@jgmelber jgmelber merged commit 2ded45c into Xilinx:main Jul 26, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants