Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pass to assign BD IDs to NPU DMA ops and refactor AMDAIEAssignBufferDescriptorIDsPass #551

Merged
merged 2 commits into from
Jul 16, 2024

Conversation

jtuyls
Copy link
Collaborator

@jtuyls jtuyls commented Jul 15, 2024

Adds a pass to generate BD IDs for NPU DMA operations that avoids race conditions with BDs being reconfigured during execution by some DMA.

This only assigns BD IDs on the L3 side of NPU DMA operations as the backend can currently only handle Shim DMA instructions and MemTile/core DMA configurations are assumed to be part of the xclbin. Therefore, you will see the checks like npuDmaOp.getSourceMemorySpaceAsUInt() == 0 and npuDmaOp.getTargetMemorySpaceAsUInt() == 0 being used (memory space 0 is L3).

BD IDs are represented with an amdaie.bd_id operation, which takes a tile and an integer value as operands as every tile has its own set of BDs. This helps with guaranteeing/verifying correct reuse of the same id across DMA operations. This usage was derived from Xilinx/mlir-aie#1610

Comment on lines +298 to +299
/// Return a map from channels to valid BD ids for the requested tile type.
/// TODO(jornt): find these ranges in the device model.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can send them a PR to factor that out (but also that's a good util to use somewhere around here too)

Copy link
Collaborator

@makslevental makslevental left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly fine - a few nittish type things to debate - but can you put a description somewhere (eg this PR itself) for the rules for assigning bdids re npuDmaOp.getSourceMemorySpaceAsUInt() and npuDmaOp.getTargetMemorySpaceAsUInt().

npuDmaOp.getTargetMixedSizes(), npuDmaOp.getTargetMixedStrides(),
npuDmaOp.getSourceMixedOffsets(), npuDmaOp.getSourceMixedSizes(),
npuDmaOp.getSourceMixedStrides(), nullptr, bdIdOp);
} else if (npuDmaOp.getTargetMemorySpaceAsUInt() == 0) {
Copy link
Collaborator

@makslevental makslevental Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these conditionals aren't mutually exclusive (I don't think? maybe the first two are but the last one definitely isn't, at least not according to first principles) so I would prefer if each conditional were separate and each just did a return WalkResult::advance(); at the end.

Copy link
Collaborator Author

@jtuyls jtuyls Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these conditionals aren't mutually exclusive

In principal you could create a NPU DMA operation with both source and target on L3/DDR, but that doesn't make much sense in the context of AIE. However, maybe this is not a good place to bake in any of those 'assumptions' and I can make it two separate if statements, but with a single return WalkResult::advance(); at the end?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally like early returns in ifs because then you don't have to hunt for the exit. And also it prevents multiple ifs firing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline

// towards other DMAs and future hardware generations, channel
// assignment should happen before BD assignemnt. This requires more
// refactoring.
std::optional<uint32_t> bdId = generator.getAndAssignBdId(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess the next step is to put channel on NpuDmaCpyNdOp...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think channel belongs on the CircularDmaCpyNd operations which are referenced by the NpuDmaCpyNd ops.

Copy link
Collaborator

@makslevental makslevental Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm really? I guess I haven't been paying attention to these ops. That's different from mlir-aie (obv) so that's why I was expecting a channel on this op.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The connections (routing, channel/port selection) is set up based on the CircularDmaCpyNd ops after lowering to AIE dialect. As it's set up now, the NPU instructions shouldn't change the channels/DMA ports

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird to me (again only because coming from the mlir-aie perspective) but we can discuss it later/elsewhere.

Copy link
Collaborator

@makslevental makslevental left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// towards other DMAs and future hardware generations, channel
// assignment should happen before BD assignemnt. This requires more
// refactoring.
std::optional<uint32_t> bdId = generator.getAndAssignBdId(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird to me (again only because coming from the mlir-aie perspective) but we can discuss it later/elsewhere.

npuDmaOp.getTargetMixedSizes(), npuDmaOp.getTargetMixedStrides(),
npuDmaOp.getSourceMixedOffsets(), npuDmaOp.getSourceMixedSizes(),
npuDmaOp.getSourceMixedStrides(), nullptr, bdIdOp);
} else if (npuDmaOp.getTargetMemorySpaceAsUInt() == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline

@jtuyls jtuyls merged commit c8b8112 into nod-ai:main Jul 16, 2024
2 checks passed
@jtuyls jtuyls deleted the add-assign-npu-bd-ids branch July 16, 2024 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants