test: Relax flaky comparison in AsyncWorkQueue parallel test #379
This test has had a FIXME for a long time:
In CI, this comparison is flaky: over the past 7 months of runs, it has failed about 15-20% of the time.
For the time being, I think it would improve pipeline health to relax this condition to expect any speedup from parallelism whatsoever, and revisit this later. Justification for revisiting this test later is below.
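To make the proposed relaxation concrete, here is a minimal sketch of the two kinds of check. The helper names `MeetsSpeedup` and `ShowsAnySpeedup` and the `min_speedup` parameter are hypothetical and do not come from the Triton test; the actual threshold the test used before is not reproduced here.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper, not part of the Triton test suite.
// Stricter style of check: the parallel run must beat the serial run by
// at least `min_speedup`, where speedup = serial_time / parallel_time.
inline bool MeetsSpeedup(
    std::chrono::nanoseconds serial, std::chrono::nanoseconds parallel,
    double min_speedup)
{
  assert(min_speedup > 0.0);
  if (parallel.count() <= 0) {
    return false;  // guard against a bogus or zero measurement
  }
  const double ratio = static_cast<double>(serial.count()) /
                       static_cast<double>(parallel.count());
  return ratio >= min_speedup;
}

// Hypothetical helper for the relaxed condition: accept any speedup at
// all, i.e. the parallel run is strictly faster than the serial run.
inline bool ShowsAnySpeedup(
    std::chrono::nanoseconds serial, std::chrono::nanoseconds parallel)
{
  return parallel < serial;
}
```

The relaxed form still fails if parallelism gives no benefit at all, but it no longer depends on how much headroom a loaded CI machine happens to have at the time of the run.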
Alternatives:
Justifications
In practice, the AsyncWorkQueue is initialized with the --buffer-manager-thread-count value. When that value is set via the tritonserver binary, it doesn't appear to propagate correctly to backends that initialize the AsyncWorkQueue anyway, which led L0_parallel_copy (the e2e test of AsyncWorkQueue) to be disabled in the past. This is just to back up the reasoning that it is OK to relax this unit test's strictness, since the underlying feature isn't being utilized correctly in Triton/backends today.
This test will have more importance when the e2e functionality is working as expected. That would involve taking a closer look at:
Some background info to back up the flaw in AsyncWorkQueue today: the core sees the singleton at address 0x7061331fe1d0, correctly initialized with a thread pool, but the backend sees it at address 0x706126096610, uninitialized and with an empty thread pool, so the "singleton" nature doesn't appear to be working as expected: