Workload above nominal of CQ prioritized over workload within nominal quota of other CQ #3405

Open
gabesaba opened this issue Oct 31, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@gabesaba
Contributor

What happened:
Suppose a system has several ClusterQueues (CQs) sharing a Cohort, and two of them are each trying to admit a workload. These CQs lend capacity to the Cohort and are configured with reclaimWithinCohort=Any.

Suppose WL1's requests fit within the nominal capacity of CQ1, but the sum of the running workloads' requests plus WL1's requests exceeds the nominal quota + borrowing limit of CQ1.

Suppose WL2 fits within the nominal capacity of CQ2, even after accounting for the other workloads running in CQ2. Suppose also that the excess capacity of CQ2 is being lent out to, and used by, other ClusterQueues in the Cohort, so that CQ2 needs to issue preemptions to reclaim its nominal quota.

WL1 will be considered not borrowing (code). If WL1 was created before WL2, and priority sorting and fair sharing are disabled, WL1 will be processed first in a scheduling cycle (code).

WL1 may end up reserving capacity in the Cohort (code) that WL2 depends on to be able to schedule (code). WL2 is then blocked indefinitely, unable to issue preemptions until WL1 successfully schedules.
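To make the scenario concrete, here is a toy model in Python with made-up numbers (one CPU resource, quotas and usage invented for illustration). It is not Kueue code: `considered_borrowing` is only my reading of the behavior described above, i.e. the assumption that the check compares the workload's own request against nominal quota and ignores what is already running in the CQ.

```python
from dataclasses import dataclass


@dataclass
class ClusterQueue:
    name: str
    nominal: int          # nominal quota (hypothetical CPU units)
    borrowing_limit: int  # max that may be borrowed on top of nominal
    usage: int            # quota held by already-admitted workloads


@dataclass
class Workload:
    name: str
    cq: ClusterQueue
    request: int
    created: int  # smaller value = created earlier


def considered_borrowing(wl: Workload) -> bool:
    # Assumption: only the workload's own request is compared to nominal,
    # so existing usage in the CQ does not count.
    return wl.request > wl.cq.nominal


def fits_without_preemption(wl: Workload) -> bool:
    # True if the CQ could hold the workload within nominal + borrowing limit.
    return wl.cq.usage + wl.request <= wl.cq.nominal + wl.cq.borrowing_limit


# CQ1 is already over nominal; CQ2 is well under nominal.
cq1 = ClusterQueue("cq1", nominal=10, borrowing_limit=5, usage=12)
cq2 = ClusterQueue("cq2", nominal=10, borrowing_limit=0, usage=2)

wl1 = Workload("wl1", cq1, request=6, created=1)
wl2 = Workload("wl2", cq2, request=4, created=2)

# Non-borrowing entries first, then older entries first.
order = sorted([wl1, wl2], key=lambda w: (considered_borrowing(w), w.created))
names = [w.name for w in order]
print(names)  # ['wl1', 'wl2']
```

Both workloads are classified as not borrowing, so the creation time decides and WL1 goes first, even though 12 + 6 exceeds CQ1's nominal + borrowing limit of 15 while WL2 stays within CQ2's nominal quota (2 + 4 <= 10).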

What you expected to happen:
Even without FairSharing enabled, WL2 should be sorted before WL1 and be able to issue preemptions immediately, since it fits within CQ2's nominal capacity without needing to borrow.
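A minimal sketch of the ordering I would expect, again in Python with invented numbers. The `needs_borrowing` helper is hypothetical and not part of Kueue; it assumes the classification should look at the CQ's existing usage plus the new request, rather than the request alone.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    nominal: int   # nominal quota of the workload's CQ (hypothetical units)
    usage: int     # quota already used by admitted workloads in that CQ
    request: int   # quota requested by this workload
    created: int   # smaller value = created earlier


def needs_borrowing(e: Entry) -> bool:
    # Hypothetical check: admitting the workload would push the CQ's
    # total usage above its nominal quota.
    return e.usage + e.request > e.nominal


entries = [
    Entry("wl1", nominal=10, usage=12, request=6, created=1),
    Entry("wl2", nominal=10, usage=2, request=4, created=2),
]

# Entries that stay within nominal quota first, then older entries first.
expected_order = [
    e.name for e in sorted(entries, key=lambda e: (needs_borrowing(e), e.created))
]
print(expected_order)  # ['wl2', 'wl1']
```

With this key, WL1 counts as borrowing (12 + 6 > 10) and WL2 does not (2 + 4 <= 10), so WL2 is handled first despite being newer.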

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
FairSharing should solve this problem, as CQ1+WL1 will have a higher DominantResourceShare than CQ2+WL2.
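As a rough illustration of why the shares would differ, here is a simplified Python sketch. It assumes the share is the largest ratio, over resources, of usage above nominal quota to the lendable capacity in the Cohort, divided by a weight. This is my simplified reading and not Kueue's actual implementation; the function name and all numbers are hypothetical.

```python
def dominant_resource_share(usage, nominal, lendable, weight=1.0):
    """Simplified share: max over resources of (usage above nominal) / lendable."""
    ratios = []
    for resource, used in usage.items():
        above_nominal = max(0, used - nominal.get(resource, 0))
        cohort_lendable = lendable.get(resource, 0)
        if cohort_lendable > 0:
            ratios.append(above_nominal / cohort_lendable)
    return max(ratios, default=0.0) / weight


# Hypothetical Cohort with 20 lendable CPUs in total.
lendable = {"cpu": 20}

# CQ1 with WL1 admitted: 12 running + 6 requested, nominal 10.
cq1_share = dominant_resource_share({"cpu": 18}, {"cpu": 10}, lendable)

# CQ2 with WL2 admitted: 2 running + 4 requested, nominal 10.
cq2_share = dominant_resource_share({"cpu": 6}, {"cpu": 10}, lendable)

print(cq1_share > cq2_share)  # True
```

Under these assumptions CQ2+WL2 has a share of zero because it never exceeds its nominal quota, so any ordering by share would favor WL2.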

Environment:

  • Kueue version: 0.8.1
@gabesaba gabesaba added the kind/bug Categorizes issue or PR as related to a bug. label Oct 31, 2024