Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preemption admission check controller. KEP update. #1178

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 38 additions & 1 deletion keps/993-two-phase-admission/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Preemption Admission Check Controller](#preemption-admission-check-controller)
- [Test Plan](#test-plan)
- [Prerequisite testing updates](#prerequisite-testing-updates)
- [Unit Tests](#unit-tests)
Expand Down Expand Up @@ -206,7 +207,7 @@ pass AND there are some AdmissionChecks configured, AND the
workload is not in on-hold retry state from some check, it will:

1. Fill the Admission field in workload, with the desired flavor assignment.
2. Not do any preemptions yet (unless BookCapacity is set to true).
2. Not do any preemptions yet (unless the check uses `Anytime` preemption policy).
3. Set "QuotaReserved" to true.

Kueue will only pass as many pods into "QuotaReserved" as there would
Expand Down Expand Up @@ -253,6 +254,42 @@ The controller implementing a particular check should:
* After approving the workload, keep an eye on the check if it starts failing,
fail the check and cause the workload to move back to the suspended state.

### Preemption Admission Check Controller
alculquicondor marked this conversation as resolved.
Show resolved Hide resolved

In this proposal, the time to evict the preemption candidates varies based on the preemptor state.
The scheduler will not issue the eviction during it's process instead it will set a `Pending` admission check
that is manged by a single instance of a new built-in admission check controller.

The **Preemption Admission Check Controller** will:

- Watch for a change in state of the workloads pending preemption.
- Watch for workloads that are finishing execution and therefore releasing quota.
- Watch for AdmissionCheck changes, since the preemption policy can change.

The preemption controller uses the kueue cache, since it needs to check the state of workloads admitted to the ClusterQueues.

At every run the controller will get the list of workloads pending preemption.

The workloads pending preemption are divided into:
- `preemtingLeter` - Workloads having at least one check that uses AfterCheckPassedOrOnDemand policy with the state `pending`.
trasc marked this conversation as resolved.
Show resolved Hide resolved
- `preemtingNow` - Workloads that expect to be able to issue evictions or potentially change their preemption state in the current cycle.

Then:
1. Remove all workloads from the snapshot.
2. For every workload in `preemtingNow` , in the order of their priority:
- If it can fit without the need of evicting other workloads
- Set its admission check to `Ready`
- Add it to the snapshot
- If it cannot fit
- Get an updated list of eviction candidates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify which ones are candidates? Is it only workloads that are fully Admitted? I don't think you want to consider ones that only have ResourceQuota.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The candidates are picked from the list of workloads that hold a quota reservation, just because maybe a admission check of a workload is transitioning slowly to ready dose not mean that it should be able to hold its quota against a higher priority workload.

- If the updated list is not empty.
- Issue the eviction to the candidates.
- Add it to the snapshot
alculquicondor marked this conversation as resolved.
Show resolved Hide resolved
- If the updated list is empty, meaning the preemption cannot be done.
- Set its admission check to `Retry`, the quota reservation will be lost and the workload placed in the queue waiting for a new QuotaReservation.

**NOTE** The list of candidates is picked out from the list of workloads holding a QuotaReservation, regardless if they are fully Admitted or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is QuotaReservation enough? Consider jobs waiting for ProvReq to be fulfilled could be waiting for hours. There is no point in preempting them yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ProvRequest Admission check can be configured with AfterCheckPassedOrOnDemand policy, the evictions will be done only when the provisioning is finished (and only if still needed).

In the future , if we can know that the provisioning request has no chance to succeed for long period of times, the ProvRecAC can set its state to Retry this will fully release the quota.After that time the state can be set to Pending to put wl back in the queue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is more about the candidates for eviction. Obviously, fully Admitted workloads are candidates for eviction.
Other than that, I think it should be only the ones that have checks ready or are requesting preemption themselves.

Copy link
Contributor Author

@trasc trasc Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be a source of priority inversion. Say we have wl1 (p=1, reserving, one check pending), wl2 (p=2, fully admited), wl3(p=3, needs to evict wl1 or wl2), We should not evict wl2 just because wl1 has some pending admission check.


### Test Plan

[ x ] I/we understand the owners of the involved components may require updates to
Expand Down