From 20afec6a159ee6b1d3ee47d2a048d33193c723e5 Mon Sep 17 00:00:00 2001 From: Traian Schiau Date: Wed, 4 Oct 2023 14:00:20 +0300 Subject: [PATCH 1/6] Preemption admission check controller. KEP update. --- keps/993-two-phase-admission/README.md | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/keps/993-two-phase-admission/README.md b/keps/993-two-phase-admission/README.md index 086131fce9..9151bdc74a 100644 --- a/keps/993-two-phase-admission/README.md +++ b/keps/993-two-phase-admission/README.md @@ -12,6 +12,7 @@ - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) - [Risks and Mitigations](#risks-and-mitigations) - [Design Details](#design-details) + - [Preemption Admission Check Controller](#preemption-admission-check-controller) - [Test Plan](#test-plan) - [Prerequisite testing updates](#prerequisite-testing-updates) - [Unit Tests](#unit-tests) @@ -206,7 +207,7 @@ pass AND there are some AdmissionChecks configured, AND the workload is not in on-hold retry state from some check, it will: 1. Fill the Admission field in workload, with the desired flavor assignment. -2. Not do any preemptions yet (unless BookCapacity is set to true). +2. Not do any preemptions yet (unless the check uses `Anytime` preemption policy). 3. Set "QuotaReserved" to true. Kueue will only pass as many pods into "QuotaReserved" as there would @@ -253,6 +254,27 @@ The controller implementing a particular check should: * After approving the workload, keep an eye on the check if it starts failing, fail the check and cause the workload to move back to the suspended state. +### Preemption Admission Check Controller + +Because now, the time when the eviction of the preemption candidate should vary based on the preemptor state +the scheduler will not issue the eviction during it's process instead it will set a `Pending` admission check +that is manged by a new built-in admission check controller. + +The **Admission Check Controller** will: + +- Watch for a change in state of the workloads pending preemption. +- Watch for workloads that are finishing execution and therefore releasing quota. +- Watch for AdmissionCheck changes, since the preemption policy can change. + +Since evaluating the current preemption state and the needs of a workload requires checking the sate of all resource pool the events are rate limited the kueue's cache is used. + +At every run the controller will get the list of workloads pending preemption. +Since for some of these workloads is not necessary to issue eviction at that given point (eg. Having a pending check that uses AfterCheckPassedOrOnDemand policy) their quota reservation will be ignored. +Fore every other preemption pending workloads, it will check if it can fit without evicting other workloads, case in which the preemption admission check condition will be set to `Ready`. +If eviction of other workload is still needed, an updated list candidates is created and eviction is issued for all of them. +If the updated list of candidates is empty, meaning that the preemption can no longer succeed, the preemption admission check is set as `Retry`, the workload will louse it's quota reservation and requeued. + + ### Test Plan [ x ] I/we understand the owners of the involved components may require updates to From f777dfabf07ba64610540711f9d49fe491c588b6 Mon Sep 17 00:00:00 2001 From: Traian Schiau Date: Thu, 5 Oct 2023 10:05:59 +0300 Subject: [PATCH 2/6] Review remarks. --- keps/993-two-phase-admission/README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/keps/993-two-phase-admission/README.md b/keps/993-two-phase-admission/README.md index 9151bdc74a..bf9915f70a 100644 --- a/keps/993-two-phase-admission/README.md +++ b/keps/993-two-phase-admission/README.md @@ -256,7 +256,7 @@ The controller implementing a particular check should: ### Preemption Admission Check Controller -Because now, the time when the eviction of the preemption candidate should vary based on the preemptor state +In this proposal, the time to evict the preemption candidates varies based on the preemptor state the scheduler will not issue the eviction during it's process instead it will set a `Pending` admission check that is manged by a new built-in admission check controller. @@ -266,13 +266,13 @@ The **Admission Check Controller** will: - Watch for workloads that are finishing execution and therefore releasing quota. - Watch for AdmissionCheck changes, since the preemption policy can change. -Since evaluating the current preemption state and the needs of a workload requires checking the sate of all resource pool the events are rate limited the kueue's cache is used. +The preemption controller uses the kueue cache, since it needs to check the state of workloads admitted to the ClusterQueues. At every run the controller will get the list of workloads pending preemption. -Since for some of these workloads is not necessary to issue eviction at that given point (eg. Having a pending check that uses AfterCheckPassedOrOnDemand policy) their quota reservation will be ignored. -Fore every other preemption pending workloads, it will check if it can fit without evicting other workloads, case in which the preemption admission check condition will be set to `Ready`. -If eviction of other workload is still needed, an updated list candidates is created and eviction is issued for all of them. -If the updated list of candidates is empty, meaning that the preemption can no longer succeed, the preemption admission check is set as `Retry`, the workload will louse it's quota reservation and requeued. +1. Since for some of these workloads is not necessary to issue eviction at that given point (eg. Having a pending check that uses AfterCheckPassedOrOnDemand policy) their quota reservation will be ignored. +2. For every other preemption pending workloads, it will check if it can fit without evicting other workloads, case in which the preemption admission check condition will be set to `Ready`. +3. If eviction of other workload is still needed, an updated list candidates is created and eviction is issued for all of them. +4. If the updated list of candidates is empty, meaning that the preemption can no longer succeed, the preemption admission check is set as `Retry`, the workload will lose it's quota reservation and be requeued. ### Test Plan From 0fd34e368c78105e546c4887225fc4a9be633ade Mon Sep 17 00:00:00 2001 From: Traian Schiau Date: Fri, 6 Oct 2023 16:32:16 +0300 Subject: [PATCH 3/6] Review Remarks --- keps/993-two-phase-admission/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/993-two-phase-admission/README.md b/keps/993-two-phase-admission/README.md index bf9915f70a..617823ef0c 100644 --- a/keps/993-two-phase-admission/README.md +++ b/keps/993-two-phase-admission/README.md @@ -260,7 +260,7 @@ In this proposal, the time to evict the preemption candidates varies based on th the scheduler will not issue the eviction during it's process instead it will set a `Pending` admission check that is manged by a new built-in admission check controller. -The **Admission Check Controller** will: +The **Preemption Admission Check Controller** will: - Watch for a change in state of the workloads pending preemption. - Watch for workloads that are finishing execution and therefore releasing quota. From 298e28c6e9a208af6d03c9b3dfd56af4644c9aa8 Mon Sep 17 00:00:00 2001 From: Traian Schiau Date: Thu, 19 Oct 2023 13:55:18 +0300 Subject: [PATCH 4/6] Check the pending workloads in order if their priority. --- keps/993-two-phase-admission/README.md | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/keps/993-two-phase-admission/README.md b/keps/993-two-phase-admission/README.md index 617823ef0c..dee4d6ab41 100644 --- a/keps/993-two-phase-admission/README.md +++ b/keps/993-two-phase-admission/README.md @@ -269,11 +269,24 @@ The **Preemption Admission Check Controller** will: The preemption controller uses the kueue cache, since it needs to check the state of workloads admitted to the ClusterQueues. At every run the controller will get the list of workloads pending preemption. -1. Since for some of these workloads is not necessary to issue eviction at that given point (eg. Having a pending check that uses AfterCheckPassedOrOnDemand policy) their quota reservation will be ignored. -2. For every other preemption pending workloads, it will check if it can fit without evicting other workloads, case in which the preemption admission check condition will be set to `Ready`. -3. If eviction of other workload is still needed, an updated list candidates is created and eviction is issued for all of them. -4. If the updated list of candidates is empty, meaning that the preemption can no longer succeed, the preemption admission check is set as `Retry`, the workload will lose it's quota reservation and be requeued. +The workloads pending preemption are divided into: +- `preemtingLetaer` - Workloads having at least one check that uses AfterCheckPassedOrOnDemand policy with the state `pending`. +- `preemtingNow` - Workloads that expect to be able to issue evictions or potentially change their preemption state in the current cycle. + +Then: +1. Remove all workloads from the snapshot. +2. For every workload in `preemtingNow` , in the order of their priority: + - If it can fit without the need of evicting other workloads + - Set its admission check to `Ready` + - Add it to the snapshot + - If it cannot fit + - Get an updated list of eviction candidates + - If the updated list is not empty. + - Issue the eviction to the candidates. + - Add it to the snapshot + - If the updated list is empty, meaning the preemption cannot be done. + - Set its admission check to `Ready` ### Test Plan From c6811ef2785222f46faab45a4b084035bdafe46a Mon Sep 17 00:00:00 2001 From: Traian Schiau Date: Wed, 6 Dec 2023 09:01:58 +0200 Subject: [PATCH 5/6] Review Remarks. --- keps/993-two-phase-admission/README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/keps/993-two-phase-admission/README.md b/keps/993-two-phase-admission/README.md index dee4d6ab41..ba5bf5081d 100644 --- a/keps/993-two-phase-admission/README.md +++ b/keps/993-two-phase-admission/README.md @@ -256,9 +256,9 @@ The controller implementing a particular check should: ### Preemption Admission Check Controller -In this proposal, the time to evict the preemption candidates varies based on the preemptor state -the scheduler will not issue the eviction during it's process instead it will set a `Pending` admission check -that is manged by a new built-in admission check controller. +In this proposal, the time to evict the preemption candidates varies based on the preemptor state. +The scheduler will not issue the eviction during it's process instead it will set a `Pending` admission check +that is manged by a single instance of a new built-in admission check controller. The **Preemption Admission Check Controller** will: @@ -271,7 +271,7 @@ The preemption controller uses the kueue cache, since it needs to check the stat At every run the controller will get the list of workloads pending preemption. The workloads pending preemption are divided into: -- `preemtingLetaer` - Workloads having at least one check that uses AfterCheckPassedOrOnDemand policy with the state `pending`. +- `preemtingLeter` - Workloads having at least one check that uses AfterCheckPassedOrOnDemand policy with the state `pending`. - `preemtingNow` - Workloads that expect to be able to issue evictions or potentially change their preemption state in the current cycle. Then: @@ -286,7 +286,9 @@ Then: - Issue the eviction to the candidates. - Add it to the snapshot - If the updated list is empty, meaning the preemption cannot be done. - - Set its admission check to `Ready` + - Set its admission check to `Retry`, the quota reservation will be lost and the workload placed in the queue waiting for a new QuotaReservation. + +**NOTE** The list of candidates is picked out from the list of workloads holding a QuotaReservation, regardless if they are fully Admitted or not. ### Test Plan From 9ddfc1b8698867b3ae78ffa9ba5767c83c61a75c Mon Sep 17 00:00:00 2001 From: Traian Schiau Date: Fri, 8 Dec 2023 12:25:18 +0200 Subject: [PATCH 6/6] Review Remarks --- keps/993-two-phase-admission/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/993-two-phase-admission/README.md b/keps/993-two-phase-admission/README.md index ba5bf5081d..7bd03834c2 100644 --- a/keps/993-two-phase-admission/README.md +++ b/keps/993-two-phase-admission/README.md @@ -271,7 +271,7 @@ The preemption controller uses the kueue cache, since it needs to check the stat At every run the controller will get the list of workloads pending preemption. The workloads pending preemption are divided into: -- `preemtingLeter` - Workloads having at least one check that uses AfterCheckPassedOrOnDemand policy with the state `pending`. +- `preemtingLater` - Workloads having at least one check that uses AfterCheckPassedOrOnDemand policy with the state `pending`. - `preemtingNow` - Workloads that expect to be able to issue evictions or potentially change their preemption state in the current cycle. Then: