fix: affinity priority #1548
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: helen-frank. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
Current test results:
❯ kubectl get nodeclaims
NAME            TYPE               CAPACITY   ZONE          NODE                             READY   AGE
default-8wq87   c-8x-amd64-linux   spot       test-zone-d   blissful-goldwasser-3014441860   True    67s
default-chvld   c-4x-amd64-linux   spot       test-zone-b   exciting-wescoff-4170611030      True    67s
default-kbr7n   c-2x-amd64-linux   spot       test-zone-d   vibrant-aryabhata-969189106      True    67s
❯ kubectl get pod -owide
NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE                             NOMINATED NODE   READINESS GATES
nginx1-67877d4f4d-nbmj7    1/1     Running   0          77s   10.244.1.0   vibrant-aryabhata-969189106      <none>           <none>
nginx10-6685645984-sjftg   1/1     Running   0          76s   10.244.2.2   exciting-wescoff-4170611030      <none>           <none>
nginx2-5f45bfcb5b-flrlw    1/1     Running   0          77s   10.244.2.0   exciting-wescoff-4170611030      <none>           <none>
nginx3-6b5495bfff-xt7d9    1/1     Running   0          77s   10.244.2.1   exciting-wescoff-4170611030      <none>           <none>
nginx4-7bdd687bb6-nzc8f    1/1     Running   0          77s   10.244.3.5   blissful-goldwasser-3014441860   <none>           <none>
nginx5-6b5d886fc7-6m57l    1/1     Running   0          77s   10.244.3.0   blissful-goldwasser-3014441860   <none>           <none>
nginx6-bd5d6b9fb-x6lkq     1/1     Running   0          77s   10.244.3.2   blissful-goldwasser-3014441860   <none>           <none>
nginx7-5559545b9f-xs5sm    1/1     Running   0          77s   10.244.3.4   blissful-goldwasser-3014441860   <none>           <none>
nginx8-66bb679c4-zndwz     1/1     Running   0          76s   10.244.3.1   blissful-goldwasser-3014441860   <none>           <none>
nginx9-6c47b869dd-nfds6    1/1     Running   0          76s   10.244.3.3   blissful-goldwasser-3014441860   <none>           <none>
Force-pushed from edadb85 to 2581408 (compare)
Pull Request Test Coverage Report for Build 11357525644 (details)
💛 - Coveralls
This isn't necessarily as clear-cut a change to me. Is there data that you've generated to give you confidence that this doesn't have any adverse effects?
@@ -96,6 +97,15 @@ func byCPUAndMemoryDescending(pods []*v1.Pod) func(i int, j int) bool {
		return true
	}

	// anti-affinity pods should be sorted before normal pods
	if affinityCmp := pod.PodAffinityCmp(lhsPod, rhsPod); affinityCmp != 0 {
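For context, a minimal, self-contained sketch of the tie-breaking idea this hunk introduces: break ties in the CPU-and-memory-descending sort by ordering pods that carry inter-pod (anti-)affinity or topology spread constraints first. The helper names and sort wiring below are illustrative assumptions, not the PR's actual pod.PodAffinityCmp implementation.

```go
// Sketch only: prefer pods with inter-pod (anti-)affinity or topology spread
// constraints when ordering pending pods for bin-packing.
package main

import (
	"fmt"
	"sort"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// hasSchedulingConstraints reports whether a pod declares inter-pod affinity,
// anti-affinity, or topology spread constraints.
func hasSchedulingConstraints(p *v1.Pod) bool {
	if len(p.Spec.TopologySpreadConstraints) > 0 {
		return true
	}
	return p.Spec.Affinity != nil &&
		(p.Spec.Affinity.PodAffinity != nil || p.Spec.Affinity.PodAntiAffinity != nil)
}

// affinityCmp returns -1 if only lhs is constrained, 1 if only rhs is, and 0
// otherwise, so constrained pods sort ahead of unconstrained ones on ties.
func affinityCmp(lhs, rhs *v1.Pod) int {
	l, r := hasSchedulingConstraints(lhs), hasSchedulingConstraints(rhs)
	switch {
	case l && !r:
		return -1
	case r && !l:
		return 1
	default:
		return 0
	}
}

// requestedCPU sums the CPU requests of all containers in the pod.
func requestedCPU(p *v1.Pod) *resource.Quantity {
	total := resource.NewQuantity(0, resource.DecimalSI)
	for _, c := range p.Spec.Containers {
		total.Add(*c.Resources.Requests.Cpu())
	}
	return total
}

// byConstraintsThenCPUDescending orders constrained pods first, then falls
// back to CPU requests descending (the real sort also considers memory).
func byConstraintsThenCPUDescending(pods []*v1.Pod) func(i, j int) bool {
	return func(i, j int) bool {
		if cmp := affinityCmp(pods[i], pods[j]); cmp != 0 {
			return cmp < 0
		}
		return requestedCPU(pods[i]).Cmp(*requestedCPU(pods[j])) > 0
	}
}

func main() {
	pods := []*v1.Pod{} // pending pods would be collected here
	sort.Slice(pods, byConstraintsThenCPUDescending(pods))
	fmt.Printf("ordered %d pods for bin-packing\n", len(pods))
}
```

The design point is that the constraint check runs before the resource comparison, so the hardest-to-place pods are considered first during bin-packing.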
This seems like the right move, but I'm not sure how this breaks down in our bin-packing algorithm. From what I understand, this just sorts pods with affinity + tsc before others with the same exact pod requests.
Yes. After testing this approach (there is a small test case above), scheduling the mutually exclusive pods earlier helps produce a more balanced scheduling result.
With this approach, the cluster will be more stable (e.g., draining one node will not cause most pods to be rescheduled). I observed that Karpenter attempts to distribute the pods across all nodes:
Scheduler Code
cc @njtran @jonathan-innis, please take a look.
Force-pushed from 2581408 to 6806f12 (compare)
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
Force-pushed from 379b0c6 to 46e7949 (compare)
Signed-off-by: helen <[email protected]>
Force-pushed from 46e7949 to ea438bc (compare)
Can you share the data that led you to this conclusion? Without going in and testing it myself, it's not clear to me how you arrived at it.
@njtran This is the real scheduling result I got by using kwok as the provider and creating 10 deployments (where pod1, pod9, and pod10 are mutually exclusive). You can see that the instance types selected by the scheduler are now more balanced compared to the previous result.
Fixes #1418
Description
Prioritize the scheduling of pods with anti-affinity or topologySpreadConstraints.
How was this change tested?
I have 10 pending pods:
pod1: 1c1g requests, with anti-affinity; cannot be scheduled on the same node as pod10 and pod9.
pod2 ~ pod8: 1c1g requests; no anti-affinity is configured.
pod9: 1c1g requests, with anti-affinity; cannot be scheduled on the same node as pod1 and pod10.
pod10: 1c1g requests, with anti-affinity; cannot be scheduled on the same node as pod1 and pod9.
I want the resources of the three nodes to be evenly distributed, like:
node1: c7a.4xlarge, 8c16g (4 pods)
node2: c7a.xlarge, 4c8g (3 pods)
node3: c7a.xlarge, 4c8g (3 pods)
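For reference, a hedged sketch of the kind of mutually exclusive pod described in this test, built with client-go types. The group: exclusive label, topology key, and nginx image are assumptions about the test manifests, not taken from this PR.

```go
// Sketch of a pod spec mirroring pod1/pod9/pod10 above: 1 CPU / 1Gi requests
// plus required anti-affinity against other pods carrying the same (assumed)
// "group: exclusive" label, keyed on the node hostname.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func exclusivePodSpec() v1.PodSpec {
	return v1.PodSpec{
		Containers: []v1.Container{{
			Name:  "nginx",
			Image: "nginx",
			Resources: v1.ResourceRequirements{
				Requests: v1.ResourceList{
					v1.ResourceCPU:    resource.MustParse("1"),
					v1.ResourceMemory: resource.MustParse("1Gi"),
				},
			},
		}},
		Affinity: &v1.Affinity{
			PodAntiAffinity: &v1.PodAntiAffinity{
				RequiredDuringSchedulingIgnoredDuringExecution: []v1.PodAffinityTerm{{
					LabelSelector: &metav1.LabelSelector{
						MatchLabels: map[string]string{"group": "exclusive"},
					},
					TopologyKey: "kubernetes.io/hostname",
				}},
			},
		},
	}
}

func main() {
	spec := exclusivePodSpec()
	fmt.Println("anti-affinity terms:",
		len(spec.Affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution))
}
```

Because the three constrained pods cannot share a node, ordering them first seeds three separate nodes, which is what makes the 4/3/3 split above achievable.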
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.