Working with descheduler #90

Open
hazxel opened this issue Jun 28, 2022 · 7 comments

hazxel commented Jun 28, 2022

Hi there,

I'm using the descheduler to reschedule a pod to another node. However, the descheduler complains that every node has insufficient resource "telemetry/scheduling", which prevents the pod from being rescheduled.
(I've checked the source code of the descheduler, and it only evicts pods that don't fit the current node but do fit some other node; see the code below from node_affinity.go.)

pods, err := podutil.ListPodsOnANode(
	node.Name,
	getPodsAssignedToNode,
	podutil.WrapFilterFuncs(podFilter, func(pod *v1.Pod) bool {
		// Evict only pods that pass the evictor filter, do NOT fit the
		// node they are currently on, and DO fit at least one other node.
		return evictorFilter.Filter(pod) &&
			!nodeutil.PodFitsCurrentNode(getPodsAssignedToNode, pod, node) &&
			nodeutil.PodFitsAnyNode(getPodsAssignedToNode, pod, nodes)
	}),
)
if err != nil {
	klog.ErrorS(err, "Failed to get pods", "node", klog.KObj(node))
}

for _, pod := range pods {
	if pod.Spec.Affinity != nil && pod.Spec.Affinity.NodeAffinity != nil && pod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil {
		klog.V(1).InfoS("Evicting pod", "pod", klog.KObj(pod))
		if _, err := podEvictor.EvictPod(ctx, pod, node, "NodeAffinity"); err != nil {
			klog.ErrorS(err, "Error evicting pod")
			break
		}
	}
}

I'm using the setup from the health-metric-demo. The descheduler logs look like this:

I0628 14:32:11.838531   72888 node_affinity.go:78] "Processing node" node="minikube-m02"
I0628 14:32:11.838554   72888 node.go:183] "Pod does not fit on node" pod="default/demo-app-77fbd8745b-hhsxl" node="minikube-m02"
I0628 14:32:11.838557   72888 node.go:185] "insufficient telemetry/scheduling"
I0628 14:32:11.838568   72888 node.go:166] "Pod does not fit on node" pod="default/demo-app-77fbd8745b-hhsxl" node="minikube"
I0628 14:32:11.838571   72888 node.go:168] "insufficient telemetry/scheduling"
I0628 14:32:11.838579   72888 node.go:166] "Pod does not fit on node" pod="default/demo-app-77fbd8745b-hhsxl" node="minikube-m02"
I0628 14:32:11.838582   72888 node.go:168] "insufficient telemetry/scheduling"
I0628 14:32:11.838591   72888 node.go:166] "Pod does not fit on node" pod="default/demo-app-77fbd8745b-hhsxl" node="minikube-m03"
I0628 14:32:11.838619   72888 node.go:168] "pod node selector does not match the node label"
I0628 14:32:11.838624   72888 node.go:168] "insufficient telemetry/scheduling"
I0628 14:32:11.839395   72888 descheduler.go:312] "Number of evicted pods" totalEvicted=0

In your instructions for the health-demo, the pod is simply rescheduled to another node, so I'm wondering how you work around this problem?

Many thanks!

uniemimu (Collaborator) commented

The health-demo predates the nodeFit feature of the descheduler. You might want to turn nodeFit off in your descheduler configuration and see if that helps a bit. Based on a quick look at the nodeFit implementation in the descheduler, it looks to me like it doesn't care about the configuration of the scheduler, and as a result it doesn't honor the fact that the resources in question are in fact configured as ignoredByScheduler.
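
(For reference, setting that flag would look roughly like the sketch below, assuming the v1alpha1 DeschedulerPolicy format and the RemovePodsViolatingNodeAffinity strategy that node_affinity.go implements; adapt it to your actual policy.)

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeAffinity":
    enabled: true
    params:
      # Ask the strategy not to pre-check whether the pod fits another node.
      nodeFit: false
      nodeAffinityType:
        - "requiredDuringSchedulingIgnoredDuringExecution"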

If my quick analysis above is correct, then the proper place for the issue would be the descheduler project, which should, in my opinion, either honor the scheduler config somehow directly and automatically ignore those resources (this would be nice indeed), or at least allow configuring the same resources as ignoredByDescheduler.
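
(For context, it is the demo's scheduler configuration that marks the resource as ignored: a KubeSchedulerConfiguration extender entry along these rough lines, where the urlPrefix, verbs and TLS details are placeholders that depend on the actual TAS deployment. This is exactly the piece of configuration the descheduler never reads.)

extenders:
  - urlPrefix: "https://tas-service.default.svc.cluster.local:9001"  # placeholder, deployment-specific
    filterVerb: "scheduler/filter"
    prioritizeVerb: "scheduler/prioritize"
    weight: 1
    enableHTTPS: true
    managedResources:
      # The extended resource is handled by the TAS extender and is
      # excluded from the default scheduler's resource accounting.
      - name: "telemetry/scheduling"
        ignoredByScheduler: true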

As a workaround, if you need to keep the nodeFit feature enabled, you might resort to what I previously told you not to do: create the extended resource for the telemetry resource by hand (via curl) on the nodes. The scheduler should still be configured to ignore the resource, so it won't actually be consumed.
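
(A sketch of that manual patch, using the standard extended-resource advertisement against the node status subresource; the node name and quantity are placeholders, and kubectl proxy is assumed to be running on localhost:8001.)

# "~1" is the JSON-Pointer escape for "/" in telemetry/scheduling.
curl --header "Content-Type: application/json-patch+json" \
     --request PATCH \
     --data '[{"op": "add", "path": "/status/capacity/telemetry~1scheduling", "value": "111"}]' \
     http://localhost:8001/api/v1/nodes/<node-name>/status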

Another issue you may or may not stumble on with the descheduler node_affinity strategy is that it will not deschedule anything unless there is another node ready where the pod could be scheduled. See descheduler issue #640. Turning nodeFit off won't help with that, unfortunately.

uniemimu (Collaborator) commented

Filed with the descheduler project; we'll see what the solution will be: kubernetes-sigs/descheduler#863

hazxel (Author) commented Jun 29, 2022

Thanks! I also noticed that the nodeFit flag is not taking effect, but the good news is that posting the extended resource to all worker nodes solves the problem for now. It would still be nice if the flag could be fixed later on.

uniemimu (Collaborator) commented

The nodeFit flag is particularly confusing, as witnessed by descheduler issue #845. Feel free to chime in; perhaps more people with the same expectations about the flag would make the maintainers reconsider the point that maybe turning the flag off should make some sort of difference.

togashidm (Contributor) commented

hazxel, thank you for pointing this out, and hopefully this can be solved within the descheduler project.
Meanwhile, the workaround provided by uniemimu works well, as you commented above. Another option would be to use a previous descheduler version. Just a tip: in our case, when advertising the extended resource on all nodes we used a value of 111; only then did the descheduler start to pay attention.
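
(To double-check that the advertisement took, something like the following should show the resource under the node's Capacity and Allocatable; the node name is a placeholder.)

kubectl describe node <node-name> | grep telemetry/scheduling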

hazxel (Author) commented Jun 29, 2022

Hi togashidm, thanks for the tip. The value can actually be any positive integer, but it has to be big enough, right? Please correct me if I am wrong.

togashidm (Contributor) commented

Yes, just big enough to get the effect (big enough to cover the telemetry/scheduling requests of the pods you expect to fit on the node).
