Features to allow async drift / budgets (e.g., for Azure) #1861
Comments
For Karpenter core, is the ask here more of a feature request? (I don't see a bug in the core library.)
+1 @sftim, this strikes me as a feature request for the upstream library as well. @Bryce-Soghigian @tallaxes Perhaps y'all have some more thoughts on the interactions here between the upstream library and the Azure provider?
Under what circumstances does this kind of issue occur in Azure? IMO, if we can, this should really just live as a feature request in the Azure provider. FWIW, AWS doesn't have the same problem, because it returns the response fairly quickly and gets us out of this loop.
/triage accepted
/area provider/azure
/remove-kind bug
@sftim The reason I filed this as a BUG instead of a feature request is that, based on the design docs for Budgets, I assumed the feature would be compatible with Azure, but with the current implementation it doesn't work with the Azure provider. I tried to find a workaround on the Azure side; however, I haven't found a way to integrate the provider with the upstream implementation of this feature. I'm fine with it being labeled as a feature request, though, if that fits better.
@jonathan-innis, I don't think this is purely on the Azure side. I'm aware of the difference here between AWS's fast response and Azure's LRO (long-running operation). I tested removing the polling on the Azure LRO, and in that case we get a fast response and, consequently, async drift/budgets work correctly. However, we then lose graceful handling of certain types of errors, as I mentioned. I can compile a list of some of those error types, but it's tricky to get a complete list, and I have been told by other members of the Azure team that it would be good to handle them. If we were to handle this adjustment for Azure jointly between upstream and the Azure provider, I could see something like:
We could also tackle it from the disruption side, by triggering things in batches or in a slightly different way. Disrupting more nodes is currently blocked because the ProviderID has not yet been returned and set on the NodeClaim.
Description
Note:
This is a known BUG with a known cause. It also applies to other forms of Disruption that provision replacement nodes, not just Drift. It requires changes in both kubernetes-sigs/karpenter and the Azure/karpenter-provider-azure repo, so I am opening a BUG in both (the Azure/karpenter-provider-azure side is tracked in #600).
Observed Behavior:
Karpenter is only provisioning and replacing 1 Drifted node at a time, even with a Budget of "5".
All 10 NodeClaims were marked as Drifted:
kubectl get nodeclaims -o go-template='{{range $item := .items}}{{with $nodeclaimname := $item.metadata.name}}{{range $condition := $item.status.conditions}}{{if and (eq $condition.type "Drifted") (eq $condition.status "True")}}{{printf "%s\n" $nodeclaimname}}{{end}}{{end}}{{end}}{{end}}'
Even after more than a minute, only one new NodeClaim was being created:
And only one node was tainted as disrupted:
kubectl get nodes -o go-template='{{range $item := .items}}{{with $nodename := $item.metadata.name}}{{range $taint := $item.spec.taints}}{{if and (eq $taint.key "karpenter.sh/disrupted") (eq $taint.effect "NoSchedule")}}{{printf "%s\n" $nodename}}{{end}}{{end}}{{end}}{{end}}'
Looking at the logs, we can see that the cluster is stalled "waiting on cluster sync".
This is due to the interaction pattern between kubernetes-sigs/karpenter and Azure/karpenter-provider-azure. It stalls NodeClaim creation at the CloudProvider Create call in karpenter/pkg/controllers/nodeclaim/lifecycle/launch.go (line 74 in b1b45fc).
Internally, the Azure provider performs a few LROs (long-running operations) as part of this creation call. It polls Azure on them to ensure the resources have been created correctly and reports back any issue otherwise. If this polling is skipped, certain types of node-creation errors would be missed. Those failures would then only be handled after the registrationTTL of 15 minutes, which is unacceptable in these cases.
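To see why a blocking Create serializes the replacements, here is a toy tick-based simulation. It is a hedged model built only from what this issue describes, not Karpenter's controller code. The names `maxConcurrent` and `replacement` are hypothetical. The model makes three assumptions:
- Each disruption pass launches at most one replacement.
- A pass only runs once every in-flight NodeClaim has a ProviderID (the "waiting on cluster sync" gate).
- The ProviderID is set either when Create returns or as soon as the request is accepted.

```go
package main

import "fmt"

// replacement is a hypothetical in-flight replacement NodeClaim.
type replacement struct {
	remaining     int  // ticks until the cloud LRO completes
	hasProviderID bool // whether the ProviderID is set on the NodeClaim yet
}

// maxConcurrent runs one disruption pass per tick under the simplifying
// assumptions stated above and returns the peak number of replacements
// in flight at the same time. createTicks must be at least 1.
func maxConcurrent(drifted, budget, createTicks int, providerIDEarly bool) int {
	var inFlight []*replacement
	peak := 0
	for drifted > 0 || len(inFlight) > 0 {
		// Advance every LRO by one tick and drop the ones that finished.
		active := inFlight[:0]
		for _, r := range inFlight {
			r.remaining--
			if r.remaining > 0 {
				active = append(active, r)
			}
		}
		inFlight = active

		// "Cluster synced" here means every in-flight NodeClaim has a ProviderID.
		synced := true
		for _, r := range inFlight {
			if !r.hasProviderID {
				synced = false
			}
		}

		// Launch at most one replacement per pass, within the budget.
		if synced && drifted > 0 && len(inFlight) < budget {
			inFlight = append(inFlight, &replacement{remaining: createTicks, hasProviderID: providerIDEarly})
			drifted--
		}
		if len(inFlight) > peak {
			peak = len(inFlight)
		}
	}
	return peak
}

func main() {
	fmt.Println("ProviderID set when Create returns:", maxConcurrent(10, 5, 60, false))
	fmt.Println("ProviderID set when the request is accepted:", maxConcurrent(10, 5, 60, true))
}
```

In this model, if the ProviderID only appears after the LRO finishes, the peak stays at one replacement whatever the budget is. Setting the ProviderID early lets the peak reach the budget of 5, which matches the behavior this issue asks for.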
Expected Behavior:
Karpenter should asynchronously provision as many replacement nodes as the Drift Budget allows (in this case 5), replacing the Drifted nodes in accordance with the Budget.
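The expected behavior reduces to simple arithmetic for an absolute budget such as `budgets.nodes: "5"`. The nodes already being disrupted are subtracted from the budget, and the rest may be replaced concurrently. This is a simplified, hypothetical helper, not Karpenter's API. Karpenter's real computation has more inputs, such as percentage budgets, which this sketch rejects rather than guesses at.

```go
package main

import (
	"fmt"
	"strconv"
)

// allowedDisruptions sketches how an absolute budget such as
// budgets.nodes: "5" caps disruption: nodes already being disrupted are
// subtracted from the budget. Percentage budgets are intentionally rejected.
func allowedDisruptions(budget string, disrupting int) (int, error) {
	n, err := strconv.Atoi(budget)
	if err != nil {
		return 0, fmt.Errorf("this sketch only handles absolute budgets: %w", err)
	}
	if n < 0 {
		return 0, fmt.Errorf("budget must be non-negative, got %d", n)
	}
	if allowed := n - disrupting; allowed > 0 {
		return allowed, nil
	}
	return 0, nil
}

func main() {
	drifted := 10
	allowed, err := allowedDisruptions("5", 0)
	if err != nil {
		fmt.Println(err)
		return
	}
	// With 10 Drifted nodes and nothing disrupting yet, this many
	// replacements could start concurrently instead of one at a time.
	start := allowed
	if drifted < start {
		start = drifted
	}
	fmt.Println("replacements that may start now:", start)
}
```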
Reproduction Steps (Please include YAML):
- Deployed a dev version of Azure Karpenter from the HEAD of main (de7dee7) using make az-all.
- Ran make az-taintsystemnodes to ensure workload pods only land on karpenter nodes.
- Updated and applied examples/v1/general-purpose.yaml with karpenter.azure.com/sku-cpu set to less than 3, budgets.nodes set to "5", and consolidateAfter set to Never.
- Updated and applied examples/workloads/inflate.yaml with replicas set to 10 and CPU requests set to 1.1.
- Updated imageFamily: AzureLinux in the AKSNodeClass to trigger a Drift.
Versions:
- sigs.k8s.io/karpenter: v1.0.5
- Kubernetes version (kubectl version): v1.29.9