VmAgents Shards Autoscaling Issues #924
Comments
Hi @togikiran,
What do you mean by "the vmagent pods are getting rotated"?
It's <unknown> because the metrics server doesn't know the vmagent's CPU utilization right now; you need to create the metric and report it to the metrics server.
That's a bug, since we don't have label selector propagation in the scale subresource status yet.
Hey @Haleygo, when I increased the replicas from 2 to 3, the older pods got rolled out and new pods came up. Shared the screenshots.
The default metrics-server knows the pod metrics, right? Do you mean we need to add something for the VMAgent custom resource as well? If yes, can you help with the approach? Can you please share a sample HPA YAML file (k8s) for autoscaling vmagent shards based on CPU? Thanks.
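For reference, a minimal HPA sketch for this setup, assuming the VMAgent resource is named vmagent-shard-ha (as elsewhere in this issue) and that the operator's scale subresource reports a label selector (see the fix mentioned further down); without that selector the target stays <unknown>:

```yaml
# Minimal sketch: HPA targeting the VMAgent custom resource on CPU utilization.
# Names and replica bounds are assumptions for illustration, not taken from this cluster.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vmagent-shard-ha
spec:
  scaleTargetRef:
    apiVersion: operator.victoriametrics.com/v1beta1
    kind: VMAgent
    name: vmagent-shard-ha
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

Note that, as discussed below, every scale event still recreates all shard pods because the sharding flags change.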
Hello, due to the current sharding implementation of vmagent, the flags for all vmagents must be changed. This requires a restart of all pods with the new flag value.
Hey @f41gh7, is there a way to skip or bypass the pod restarts? This causes an impact and a restart on every scale-up. Is there any mitigation in place for this issue? Thanks.
No, it won't work if the pods don't restart with the new flag value.
I'd recommend using KEDA here. It can use Prometheus as a direct trigger, like the sketch below.
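A sketch of what such a ScaledObject could look like; the Prometheus server address, query, and threshold are placeholders to adapt, and the VMAgent name vmagent-shard-ha is assumed:

```yaml
# Sketch of a KEDA ScaledObject with a Prometheus trigger.
# KEDA scales the VMAgent CR through its /scale subresource.
# serverAddress, query and threshold below are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: vmagent-shard-ha
spec:
  scaleTargetRef:
    apiVersion: operator.victoriametrics.com/v1beta1
    kind: VMAgent
    name: vmagent-shard-ha
  minReplicaCount: 2
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://vmselect.monitoring.svc:8481/select/0/prometheus
        query: avg(rate(process_cpu_seconds_total{job="vmagent"}[5m]))
        threshold: "0.8"
```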
JFYI, the status label selector should be fixed in a6e3ad7.
@Haleygo We observed metric loss during vmagent pod scale-up, i.e. all pods are getting recreated after the increase in replica count. This is impacting production clusters. Is there any workaround for this?
Observing metric loss while scaling vmagent. Added an HPA on CPU and memory metrics; pods are getting rolled out and we're observing metric loss. Production clusters are impacted.
Hmm, I'm afraid that's expected with the current implementation; a workaround would be to also set
Hey @togikiran,
@Haleygo VMAgent sharding relies on the -promscrape.cluster.membersCount flag, which needs to be updated whenever the deployment or statefulset scales up or down. This update necessitates recreating all pods. Enabling dynamic discovery of this value would eliminate the need for manual updates and pod recreation.
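For illustration, the per-shard scrape flags end up in each pod's args roughly like this (an assumed 3-shard example; the actual args are generated by the operator):

```yaml
# Assumed example of per-shard args for a 3-shard vmagent (one pod per shard).
# Scaling from 2 to 3 shards changes membersCount on every pod, hence the full restart.
# shard 0:
- -promscrape.cluster.membersCount=3
- -promscrape.cluster.memberNum=0
# shard 1:
- -promscrape.cluster.membersCount=3
- -promscrape.cluster.memberNum=1
# shard 2:
- -promscrape.cluster.membersCount=3
- -promscrape.cluster.memberNum=2
```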
@f41gh7
Why are the vmagent pods getting rotated whenever we increase the replica count?
Command used: kubectl scale vmagent-shard-ha --replicas=3
How should we configure an HPA to scale based on CPU/memory utilisation?
Why is the HPA scaling metric showing <unknown>?
NAME               REFERENCE                  TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
vmagent-shard-ha   VMAgent/vmagent-shard-ha   <unknown>/80%   3         3         3          14m
Issue: missing label selector status
kubectl get --raw /apis/operator.victoriametrics.com/v1beta1/namespaces/<namespace>/vmagents/monitoring-vmagent-ha/scale
{"kind":"Scale","apiVersion":"autoscaling/v1","metadata":{"name":"monitoring-vmagent-ha","namespace":"<namespace>","uid":"d80e7371-cd49-4caa-8765-3a78220f9543","resourceVersion":"351547683","creationTimestamp":"2024-04-17T12:26:29Z"},"spec":{"replicas":3},"status":{"replicas":3}}
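For context, the HPA matches pods through the selector string in the scale subresource status, which is absent from the output above. With the fix, the status would include a serialized label selector roughly like this (the label values are an assumed example, not taken from this cluster):

```yaml
# Rough shape of a /scale response that HPA can consume; the selector value is assumed.
spec:
  replicas: 3
status:
  replicas: 3
  selector: app.kubernetes.io/name=vmagent,app.kubernetes.io/instance=monitoring-vmagent-ha
```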
vm-operator version: v0.30.0
vmagent: v1.90.0