High concurrency causes delay in AWX Job starts in a workflow #15419
Labels
community
component:api
component:awx_collection
component:ui
needs_triage
type:bug
Bug Summary
When more than 450 workflows run concurrently from the same workflow job template, each against a different inventory, the AWX jobs within those workflows are noticeably slow to start. Jobs initially sit in the "Pending" state and move to "Running" only after an average delay of about 2 minutes. The issue does not occur when concurrency is limited to 150 workflows.
AWX Version Upgrade: Recently upgraded from AWX version 22.5.0 to 23.9.0.
Environment: AWX is hosted on EKS (Elastic Kubernetes Service) version 1.28.
Resource allocation: Replica count is set to 10 each for the awx-web and awx-task pods.
awx-web requests: cpu: 1500m and memory: 2Gi
awx-task requests: cpu: 4000m and memory: 8Gi
Database performance: CPU utilization is around 50%. We have 20 control plane nodes running (each an EC2 instance with cpu: 8000m and memory: 32Gi).
While the issue was occurring, I captured logs from Datadog for one automation job ID and attached them here:
automation-job-id.logs.txt
I see the same ~2 minute delay for every AWX job that runs as part of a bulk launch. In the logs, the delay falls between "job-10243331 created" and "job-10243331 work unit id assigned"; the entries in between relate to inventory sync and some other commands executing on the control plane nodes.
What could be causing this delay, and what can be done to avoid it?
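To quantify the delay described above, here is a rough sketch that computes how long each job waited before starting. It assumes job records shaped like the AWX API job objects, using their `created` and `started` timestamp fields; the sample records are made up for illustration and are not taken from the attached logs.

```python
from datetime import datetime
from statistics import mean


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2024-01-01T00:00:00.000000Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pending_delay_seconds(job):
    """Seconds between job creation and job start, or None if not started."""
    if not job.get("started"):
        return None
    return (parse_ts(job["started"]) - parse_ts(job["created"])).total_seconds()


def average_pending_delay(jobs):
    """Mean pending delay over all jobs that have started, or None."""
    delays = [d for d in map(pending_delay_seconds, jobs) if d is not None]
    return mean(delays) if delays else None


# Hypothetical sample records (illustrative values only).
sample_jobs = [
    {"id": 1, "created": "2024-01-01T00:00:00.000000Z",
     "started": "2024-01-01T00:02:00.000000Z"},
    {"id": 2, "created": "2024-01-01T00:00:05.000000Z",
     "started": "2024-01-01T00:02:15.000000Z"},
    {"id": 3, "created": "2024-01-01T00:00:10.000000Z", "started": None},
]

print(average_pending_delay(sample_jobs))
```

Running the same calculation on job records from a 150-workflow run and a 450-workflow run would make the difference easy to compare.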
AWX version
23.9.0
Select the relevant components
Installation method
kubernetes
Modifications
no
Ansible version
No response
Operating system
No response
Web browser
Chrome
Steps to reproduce
Run more than 400 workflows simultaneously
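A minimal sketch of how such a bulk launch could be scripted is below. It is not the exact tooling used here. It assumes the workflow job template prompts for inventory on launch, that an OAuth2 token is available, and that the launch response carries the new workflow job's `id`. The function names, `max_workers` value, and all arguments are hypothetical placeholders.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_launch_request(base_url, template_id, inventory_id, token):
    """Build a POST request that launches one workflow against one inventory."""
    url = (f"{base_url.rstrip('/')}/api/v2/workflow_job_templates/"
           f"{template_id}/launch/")
    # Assumption: the template has "prompt on launch" enabled for inventory.
    body = json.dumps({"inventory": inventory_id}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def launch(request):
    """Send one launch request and return the new workflow job ID."""
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumption: the response body includes the workflow job's "id".
        return json.load(response)["id"]


def launch_many(base_url, template_id, inventory_ids, token, max_workers=50):
    """Launch one workflow per inventory, sending requests in parallel."""
    requests = [
        build_launch_request(base_url, template_id, inv, token)
        for inv in inventory_ids
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(launch, requests))
```

Calling `launch_many` with a list of more than 400 inventory IDs would approximate the load described in this report.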
Expected results
AWX jobs within workflows start with minimal delay, even at high concurrency.
Actual results
The AWX jobs within the workflows are delayed in starting.
Additional information
No response