I understand that AWX is open source software provided for free and that I might not receive a timely response.
I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)
Bug Summary
Long-running Ansible jobs are failing with no other information. We have AWX 23.8.0 installed on OpenShift 4.11.57 using the AWX-Operator. I did check the current issues for duplicates, so I apologize if this is a duplicate bug.
I am able to replicate this problem in both my Lab and Production environments, which run on different OpenShift clusters. Both run the same version of AWX (23.8.0) with the same AWX operator (awx-operator.v2.12.0) and the same version of Red Hat OpenShift (4.11.57). All long-running jobs fail the same way.
I’m happy to provide more information but I am pretty new to AWX. I did increase our containerLogMaxSize to 200mb for better visibility. I also set K8S Ansible Runner Keep-Alive Message Interval to 30.
Right now, for troubleshooting and debugging, I am just running a simple Ansible playbook that pauses for 120 minutes. This job always fails.
AWX version
23.8.0
Select the relevant components
UI
UI (tech preview)
API
Docs
Collection
CLI
Other
Installation method
openshift
Modifications
no
Ansible version
No response
Operating system
OpenShift 4.11.57
Web browser
Chrome
Steps to reproduce
Within AWX, the Task shows Failed: Task was canceled due to receiving a shutdown signal. To replicate the issue, I am just running a simple Ansible playbook that pauses for 120 minutes. I cannot figure out what is sending a shutdown signal to the automation-job pod.
- name: Test long running job in AWX
  hosts: localhost
  connection: local
  gather_facts: no
  become: no
  tasks:
    - name: Pause for 120 minutes to allow testing of the executor pod
      pause:
        minutes: 120
kubectl -n tts-lab-awx exec -it automation-job-1152-mvg7d -- env | grep ANSIBLE_RUNNER_KEEPALIVE_SECOND
ANSIBLE_RUNNER_KEEPALIVE_SECONDS=30
kubectl -n tts-lab-awx exec -it automation-job-1152-mvg7d -- receptor --version
1.4.4+gc75b1f6
kubectl -n tts-lab-awx exec -it automation-job-1152-mvg7d -- ansible-runner --version
2.3.5
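In case it helps anyone repeating the keep-alive check across several job pods, here is a minimal sketch that pulls the value out of captured `env` output. The function name `keepalive_seconds` and the sample text are my own placeholders, not anything from AWX; only the variable name `ANSIBLE_RUNNER_KEEPALIVE_SECONDS` comes from the output above.

```python
from typing import Optional

# Variable name as it appears in the pod's environment above.
KEEPALIVE_VAR = "ANSIBLE_RUNNER_KEEPALIVE_SECONDS"


def keepalive_seconds(env_text: str) -> Optional[int]:
    """Return the keep-alive interval found in `env` output.

    Returns None when the variable is missing or is not an integer.
    """
    for line in env_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == KEEPALIVE_VAR:
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


# Made-up sample standing in for the captured output of `env` inside the pod.
sample = "HOME=/runner\nANSIBLE_RUNNER_KEEPALIVE_SECONDS=30\n"
print(keepalive_seconds(sample))
```

A result of `None` or `0` would suggest the setting never reached the job pod; in my case the pod reports 30, so the setting itself appears to be applied.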
awx-lab-task-845bbc4f89-w6wkz-awx-lab-task.log
Expected results
I expect the Ansible job to run successfully without timing out.
Actual results
Every job fails with Task was canceled due to receiving a shutdown signal.
I can see the automation-job pod terminate but I cannot figure out what is causing this pod to terminate before the Ansible job is completed.
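To narrow down why the pod goes away, one thing that could be captured is the pod object itself while it is terminating. Below is a minimal sketch, assuming the pod JSON was saved with `kubectl get pod <pod-name> -o json` before the pod disappeared. The helper name `termination_summary` and the sample pod are hypothetical; the fields it reads (`metadata.deletionTimestamp`, `status.phase`, `status.reason`, and `state.terminated` / `lastState.terminated` under `status.containerStatuses`) are standard Kubernetes pod fields.

```python
from typing import Any, Dict, List


def termination_summary(pod: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the fields that hint at why a pod stopped."""
    meta = pod.get("metadata", {})
    status = pod.get("status", {})
    containers: List[Dict[str, Any]] = []
    for cs in status.get("containerStatuses", []):
        # A stopped container reports state.terminated; a restarted one
        # keeps its previous exit under lastState.terminated.
        term = (cs.get("state", {}).get("terminated")
                or cs.get("lastState", {}).get("terminated"))
        if term:
            containers.append({
                "container": cs.get("name"),
                "reason": term.get("reason"),
                "exit_code": term.get("exitCode"),
                "signal": term.get("signal"),
                "finished_at": term.get("finishedAt"),
            })
    return {
        "pod": meta.get("name"),
        "phase": status.get("phase"),
        # Pod-level reason, e.g. "Evicted" when the kubelet evicts the pod.
        "pod_reason": status.get("reason"),
        # Set when something asked the API server to delete the pod.
        "deletion_requested_at": meta.get("deletionTimestamp"),
        "containers": containers,
    }


# Hypothetical sample; a real one would come from json.load() on saved output.
sample_pod = {
    "metadata": {"name": "automation-job-example",
                 "deletionTimestamp": "2024-01-01T00:00:00Z"},
    "status": {
        "phase": "Running",
        "containerStatuses": [
            {"name": "worker",
             "state": {"terminated": {"reason": "Error", "exitCode": 137}}},
        ],
    },
}
print(termination_summary(sample_pod))
```

If `deletion_requested_at` is set, something asked the API server to delete the pod (for example the control plane or a node drain) rather than the container exiting on its own; a pod-level reason such as `Evicted` would instead point at the node.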
Additional information
No response