AWX Jobs Failing with "Task was canceled due to receiving a shutdown signal." #14948

mmacdo02-tufts · 2024-03-04T20:47:19Z

Please confirm the following

I agree to follow this project's code of conduct.
I have checked the current issues for duplicates.
I understand that AWX is open source software provided for free and that I might not receive a timely response.
I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)

Bug Summary

Long-running Ansible jobs are failing with no other information. We have AWX 23.8.0 installed on OpenShift 4.11.57 using the AWX-Operator. I did check the current issues for duplicates, so I apologize if this is a duplicate bug.

I am able to replicate this problem in both my Lab and Production environments, which run on different OpenShift clusters. Both run the same version of AWX (23.8.0), the same AWX operator (awx-operator.v2.12.0), and the same version of Red Hat OpenShift (4.11.57). All long-running jobs fail the same way.

kubectl -n tts-lab-awx exec -it automation-job-1152-mvg7d -- env | grep ANSIBLE_RUNNER_KEEPALIVE_SECOND
ANSIBLE_RUNNER_KEEPALIVE_SECONDS=30

kubectl -n tts-lab-awx exec -it automation-job-1152-mvg7d -- receptor --version
1.4.4+gc75b1f6

kubectl -n tts-lab-awx exec -it automation-job-1152-mvg7d -- ansible-runner --version
2.3.5

I’m happy to provide more information but I am pretty new to AWX. I did increase our containerLogMaxSize to 200mb for better visibility. I also set K8S Ansible Runner Keep-Alive Message Interval to 30.

Right now, for troubleshooting and debugging, I am running a simple Ansible playbook that just pauses for 120 minutes. This job always fails.

AWX version

23.8.0

Select the relevant components

Installation method

openshift

Modifications

no

Ansible version

No response

Operating system

OpenShift 4.11.57

Web browser

Chrome

Steps to reproduce

Within AWX, the Task shows Failed: Task was canceled due to receiving a shutdown signal. I am just running a very similar Ansible playbook that pauses for 120 minutes to replicate the issue. I cannot figure out what is sending a shutdown to the automation

- name: Test long running job in AWX
  hosts: localhost
  connection: local
  gather_facts: no
  become: no
  tasks:
    - name: Pause for 120 minutes to allow testing of the executor pod
      pause:
        minutes: 120

awx-lab-task-845bbc4f89-w6wkz-awx-lab-task.log

Expected results

I expect the Ansible job to run successfully without timing out.

Actual results

Every job fails with Task was canceled due to receiving a shutdown signal.

I can see the automation-job pod terminate but I cannot figure out what is causing this pod to terminate before the Ansible job is completed.

Additional information

No response


mmacdo02-tufts · 2024-03-04T20:48:09Z

I've also attached logs from awx-task pod
awx-lab-task-845bbc4f89-w6wkz-awx-lab-task.log

mmacdo02-tufts · 2024-03-04T21:55:11Z

This appears to be a duplicate of #14876

It says it's resolved in AWX 23.8.1 and Operator 2.12.1
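For anyone checking whether their own install predates the fix, here is a minimal sketch that compares the versions reported in this issue against the fixed releases named above. The helper names `version_lt` and `needs_upgrade` are made up for illustration and are not part of AWX or the operator. It assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# version_lt A B: succeeds when version A sorts strictly before version B.
# Hypothetical helper; relies on `sort -V` (GNU coreutils) for version ordering.
version_lt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# needs_upgrade INSTALLED FIXED: prints whether INSTALLED is older than FIXED.
needs_upgrade() {
  if version_lt "$1" "$2"; then
    echo "upgrade needed"
  else
    echo "ok"
  fi
}

# Installed versions as reported in this issue; fixed versions as stated in #14876.
awx_installed="23.8.0"
awx_fixed="23.8.1"
operator_installed="2.12.0"
operator_fixed="2.12.1"

echo "AWX: $(needs_upgrade "$awx_installed" "$awx_fixed")"
echo "Operator: $(needs_upgrade "$operator_installed" "$operator_fixed")"
```

Substitute the installed values with whatever your own deployment reports.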

github-actions bot added component:ui needs_triage type:bug community labels Mar 4, 2024

mmacdo02-tufts closed this as completed Mar 5, 2024

Peter1295 mentioned this issue Jun 3, 2024

The running ansible process received a shutdown signal. #15245

Open

11 tasks
