AWX backup fails on K8s/K3s #1518
Comments
I'm encountering the same error on an EKS cluster (AWX Operator version 2.5.2):
{
"msg": "The task includes an option with an undefined variable. The error was: 'ansible_operator_meta' is undefined. 'ansible_operator_meta' is undefined\n\nThe error appears to be in '/runner/project/playbooks/roles/backup/tasks/creation.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Patching labels to {{ kind }} kind\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - \"{{ foo }}\"\n",
"_ansible_no_log": false
}
@AlanCoding This task is a bigger issue. The latest change to the partition table broke it (partition creation, I believe).
Yeah, it could be the bug that ansible/awx#14572 is trying to fix. The commit that introduced the bug, ansible/awx@f5922f7, made it into the last release.
I have the same issue (AWX Operator version 2.8.0):
Fatal: [localhost] FAILED!
Message: The task includes an option with an undefined variable. The error was: list object has no element 0. list object has no element 0.
The error appears to be in '/opt/ansible/roles/backup/tasks/postgres.yml':
Line: 3, Column: 3, but may be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Get PostgreSQL configuration
^ here
I've just run into this issue myself, and I'm not clear why the issues referenced in @AlanCoding's post are relevant to it. The error I'm seeing:
{
"msg": "The task includes an option with an undefined variable. The error was: 'ansible_operator_meta' is undefined. 'ansible_operator_meta' is undefined\n\nThe error appears to be in '/runner/requirements_roles/srg_awx_backup/tasks/creation.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Patching labels to {{ kind }} kind\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - \"{{ foo }}\"\n",
"_ansible_no_log": false
}
I'm happy to share that backups with AWXBackup work for me. I'm not sure whether a patch has been released or something else changed, but the backup part works. I haven't tried a restore yet.
Perhaps a good time to try again? @godeater
And I have to mention that backups don't work anymore, just like in #879 (comment):
---
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: <name>
  namespace: <namespace>
spec:
  deployment_name: <deploymentname>
  backup_storage_class: "<storageclass>"
  backup_storage_requirements: "1Gi"
  backup_pvc_namespace: "<namespace>"
  image_pull_policy: "IfNotPresent"
  clean_backup_on_delete: false
  no_log: true
With a container connected to the backup PVC, after a backup is "created", I see a lot of Even with
Have you had a look at #1908? It seems the backup object itself doesn't log anything, but I found that as soon as you apply the backup manifest to your cluster, the awx-operator logs do show what's going on. In my case at least (as above), it's because pg_dump is segfaulting.
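Following the tip above, a minimal sketch for watching the operator while an AWXBackup is reconciled. It assumes the default names from a standard awx-operator install (Deployment `awx-operator-controller-manager`, container `awx-manager`, namespace `awx`); `filter_backup_errors`, `watch_backup_logs` and the grep pattern are hypothetical, so adjust the pattern to what your logs actually show.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only log lines that look related to a failing backup.
# The pattern is an assumption; widen or narrow it to match your operator's output.
filter_backup_errors() {
  grep -iE 'pg_dump|segmentation fault|fatal|failed!' || true
}

# Assumed default names from a standard awx-operator install; override if yours differ.
OPERATOR_NS="${OPERATOR_NS:-awx}"
OPERATOR_DEPLOY="${OPERATOR_DEPLOY:-awx-operator-controller-manager}"
OPERATOR_CONTAINER="${OPERATOR_CONTAINER:-awx-manager}"

# Follow the operator logs and show only suspicious lines (requires cluster access).
watch_backup_logs() {
  kubectl logs -f "deployment/${OPERATOR_DEPLOY}" -c "${OPERATOR_CONTAINER}" \
    -n "${OPERATOR_NS}" | filter_backup_errors
}
```

Run `watch_backup_logs` in one terminal, then apply the AWXBackup manifest in another.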
I see. I'm also seeing the same errors in my operator logs, as described in #1908.
Using the custom image does indeed result in a successful backup, but I see this more as a workaround than a fix. Also, the documentation seems to lack AWXBackup and its possible options: https://ansible.readthedocs.io/projects/awx/en/latest/search.html?q=backup&check_keywords=yes&area=default
I agree it's a workaround, and that the docs could be better (I had to dig into the source to find that you could override the image with those options). Unfortunately, it doesn't seem like anyone from the project is paying attention to this issue (and fair enough, it's open source, not paid work; they can choose to do what they like), so we are where we are. ¯\_(ツ)_/¯
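For reference, the image-override workaround discussed here boils down to two undocumented spec fields, `_postgres_image` and `_postgres_image_version`, set to `docker.io/postgres` and `15-alpine` in the CronJob manifest posted further down in this thread. A minimal sketch that renders such a manifest; `render_awxbackup` is a hypothetical helper, and the storage class, size, and names are placeholders to adapt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an AWXBackup manifest that overrides the PostgreSQL
# image used for pg_dump (the workaround discussed in this thread).
# Args: backup name, namespace, AWX deployment name, storage class.
render_awxbackup() {
  local name="$1" namespace="$2" deployment="$3" storage_class="$4"
  cat <<EOF
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  deployment_name: ${deployment}
  backup_storage_class: "${storage_class}"
  backup_storage_requirements: "1Gi"
  backup_pvc_namespace: "${namespace}"
  # Undocumented override fields; values taken from this thread.
  _postgres_image: docker.io/postgres
  _postgres_image_version: 15-alpine
  image_pull_policy: "IfNotPresent"
  clean_backup_on_delete: false
  no_log: true
EOF
}
```

Usage: `render_awxbackup "awxbackup-$(date +%Y-%m-%d-%H-%M-%S)" awx awx "<storageclass>" | kubectl apply -f -`. pg_dump cannot dump a server newer than itself, so pick an image version at least as new as your PostgreSQL server's major version.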
I ran into this today with Operator 2.16.1. Same "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'backupClaim'. 'dict object' has no attribute 'backupClaim'". So it seems like backup and restore are just fully broken for AWX? Has anyone found a good fallback plan for resilience? Make a backup of your DB and hope that you can piece it all together in the event of your data being lost?
EDIT: I'm seeing "workaround" mentioned in the comment above this one, and also in #1902, but this one also refers to #1908, and that one was closed as a duplicate of #1895. Could someone who understands the issue a little better post a concise summary of the workaround steps in one place?
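The `backupClaim` error above presumably means the restore looked for a status field on an AWXBackup that never completed. Before restoring, it may be worth checking that the backup actually recorded where its data went. A minimal sketch, assuming (from the error message) that the AWXBackup status carries `backupClaim`, and additionally `backupDirectory`; `check_awxbackup` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical check: print an AWXBackup's status fields and fail if backupClaim
# is missing. Assumes the status exposes backupClaim and backupDirectory.
# Args: AWXBackup name, namespace. Requires cluster access.
check_awxbackup() {
  local name="$1" namespace="$2" claim dir
  claim=$(kubectl get awxbackup "$name" -n "$namespace" -o jsonpath='{.status.backupClaim}')
  dir=$(kubectl get awxbackup "$name" -n "$namespace" -o jsonpath='{.status.backupDirectory}')
  if [ -z "$claim" ]; then
    echo "AWXBackup ${name}: no status.backupClaim; the backup probably did not finish" >&2
    return 1
  fi
  echo "claim=${claim} dir=${dir}"
}
```

If this fails, the operator logs for that backup are the next place to look.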
I have come to the conclusion that the It also lacks the option to write the backup directly to another storage backend, such as S3 (e.g. AWS or MinIO). For now, I have created the following resources to automate the backup process, all from within a K8s cluster.
All resources:
Cronjob create awxbackup
Cronjob S3 upload
Cronjob cleanup awxbackup
Check backups
Secret
RBAC
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: awx-backup-role
  namespace: awx # Replace with your namespace
rules:
  - apiGroups:
      - awx.ansible.com
    resources:
      - awxbackups
    verbs:
      - get
      - create
      - list
      - watch
      - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: awx-backup-rolebinding
  namespace: awx # Replace with your namespace
subjects:
  - kind: ServiceAccount
    name: awx-backup-sa
    namespace: awx # Replace with your namespace
roleRef:
  kind: Role
  name: awx-backup-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: awx-backup-sa
  namespace: awx # Replace with your namespace
Cronjobs
Cronjob create awxbackup
apiVersion: batch/v1
kind: CronJob
metadata:
  name: create-awxbackup
  namespace: awx # Replace with your namespace
spec:
  schedule: "45 2 * * *" # Runs daily at 2:45 AM (UTC)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: awx-backup-sa
          containers:
            - name: create-awx-backup
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  cat <<EOF | kubectl apply -f -
                  apiVersion: awx.ansible.com/v1beta1
                  kind: AWXBackup
                  metadata:
                    name: awxbackup-$(date +'%Y-%m-%d-%H-%M-%S')
                    namespace: awx
                  spec:
                    deployment_name: awx
                    backup_storage_class: "<storageclass>"
                    _postgres_image: docker.io/postgres
                    _postgres_image_version: 15-alpine
                    backup_storage_requirements: "1Gi"
                    backup_pvc_namespace: "awx"
                    image_pull_policy: "IfNotPresent"
                    clean_backup_on_delete: false # Leave false; if true, the backup on the PVC is deleted when the AWXBackup resource is deleted
                    no_log: true
                  EOF
          restartPolicy: OnFailure
Cronjob S3 upload
apiVersion: batch/v1
kind: CronJob
metadata:
  name: s3-upload-awxbackup
  namespace: awx # Replace with your namespace
spec:
  schedule: "0 3 * * *" # Runs daily at 3:00 AM (UTC)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup-container
              image: amazon/aws-cli
              envFrom:
                - secretRef:
                    name: s3-credentials-awx-backup
              command:
                - /bin/bash
                - -c
                - |
                  # Stop on the first error so local backups are not pruned after a failed upload
                  set -e
                  aws s3 cp /backupdata "s3://<bucket>/" --recursive --endpoint-url https://subdomain.domain.tld:api_port
                  # Define the directory
                  DIR="/backupdata"
                  # Find the latest directory and store its name
                  LATEST_DIR=$(ls -td "${DIR}"/*/ | head -n 1)
                  # Remove trailing slash from directory name
                  LATEST_DIR=${LATEST_DIR%/}
                  echo "Latest dir: ${LATEST_DIR}"
                  # Delete all directories except the latest one
                  find "${DIR}" -maxdepth 1 -type d ! -path "${LATEST_DIR}" ! -path "${DIR}" -exec rm -rf {} +
              volumeMounts:
                - name: data-volume
                  mountPath: /backupdata
          restartPolicy: OnFailure
          volumes:
            - name: data-volume
              persistentVolumeClaim:
                claimName: awx-backup-claim
Cronjob cleanup awxbackup
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-awxbackup
  namespace: awx # Replace with your namespace
spec:
  schedule: "30 3 * * *" # Runs daily at 3:30 AM (UTC)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: awx-backup-sa
          containers:
            - name: cleanup-backups
              image: bitnami/kubectl:latest # A lightweight image with kubectl installed
              command:
                - /bin/bash
                - -c
                - |
                  namespace="awx" # Replace with your namespace
                  date_limit=$(date -d '7 days ago' --utc +'%Y-%m-%dT%H:%M:%SZ')
                  echo "Date limit: $date_limit"
                  # List the names of all AWXBackup resources
                  backups=$(kubectl get awxbackups -n "$namespace" -o jsonpath='{.items[*].metadata.name}')
                  echo "All backups:" $backups
                  # Loop through backups and delete those older than the date limit
                  for backup in $backups; do
                    # Get the creation timestamp of the backup
                    creation_time=$(kubectl get awxbackup "$backup" -n "$namespace" -o jsonpath='{.metadata.creationTimestamp}')
                    # Compare the creation timestamp with the date limit
                    if [[ "$creation_time" < "$date_limit" ]]; then
                      kubectl delete awxbackup "$backup" -n "$namespace"
                      echo "Deleted backup: $backup"
                    fi
                  done
          restartPolicy: OnFailure
Secret
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials-awx-backup
  namespace: awx # Replace with your namespace
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <base64 access_key>
  AWS_SECRET_ACCESS_KEY: <base64 secret_access_key>
  AWS_DEFAULT_REGION: <base64 region>
  AWS_ENDPOINT_URL: <base64 https://subdomain.domain.tld:api_port>
Check backups
apiVersion: v1
kind: Pod
metadata:
  name: busybox-pod
  namespace: awx # Replace with your namespace
spec:
  volumes:
    - name: data-volume
      persistentVolumeClaim:
        claimName: awx-backup-claim
  containers:
    - name: busybox-container
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data-volume
          mountPath: /backupdata
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
        limits:
          cpu: 100m
          memory: 100Mi
kubectl exec -it busybox-pod -n awx -- sh
cd /backupdata
ls
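The pruning step from the S3 upload job can also be written as a standalone function that keeps the newest N backup directories and is chained after the upload with `&&`, so nothing is deleted when the copy fails. A minimal sketch; `prune_old_backups` is a hypothetical name, and `<bucket>` is the same placeholder as in the CronJob.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete all backup directories under $1 except the newest $2
# (default 1), ordered by modification time. Assumes names contain no newlines.
prune_old_backups() {
  local dir="$1" keep="${2:-1}"
  # Newest first; skip the first $keep entries and remove the rest.
  ls -1td "${dir}"/*/ 2>/dev/null | tail -n +"$((keep + 1))" | while IFS= read -r old; do
    rm -rf -- "${old%/}"
    echo "Removed ${old%/}"
  done
}

# Usage inside the CronJob script (sketch): prune only after a successful upload.
# aws s3 cp /backupdata "s3://<bucket>/" --recursive && prune_old_backups /backupdata 1
```

Keeping more than one local copy (for example `prune_old_backups /backupdata 3`) leaves some slack if an upload silently misses a run.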
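The cleanup job above calls `kubectl get` once per backup to read its creation timestamp. A sketch of the same age filter as a function that reads name/timestamp pairs from a single `kubectl get ... -o custom-columns` listing; `backups_older_than` is a hypothetical helper, and like the original it relies on RFC 3339 UTC timestamps comparing correctly as strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "NAME CREATION_TIMESTAMP" lines on stdin and print the
# names created before the cutoff. Same-format UTC timestamps such as
# 2024-01-01T02:45:00Z order correctly under a plain string comparison.
backups_older_than() {
  local cutoff="$1" name created
  while read -r name created; do
    [ -n "$name" ] || continue
    if [[ "$created" < "$cutoff" ]]; then
      echo "$name"
    fi
  done
}

# Usage (sketch, needs cluster access and GNU date/xargs as in bitnami/kubectl):
# cutoff=$(date -d '7 days ago' --utc +'%Y-%m-%dT%H:%M:%SZ')
# kubectl get awxbackups -n awx --no-headers \
#   -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp \
#   | backups_older_than "$cutoff" \
#   | xargs -r kubectl delete awxbackup -n awx
```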
Wow, thanks for putting all that together. I was coming to a similar conclusion myself; I've ended up writing a bunch of bash/curl commands to export most of what I need via the API and then documented some of the other details in an internal company wiki. I agree with your key point that a proper AWX backup needs to be easy to move to a different location, outside the cluster. That, plus struggles to get the backup and restore roles working fully, was one of my key reasons for putting in the work to script the export/import.
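For the API-export route described above, a minimal sketch of what such bash/curl commands could look like. It assumes basic authentication is enabled and that `AWX_HOST`, `AWX_USER` and `AWX_PASSWORD` (hypothetical variable names) point at your instance; `export_awx_objects` is a hypothetical helper. The resource names are standard `/api/v2/` list endpoints, but this fetches only the first page of each list.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: dump AWX object lists to JSON files via the REST API.
# AWX_HOST, AWX_USER and AWX_PASSWORD are placeholders. page_size=200 is assumed
# to cover each list; larger installs must follow the "next" link to paginate.
# Args: output directory, then one or more resource names (e.g. projects).
export_awx_objects() {
  local outdir="$1"; shift
  mkdir -p "$outdir"
  local kind
  for kind in "$@"; do
    curl -fsS -u "${AWX_USER}:${AWX_PASSWORD}" \
      "${AWX_HOST}/api/v2/${kind}/?page_size=200" > "${outdir}/${kind}.json" || return 1
    echo "Exported ${kind}"
  done
}

# Usage (sketch):
# AWX_HOST=https://awx.example.com AWX_USER=admin AWX_PASSWORD=... \
#   export_awx_objects ./awx-export organizations projects inventories job_templates credentials
```

Secret fields such as credential passwords come back as `$encrypted$`, so those still have to be re-entered (or documented elsewhere) when importing.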
Another option is to define every AWX component in Ansible playbooks. I've done that too, so everything is documented and stateful: a native Ansible approach, compared to bash scripts. https://docs.ansible.com/ansible/latest/collections/awx/awx/index.html
Bug Summary
On both k8s and k3s, with both embedded and external PostgreSQL databases, the AWX backup fails with the exact same error:
AWX Operator version
2.4.0 - 2.6.0
AWX version
AWX 22.5.0 - 23.2.0
Kubernetes platform
kubernetes
Kubernetes/Platform version
1.27.4 k8s/k3s
Modifications
no
Steps to reproduce
helm install awx-operator awx-operator/awx-operator -n awx --create-namespace
Expected results
Successful backup
Actual results
The error was: list object has no element 0.
When I run only the two k8sclusterinfo tasks locally in Ansible (one that sets the fact this__awx and the other for pg_config), they work fine.
When running this playbook inside the awx-operator container, it says:
ansible_operator_meta is undefined.
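`ansible_operator_meta` is normally injected by the Operator SDK with the custom resource's `name` and `namespace`, so it is presumably missing whenever the role is run by hand rather than by the operator's reconcile loop (locally, via `ansible-playbook` inside the operator container, or from an AWX job as in the `/runner/...` paths in the comments above). A minimal sketch for local testing, assuming you supply those two values yourself as extra vars; `operator_meta_vars` is a hypothetical helper and `backup-playbook.yml` a placeholder. The role may expect other operator-provided variables too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the extra-vars JSON that stands in for the
# ansible_operator_meta variable the Operator SDK injects at runtime.
# Args: AWXBackup resource name, namespace.
operator_meta_vars() {
  printf '{"ansible_operator_meta": {"name": "%s", "namespace": "%s"}}' "$1" "$2"
}

# Usage (sketch; backup-playbook.yml is a placeholder for your own playbook):
# ansible-playbook backup-playbook.yml -e "$(operator_meta_vars awxbackup-test awx)"
```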
Additional information
Operator Logs