
Occasional binascii.Error: Incorrect padding errors on jobs running on a container group (AKS) #15442

Open
5 of 11 tasks
KennethMod opened this issue Aug 14, 2024 · 0 comments

Comments

@KennethMod

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)

Bug Summary

The job fails with the following output:

{
    "status": "error",
    "job_explanation": "Failed to extract private data directory on worker.",
    "result_traceback": "Traceback (most recent call last):
  File \"/usr/local/lib/python3.11/site-packages/ansible_runner/streaming.py\", line 183, in run
    unstream_dir(self._input, data['zipfile'], self.private_data_dir)
  File \"/usr/local/lib/python3.11/site-packages/ansible_runner/utils/streaming.py\", line 71, in unstream_dir
    data = source.read(chunk_size)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File \"/usr/local/lib/python3.11/site-packages/ansible_runner/utils/base64io.py\", line 282, in read
    results.write(base64.b64decode(data))
                  ^^^^^^^^^^^^^^^^^^^^^^
  File \"/usr/lib64/python3.11/base64.py\", line 88, in b64decode
    return binascii.a2b_base64(s, strict_mode=validate)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
binascii.Error: Incorrect padding
"
}
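The traceback ends in `base64.b64decode`, which raises `binascii.Error: Incorrect padding` when it receives a trailing group of 2 or 3 base64 characters instead of a full 4-character group. One plausible way to hit that is a stream that ends partway through the payload. The sketch below uses a hypothetical, simplified chunked decoder (`decode_stream` is our own name; it is not ansible-runner's `Base64IO`) to show a complete payload decoding cleanly and a truncated one failing with the same error.

```python
import base64
import binascii
import io


def decode_stream(source: io.BufferedIOBase, chunk_size: int = 8) -> bytes:
    """Decode base64 text read from `source` in small chunks.

    Hypothetical, simplified stand-in for a chunked base64 reader;
    this is not ansible-runner's Base64IO implementation.
    """
    out = bytearray()
    pending = b""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        # Ignore whitespace/newlines that a streamed payload may contain.
        pending += b"".join(chunk.split())
        # Only decode complete 4-character groups; carry the rest forward.
        usable = len(pending) - len(pending) % 4
        out += base64.b64decode(pending[:usable])
        pending = pending[usable:]
    if pending:
        # The stream ended mid-group. With 2 or 3 leftover characters,
        # b64decode raises binascii.Error("Incorrect padding").
        out += base64.b64decode(pending)
    return bytes(out)


payload = base64.b64encode(b"hello world")  # 16 base64 characters
print(decode_stream(io.BytesIO(payload)))  # b'hello world'

try:
    decode_stream(io.BytesIO(payload[:14]))  # simulate a truncated stream
except binascii.Error as exc:
    print(f"binascii.Error: {exc}")  # binascii.Error: Incorrect padding
```

This only demonstrates the error mechanism; it does not establish why the worker's input would be cut short in this environment.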

AWX version

24.2.0

Select the relevant components

  • UI
  • UI (tech preview)
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

no

Ansible version

2.15.10

Operating system

Linux 652c6ec34339 4.18.0-553.8.1.el8_10.x86_64 #1 SMP Fri Jun 14 03:19:37 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

Web browser

Chrome

Steps to reproduce

We are running a basic job template on a container group on AKS, with a single task:

    - name: Ping
      shell: echo $HOSTNAME
      changed_when: false

Roughly one in every 10 to 20 runs fails with the padding error described in the summary. The same error appears in the logs of the pod running the custom EE.
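Because the failure is intermittent, it helps to launch the template many times and tally the outcomes. A minimal sketch, assuming HTTP basic authentication is enabled on the AWX instance and that the launch response carries the new job's ID in a `job` field; it uses AWX's `/api/v2/job_templates/<id>/launch/` and `/api/v2/jobs/<id>/` endpoints. The host, credentials, template ID, and helper names (`awx_client`, `run_repeatedly`) are hypothetical placeholders, not values from this report. The HTTP calls are passed in as callables so the loop can be exercised without a live AWX.

```python
import base64
import json
import time
import urllib.request
from collections import Counter
from typing import Callable

# AWX job states after which a job no longer changes.
TERMINAL_STATES = {"successful", "failed", "error", "canceled"}


def awx_client(host: str, user: str, password: str):
    """Return (launch, status) callables for a hypothetical AWX host.

    Assumes HTTP basic auth is enabled; swap in a token header if not.
    """
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {auth}", "Content-Type": "application/json"}

    def call(method: str, path: str) -> dict:
        body = b"{}" if method == "POST" else None
        req = urllib.request.Request(host + path, data=body, headers=headers, method=method)
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def launch(template_id: int) -> int:
        return call("POST", f"/api/v2/job_templates/{template_id}/launch/")["job"]

    def status(job_id: int) -> str:
        return call("GET", f"/api/v2/jobs/{job_id}/")["status"]

    return launch, status


def run_repeatedly(
    launch: Callable[[int], int],
    status: Callable[[int], str],
    template_id: int,
    runs: int,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Counter:
    """Launch the template `runs` times, one at a time, and tally final states."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        job_id = launch(template_id)
        state = status(job_id)
        while state not in TERMINAL_STATES:
            sleep(poll_seconds)
            state = status(job_id)
        outcomes[state] += 1
    return outcomes


# Example (placeholder host, credentials, and template ID):
# launch, status = awx_client("https://awx.example.com", "admin", "secret")
# print(run_repeatedly(launch, status, template_id=42, runs=50))
```

Running jobs sequentially keeps each failure easy to correlate with the task-pod logs; launching them concurrently would be a different test.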

Expected results

The job should succeed, like the other runs.

Actual results

The job fails, and the UI shows "Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b''"

The job output shows "Receptor detail: Error with pod's stdout: write /tmp/receptor/awx-task-5f558b7586-sz78m/ze2sUz4x/stdout: file already closed"
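The UI message matches what Python's `json.loads` reports for an empty line, which is consistent with the controller reading `b''` from the worker stream (and with the status file below reporting `"StdoutSize": 0`). A small sketch reproducing the message; `parse_worker_line` is a hypothetical helper that mimics the wording, not AWX's actual stream-processing code.

```python
import json


def parse_worker_line(line: bytes) -> dict:
    """Parse one line of a worker's JSON-lines stream.

    Hypothetical helper: re-raises with the offending line attached,
    mimicking the wording of the message seen in the UI.
    """
    try:
        return json.loads(line)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"Failed to JSON parse a line from worker stream. Error: {exc} "
            f"Line with invalid JSON data: {line!r}"
        ) from exc


print(parse_worker_line(b'{"status": "successful"}'))  # {'status': 'successful'}

try:
    parse_worker_line(b"")  # an empty read, e.g. the stream closed early
except ValueError as exc:
    # Prints: Failed to JSON parse a line from worker stream. Error:
    # Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b''
    print(exc)
```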

Additional information

AWX version: 24.2.0
AWX Operator: 2.15.0
Ansible Runner: 2.3.6
EE image: 24.2.0
Kubernetes version: v1.29.4 and v1.30.2
Ansible version: 2.15.10

We gathered the Receptor files for a failed job from the awx-ee container (the stdout file is empty):

status file:

{
  "State": 3,
  "Detail": "Error with pod's stdout: write /tmp/receptor/awx-task-6b95d77bbb-sbv76/izVSBFQ4/stdout: file already closed",
  "StdoutSize": 0,
  "WorkType": "kubernetes-incluster-auth",
  "ExtraData":
    {
      "Image": "",
      "Command": "",
      "Params": "",
      "KubeNamespace": "",
      "KubeConfig": "",
      "KubePod": "---\napiVersion: v1\nkind: Pod\nmetadata:\n  labels:\n    ansible-awx: b63415f6-956c-4a99-81af-ef6ae57c501e\n    ansible-awx-job-id: '96995'\n  name: automation-job-96995\n  namespace: awx\nspec:\n  automountServiceAccountToken: false\n  containers:\n  - args:\n    - ansible-runner\n    - worker\n    - --private-data-dir=/runner\n    image: packages.schroders.com/docker-eti/prd/aap-ee-sch:2.15\n    imagePullPolicy: IfNotPresent\n    name: worker\n    resources:\n      requests:\n        cpu: 250m\n        memory: 100Mi\n  dnsConfig:\n    searches:\n    - london.schroders.com\n    - azure.schroders.com\n    - schroders.com\n  serviceAccountName: default\n",
      "PodName": "automation-job-96995-r78qq",
    },
}
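To measure how often this happens, the Receptor work-unit directories can be scanned for the same signature. A minimal sketch, assuming each unit directory sits at `<receptor_dir>/<node>/<unit_id>/` (as in the `/tmp/receptor/...` paths above) and contains a JSON `status` file with the `Detail`, `StdoutSize`, and `ExtraData.PodName` fields shown above. `find_closed_stdout_units` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Substring seen in the "Detail" field of the failed unit above.
SIGNATURE = "file already closed"


def find_closed_stdout_units(receptor_dir: str) -> list:
    """Return work units whose Detail contains the signature.

    Assumes the layout <receptor_dir>/<node>/<unit_id>/status; adjust
    the glob pattern if your Receptor data directory differs.
    """
    hits = []
    for status_path in sorted(Path(receptor_dir).glob("*/*/status")):
        try:
            status = json.loads(status_path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or partially written status file
        detail = status.get("Detail", "")
        if SIGNATURE in detail:
            extra = status.get("ExtraData") or {}
            hits.append({
                "unit": status_path.parent.name,
                "pod": extra.get("PodName", ""),
                "stdout_size": status.get("StdoutSize"),
                "detail": detail,
            })
    return hits


# Usage inside the awx-ee container (path assumed from the logs above):
# for unit in find_closed_stdout_units("/tmp/receptor"):
#     print(unit["unit"], unit["pod"], unit["stdout_size"])
```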

stdin file: (zip data excluded)

{
    "kwargs": {
        "ident": 96995,
        "playbook": "playbooks/canary_test/canary_test.yml",
        "inventory": "inventory/hosts",
        "passwords": {
            "Enter passphrase for .*:\\s*?$": "",
            "Bad passphrase, try again for .*:\\s*?$": "",
            "sudo password.*:\\s*?$": "",
            "SUDO password.*:\\s*?$": "",
            "su password.*:\\s*?$": "",
            "SU password.*:\\s*?$": "",
            "pbrun password.*:\\s*?$": "",
            "PBRUN password.*:\\s*?$": "",
            "pfexec password.*:\\s*?$": "",
            "PFEXEC password.*:\\s*?$": "",
            "dzdo password.*:\\s*?$": "",
            "DZDO password.*:\\s*?$": "",
            "pmrun password.*:\\s*?$": "",
            "PMRUN password.*:\\s*?$": "",
            "runas password.*:\\s*?$": "",
            "RUNAS password.*:\\s*?$": "",
            "enable password.*:\\s*?$": "",
            "ENABLE password.*:\\s*?$": "",
            "doas password.*:\\s*?$": "",
            "DOAS password.*:\\s*?$": "",
            "ksu password.*:\\s*?$": "",
            "KSU password.*:\\s*?$": "",
            "machinectl password.*:\\s*?$": "",
            "MACHINECTL password.*:\\s*?$": "",
            "sesu password.*:\\s*?$": "",
            "SESU password.*:\\s*?$": "",
            "BECOME password.*:\\s*?$": "",
            "SSH password:\\s*?$": "",
            "Password:\\s*?$": "",
            "Vault password:\\s*?$": ""
        },
        "suppress_env_files": true,
        "envvars": {
            "ANSIBLE_BASE_ALL_REST_FILTERS": "('ansible_base.rest_filters.rest_framework.type_filter_backend.TypeFilterBackend', 'ansible_base.rest_filters.rest_framework.field_lookup_backend.FieldLookupBackend', 'rest_framework.filters.SearchFilter', 'ansible_base.rest_filters.rest_framework.order_backend.OrderByBackend')",
            "ANSIBLE_BASE_AUTO_CREATE_SERIALIZER": "False",
            "ANSIBLE_BASE_CUSTOM_VIEW_PARENT": "awx.api.generics.APIView",
            "ANSIBLE_BASE_ORGANIZATION_MODEL": "main.Organization",
            "ANSIBLE_BASE_RESOURCE_CONFIG_MODULE": "awx.resource_api",
            "ANSIBLE_BASE_TEAM_MODEL": "main.Team",
            "ANSIBLE_FACT_CACHE_TIMEOUT": "0",
            "ANSIBLE_FORCE_COLOR": "True",
            "ANSIBLE_HOST_KEY_CHECKING": "False",
            "ANSIBLE_INVENTORY_UNPARSED_FAILED": "True",
            "ANSIBLE_PARAMIKO_RECORD_HOST_KEYS": "False",
            "AWX_PRIVATE_DATA_DIR": "/tmp/awx_96995_2m7p1b6h",
            "JOB_ID": "96995",
            "INVENTORY_ID": "2",
            "PROJECT_REVISION": "0a275443f93c6783c97f79923718202d7aff3a0c",
            "ANSIBLE_RETRY_FILES_ENABLED": "False",
            "MAX_EVENT_RES": "700000",
            "AWX_HOST": "https://awx-dev.schroders.com",
            "ANSIBLE_SSH_CONTROL_PATH_DIR": "/runner/cp",
            "ANSIBLE_COLLECTIONS_PATHS": "/runner/requirements_collections:~/.ansible/collections:/usr/share/ansible/collections",
            "ANSIBLE_ROLES_PATH": "/runner/requirements_roles:~/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles",
            "ANSIBLE_COLLECTIONS_PATH": "/runner/requirements_collections:~/.ansible/collections:/usr/share/ansible/collections"
        },
        "fact_cache_type": ""
    }
}
<ZIP data>
{
    "eof": true
}

We successfully decoded and unzipped the zip data included in the stdin file.
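That check can be scripted. A minimal sketch, assuming the base64 text between the `kwargs` line and the `{"eof": true}` line has already been cut out of the stdin file; `verify_zip_payload` is a hypothetical helper. It decodes with `validate=True`, so a truncated or corrupted payload fails loudly instead of being silently accepted, and it CRC-checks every archive member.

```python
import base64
import binascii
import io
import zipfile


def verify_zip_payload(b64_text: str) -> list:
    """Decode a base64 zip payload and return the archive's member names.

    Raises binascii.Error for malformed base64 and zipfile.BadZipFile for
    a damaged archive or a member that fails its CRC check.
    """
    compact = "".join(b64_text.split())  # drop line breaks and whitespace
    raw = base64.b64decode(compact, validate=True)
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        bad = archive.testzip()  # first member with a bad CRC, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"CRC check failed for {bad}")
        return archive.namelist()


# Usage (hypothetical file holding only the base64 payload):
# with open("zip_payload.b64") as fh:
#     print(verify_zip_payload(fh.read()))
```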

Raw logs from the pod running the job:

2024-08-12T12:14:11.409468343Z stdout F {"status": "error", "job_explanation": "Failed to extract private data directory on worker.", "result_traceback": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.11/site-packages/ansible_runner/streaming.py\", line 183, in run\n    unstream_dir(self._input, data['zipfile'], self.private_data_dir)\n  File \"/usr/local/lib/python3.11/site-packages/ansible_runner/utils/streaming.py\", line 71, in unstream_dir\n    data = source.read(chunk_size)\n           ^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/ansible_runner/utils/base64io.py\", line 282, in read\n    results.write(base64.b64decode(data))\n                  ^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/lib64/python3.11/base64.py\", line 88, in b64decode\n    return binascii.a2b_base64(s, strict_mode=validate)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nbinascii.Error: Incorrect padding\n"}
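The line above is in the CRI container-log format (`<timestamp> <stream> <P|F> <message>`), with ansible-runner's JSON status as the message. When collecting these from many job pods, a small parser makes the failures easy to filter. A minimal sketch; `parse_cri_line` and `runner_error` are hypothetical helpers, and they only handle full (`F`) lines rather than reassembling partial (`P`) chunks.

```python
import json
from typing import NamedTuple, Optional


class CriLine(NamedTuple):
    timestamp: str
    stream: str   # "stdout" or "stderr"
    tag: str      # "F" = full line, "P" = partial line
    message: str


def parse_cri_line(line: str) -> CriLine:
    """Split a CRI-format container log line into its four fields."""
    timestamp, stream, tag, message = line.rstrip("\n").split(" ", 3)
    return CriLine(timestamp, stream, tag, message)


def runner_error(line: str) -> Optional[dict]:
    """Return the runner status payload if the line reports an error."""
    entry = parse_cri_line(line)
    if entry.tag != "F":
        return None  # partial lines would need reassembly first
    try:
        payload = json.loads(entry.message)
    except json.JSONDecodeError:
        return None  # not a JSON message
    if isinstance(payload, dict) and payload.get("status") == "error":
        return payload
    return None


sample = (
    '2024-08-12T12:14:11.409468343Z stdout F {"status": "error", '
    '"job_explanation": "Failed to extract private data directory on worker."}'
)
err = runner_error(sample)
print(err["job_explanation"])  # Failed to extract private data directory on worker.
```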

awx-task pod logs around the time the job failed:

    awx-task-6b95d77bbb-sbv76 awx-task 2024-08-12 12:11:33,252 INFO supervisord started with pid 7
    awx-task-6b95d77bbb-sbv76 awx-ee (changed: True)
    awx-task-6b95d77bbb-sbv76 awx-rsyslog ERROR 2024/08/12 12:14:11 Error reading stdin: %!!(MISSING)s(<nil>)
    awx-task-6b95d77bbb-sbv76 awx-task 2024-08-12 12:11:34,255 INFO spawned: 'superwatcher' with pid 28
    awx-task-6b95d77bbb-sbv76 awx-ee 2024-08-12 12:11:39,579 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
    awx-task-6b95d77bbb-sbv76 awx-rsyslog E0812 12:14:11.400577      14 v2.go:104] write tcp 100.65.6.186:42510->100.64.0.1:443: write: connection reset by peer
    awx-task-6b95d77bbb-sbv76 awx-rsyslog ERROR 2024/08/12 12:14:11 Error streaming pod logs to stdout for pod awx/automation-job-96995-r78qq. Error: write /tmp/receptor/awx-task-6b95d77bbb-sbv76/izVSBFQ4/stdout: file already closed
    awx-task-6b95d77bbb-sbv76 awx-task 2024-08-12 12:11:34,258 INFO spawned: 'rsyslog-4xx-recovery' with pid 29

We tried several AAP and EE versions, but the problem persists.
We also tried the following setting, with no change:

    ee_extra_env: |
      - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
        value: disabled