Increase cinder timeout to 180 sec #738


@holser holser commented Nov 18, 2024

The RHEV job uses VMs that are a bit slower. The job usually fails with:

TASK [cinder_adoption : wait for Cinder pods to start up] **********************
Wednesday 13 November 2024 20:28:00 +0000 (0:00:00.071) 0:18:59.868 ****
FAILED - RETRYING: [localhost]: wait for Cinder pods to start up (60 retries left).
...
FAILED - RETRYING: [localhost]: wait for Cinder pods to start up (1 retries left).
fatal: [localhost]: FAILED! => {"attempts": 60, "changed": true, "cmd": "set -euxo pipefail\n\nexport KUBECONFIG=~/adoption/kubeconfig\n\noc wait pod --for condition=Ready -l component=cinder-scheduler\noc wait pod --for condition=Ready -l component=cinder-api\n[ -z \"\" ] || oc wait pod --for condition=Ready -l component=cinder-volume\n[ -z \"\" ] || oc wait pod --for condition=Ready -l component=cinder-backup\n", "delta": "0:00:00.216386", "end": "2024-11-13 20:30:29.983085", "msg": "non-zero return code", "rc": 1, "start": "2024-11-13 20:30:29.766699", "stderr": "+ export KUBECONFIG=/home/stack/adoption/kubeconfig\n+ KUBECONFIG=/home/stack/adoption/kubeconfig\n+ oc wait pod --for condition=Ready -l component=cinder-scheduler\nerror: no matching resources found", "stderr_lines": ["+ export KUBECONFIG=/home/stack/adoption/kubeconfig", "+ KUBECONFIG=/home/stack/adoption/kubeconfig", "+ oc wait pod --for condition=Ready -l component=cinder-scheduler", "error: no matching resources found"], "stdout": "", "stdout_lines": []}

This patch increases the timeout to 180 seconds, which should be enough in most cases even on slow environments and significantly reduces the chance of failure.
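
For illustration only, the retry behaviour seen in the log above can be sketched in shell roughly as follows. This is not the task from the cinder_adoption role: the helper name `retry` and the attempt count and delay in the usage comment are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the role): run a command up to
# <attempts> times, sleeping <delay> seconds between failed attempts.
retry() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Example usage (attempt count and delay are assumed values):
# retry 90 2 oc wait pod --for condition=Ready -l component=cinder-scheduler
```

With a loop of this shape, the total wait is bounded by roughly attempts × (delay + command run time), so a command that fails immediately uses up its attempts much faster than one that blocks until its own timeout.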

@rajathere

I'm skeptical about this change: it increases the timeout for the cinder-api pod, whereas the error occurred for the cinder-scheduler pod:

oc wait pod --for condition=Ready -l component=cinder-scheduler", "error: no matching resources found"

Also, the cinder-api pod takes ~10 retries to come up, so the old retry limit should be sufficient:

TASK [cinder_adoption : wait for Cinder API to start up] ***********************
FAILED - RETRYING: [localhost]: wait for Cinder API to start up (90 retries left).
FAILED - RETRYING: [localhost]: wait for Cinder API to start up (89 retries left).
FAILED - RETRYING: [localhost]: wait for Cinder API to start up (88 retries left).
FAILED - RETRYING: [localhost]: wait for Cinder API to start up (87 retries left).
FAILED - RETRYING: [localhost]: wait for Cinder API to start up (86 retries left).
FAILED - RETRYING: [localhost]: wait for Cinder API to start up (85 retries left).
FAILED - RETRYING: [localhost]: wait for Cinder API to start up (84 retries left).
FAILED - RETRYING: [localhost]: wait for Cinder API to start up (83 retries left).
FAILED - RETRYING: [localhost]: wait for Cinder API to start up (82 retries left).
FAILED - RETRYING: [localhost]: wait for Cinder API to start up (81 retries left).
changed: [localhost] => {"attempts": 11, "changed": true, "cmd": "set -euxo pipefail\n\n\noc wait pod --for condition=Ready -l component=cinder-api\n", "delta": "0:00:11.874527", "end": "2024-11-18 10:10:46.131334", "msg": "", "rc": 0, "start": "2024-11-18 10:10:34.256807", "stderr": "+ oc wait pod --for condition=Ready -l component=cinder-api", "stderr_lines": ["+ oc wait pod --for condition=Ready -l component=cinder-api"], "stdout": "pod/cinder-api-0 condition met", "stdout_lines": ["pod/cinder-api-0 condition met"]}
