wait_for_pod_completed does not detect pod going into error state #305

Open
RobertKrawitz opened this issue Dec 9, 2021 · 1 comment

Comments

@RobertKrawitz
Member

If a pod goes into an error state, wait_for_pod_completed does not detect it and hangs until the timeout expires:

Method name: wait_for_initialized {'label': 'app=system-metrics-collector', 'workload': 'uperf-vm'} , Start time: 2021-12-09 09:19:22 
Method name: wait_for_initialized , End time: 2021-12-09 09:19:22 , Total time: 0.12 sec
Method name: wait_for_pod_completed {'label': 'app=system-metrics-collector', 'workload': 'uperf-vm'} , Start time: 2021-12-09 09:19:22 

But the pod has gone into an error state:

# oc logs -n benchmark-operator system-metrics-collector-2d2e4517-t6dd2
time="2021-12-09 14:22:47" level=info msg="📁 Creating indexer: elastic"
time="2021-12-09 14:24:57" level=fatal msg="ES health check failed: dial tcp XXX.XXX.XXX.XXX:YYYYY: connect: connection timed out"
@RobertKrawitz
Member Author

The failure in this case is external (my Elasticsearch server isn't running correctly), but wait_for_pod_completed should still detect the pod failure and error out instead of waiting for the timeout.
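
For illustration, here is a minimal sketch of the behavior I'd expect. It is not the project's actual implementation: the function name `wait_for_pod_completed_sketch`, the `get_phase` callable, and both exception classes are hypothetical. It assumes the caller can obtain the Kubernetes pod phase (`Pending`, `Running`, `Succeeded`, `Failed`, or `Unknown`) and treats `Failed` as an immediate error rather than continuing to poll.

```python
import time


class PodFailedError(Exception):
    """Hypothetical error: the pod reached the terminal Failed phase."""


class PodTimeoutError(Exception):
    """Hypothetical error: the pod did not finish before the timeout."""


def wait_for_pod_completed_sketch(get_phase, timeout=30.0, interval=1.0,
                                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_phase() until the pod succeeds, fails, or the timeout expires.

    get_phase is an assumed callable returning the pod phase as a string.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        phase = get_phase()
        if phase == "Succeeded":
            return True
        if phase == "Failed":
            # Fail fast instead of hanging until the timeout.
            raise PodFailedError("pod entered the Failed phase")
        if clock() >= deadline:
            raise PodTimeoutError(
                f"pod still in phase {phase!r} after {timeout} seconds"
            )
        sleep(interval)
```

In practice `get_phase` might wrap something like `oc get pod <name> -n <namespace> -o jsonpath='{.status.phase}'`, but that wiring is an assumption on my part and would need to match however the existing wait_for_* helpers query the cluster.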
