Database disconnection is not recovered well by deployment/benchmark #408

Open
gusmith opened this issue Aug 23, 2019 · 1 comment

gusmith commented Aug 23, 2019

This issue comes from observations made while running a 1M*1M linkage 100 times against a service deployed on k8s.
At one point, the database fails and restarts (see issue #407).

From this point, the whole deployment melts down: the workers restart a number of times (I cannot see the source logs of their failures), and the flask pod restarts.
Flask seems to be OK while the database is unavailable (it raises a lot of exceptions, but that sounds expected). But at the end, and without clear notice:

[2019-08-22 18:54:22 +0000] [1] [INFO] Handling signal: term
[2019-08-22 18:54:22 +0000] [10] [INFO] Worker exiting (pid: 10)
[2019-08-22 18:54:22 +0000] [11] [INFO] Worker exiting (pid: 11)
[2019-08-22 18:54:22 +0000] [12] [INFO] Worker exiting (pid: 12)

hardbyte commented

Might be useful to look at automated retrying with an exponential backoff before failure.
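
To make the suggestion concrete, here is a minimal sketch of retrying with exponential backoff and jitter. It is not code from this project: `retry_with_backoff` and all of its parameters are hypothetical names, and `ConnectionError` is only a stand-in for whatever exception the database driver actually raises when the connection drops.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, retry_on=(ConnectionError,),
                       sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Hypothetical helper: the names and defaults here are illustrative
    assumptions, not part of the service.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the original failure.
                raise
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Pick a random delay up to the ceiling ("full jitter") so that
            # many workers do not all reconnect at the same instant.
            sleep(random.uniform(0, ceiling))
```

A caller would wrap whatever opens the database connection, for example `retry_with_backoff(connect_to_db, retry_on=(SomeDriverError,))`, where both names are placeholders. Whether the workers and the flask pod should eventually give up and exit, or keep retrying indefinitely, is a separate decision that this sketch leaves to `max_attempts`.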
