Fix: Stability issues when checking crash handling in monitored endpoint
# Problem:
The endpoint '/status/check/fastapi' is monitored by multiple operators
and is therefore called very frequently. The crash handling check starts
a new virtual machine on every call, and resource leaks (likely pipes
accumulating) can eventually crash the supervisor with
'Too many open files'.

# Solution:
In the short term, avoid calling the crash check in the monitored endpoint.
In the longer term, investigate and fix the accumulating file descriptors.
hoh committed Jul 14, 2022
1 parent 207cb52 commit 4ff685c
Showing 1 changed file with 0 additions and 1 deletion.
1 change: 0 additions & 1 deletion vm_supervisor/views.py
@@ -141,7 +141,6 @@ async def status_check_fastapi(request: web.Request):
             "cache": await status.check_cache(session),
             "persistent_storage": await status.check_persistent_storage(session),
             "error_handling": await status.check_error_raised(session),
-            "crash_handling": await status.check_crash_and_restart(session),
         }
         return web.json_response(result, status=200 if all(result.values()) else 503)
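
For the longer-term investigation mentioned in the commit message, here is a minimal sketch of how file descriptor growth could be measured around a single call. It assumes a Linux host where /proc/self/fd is available. The helper names count_open_fds, fd_soft_limit and fd_delta are hypothetical and not part of vm_supervisor.

```python
import os
import resource
from typing import Callable


def count_open_fds() -> int:
    """Count file descriptors open in this process (assumes Linux /proc).

    The listing briefly includes the descriptor used to read the directory
    itself, so the absolute value is one too high, but differences between
    two calls are unaffected.
    """
    return len(os.listdir("/proc/self/fd"))


def fd_soft_limit() -> int:
    """Return the soft limit behind 'Too many open files' (RLIMIT_NOFILE)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def fd_delta(action: Callable[[], object]) -> int:
    """Run a synchronous action and return how many descriptors it left open."""
    before = count_open_fds()
    action()
    return count_open_fds() - before


if __name__ == "__main__":
    leaked = []
    # Deliberately leak one pipe (two descriptors) to show what a leak looks like.
    print("descriptors left open:", fd_delta(lambda: leaked.append(os.pipe())))
    print("soft limit:", fd_soft_limit())
```

A delta that stays positive across repeated calls would suggest descriptors that are never closed. Comparing the running count against the soft limit gives a rough idea of how many calls remain before the process fails.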
