Health check not running for ON_DEMAND task #1964
For ON_DEMAND tasks we don't actually run health checks, as they don't really have any bearing on a one-off task. I realize the UI is likely confusing here and that's something we can fix (the backend currently doesn't stop you from specifying those options even if they aren't being used). Health checks are only run for the worker/service types, where we need to know whether something is healthy, e.g. to ensure a replacement instance is healthy before shutting an old one down.
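To make the distinction concrete, here's a rough sketch of how the same settings behave under different request types (field names like requestType and healthcheckUri are assumed from the request/deploy JSON, so treat the exact names and values as illustrative rather than a reference):

```python
# Illustrative sketch only: the same healthcheck settings under different
# request types. Field names are assumed from the Singularity request/deploy
# JSON and may not match your exact config.

service_request = {
    "id": "my-web-service",
    "requestType": "SERVICE",    # health checks run; e.g. a replacement
                                 # instance must pass them before the old
                                 # instance is shut down
}

on_demand_request = {
    "id": "my-oneoff-job",
    "requestType": "ON_DEMAND",  # health checks are not run for one-off tasks
}

deploy_healthcheck_fields = {
    # The backend currently accepts these on any deploy, but they only take
    # effect for SERVICE/WORKER requests:
    "healthcheckUri": "/health",
    "healthcheckTimeoutSeconds": 5,
}
```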
Ok, I can see the argument for not running the health check on ON_DEMAND tasks. I'm using Singularity in a slightly odd way, which is why I'm trying to define a health check, but I've got alternative tools I can use to monitor health. Perhaps the API should prevent the checks being defined in the first place, to stop people like me from shooting themselves in the foot? Or perhaps they should be fully ignored, so that the task doesn't get killed 10 minutes later for not having become healthy?
Ah, I'd read past the fact that it got killed after 10 minutes. I'll have to take a closer look at that.
For the moment, though, I'd recommend what you said about doing the health monitoring in a different way. As an aside, what kind of use case do you have for an on-demand task with health checks? It seems to me that anything long-running with health checks should be a worker/service instead anyway.
It's part of the software for a large radio telescope. Each observation is managed by one of these jobs, which typically last for a few hours to a day. If one fails, it shouldn't be automatically restarted because higher-level systems have to deal with the failure and rescheduling, which is why I didn't use a worker/service. In theory it could probably persist state and pick up the pieces if it died and was automatically restarted, but it's not been a priority.
I've just started experimenting with Singularity, so apologies if I've misunderstood how it all works.
I've created a deploy for an ON_DEMAND request with the following health check fields:
After creating a run I can see the task in the UI, where the health check section says
followed by a dashed box with the text "No healthchecks". The HTTP access logs for the task don't show any hits on the /health endpoint. When querying /api/tasks/ids/request/REQUEST_NAME the task shows up in notYetHealthy. After 10 minutes it's killed with the message "OVERDUE_NEW_TASK - Task did not become healthy after 10:00.000".
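For reference, this is roughly how I'm querying that endpoint (the base URL and request name below are placeholders for my local setup):

```python
# Placeholder base URL and request id for my local docker-compose setup.
import requests

SINGULARITY = "http://localhost:7099/singularity"
REQUEST_NAME = "my-observation-request"

resp = requests.get(f"{SINGULARITY}/api/tasks/ids/request/{REQUEST_NAME}")
resp.raise_for_status()
buckets = resp.json()

# The task only ever appears under "notYetHealthy"; after 10 minutes it is
# killed with OVERDUE_NEW_TASK.
print("notYetHealthy:", buckets.get("notYetHealthy", []))
print("healthy:", buckets.get("healthy", []))
```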
If I click on the "/health" link in the UI it shows a correct health page, which gives me some confidence that I've got the port mapping set up right.
I'm using a local docker-compose setup for testing, with the following images:
I'm using the Docker containerizer with BRIDGE networking and not using the Singularity executor, in case that makes a difference.
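In case the container setup matters, the deploy I'm posting is shaped roughly like this (the image name, ports, request id, endpoint path, and exact field names are written from memory of the docs, so treat it as a sketch rather than my literal config):

```python
# Sketch of the deploy, not a verbatim copy; field names are from memory of
# the Singularity API docs and the endpoint path is assumed.
import requests

SINGULARITY = "http://localhost:7099/singularity"  # placeholder base URL

deploy_request = {
    "deploy": {
        "requestId": "my-observation-request",  # placeholder request id
        "id": "1",
        "containerInfo": {
            "type": "DOCKER",
            "docker": {
                "image": "myorg/observation-manager:latest",  # placeholder image
                "network": "BRIDGE",
                "portMappings": [
                    {
                        "containerPortType": "LITERAL",
                        "containerPort": 8080,
                        "hostPortType": "FROM_OFFER",
                        "hostPort": 0,
                    }
                ],
            },
        },
        "resources": {"cpus": 1, "memoryMb": 512, "numPorts": 1},
        # Healthcheck fields from the deploy mentioned above:
        "healthcheckUri": "/health",
    }
}

requests.post(f"{SINGULARITY}/api/deploys", json=deploy_request).raise_for_status()
```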