Health check not running for ON_DEMAND task #1964
For ON_DEMAND tasks we don't actually run health checks, as they don't really have any bearing on a one-off task. I realize the UI is likely confusing here and that's something we can fix (the backend currently doesn't stop you from specifying those options even if they aren't being used). Health checks are only run for the worker/service types, where we need to know whether something is healthy, e.g. to ensure a replacement instance is healthy before shutting an old one down.
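To make the distinction concrete, here's a rough sketch of how the same settings behave under different request types (field names like requestType and healthcheckUri are assumed from the request/deploy JSON, so treat the exact names and values as illustrative rather than a reference):

```python
# Illustrative sketch only: the same healthcheck settings under different
# request types. Field names are assumed from the Singularity request/deploy
# JSON and may not match your exact config.

service_request = {
    "id": "my-web-service",
    "requestType": "SERVICE",    # health checks run; e.g. a replacement
                                 # instance must pass them before the old
                                 # instance is shut down
}

on_demand_request = {
    "id": "my-oneoff-job",
    "requestType": "ON_DEMAND",  # health checks are not run for one-off tasks
}

deploy_healthcheck_fields = {
    # The backend currently accepts these on any deploy, but they only take
    # effect for SERVICE/WORKER requests:
    "healthcheckUri": "/health",
    "healthcheckTimeoutSeconds": 5,
}
```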
Ok, I can see the argument for not running the health check on ON_DEMAND tasks. I'm using Singularity in a slightly odd way, which is why I'm trying to define a health check, but I've got alternative tools I can use to monitor health. Perhaps the API should prevent the checks being defined in the first place, to stop people like me from shooting themselves in the foot? Or perhaps they should be fully ignored, so that the task doesn't get killed 10 minutes later for not having become healthy?
Ah, I'd read past the fact that it got killed after 10 minutes. I'll have to take a closer look at that.
For the moment, though, I'd recommend what you said about doing the health monitoring in a different way. As an aside, what kind of use case do you have for an on-demand task with health checks? It seems to me that anything long-running with health checks should be a worker/service instead anyway.
It's part of the software for a large radio telescope. Each observation is managed by one of these jobs, which typically last for a few hours to a day. If one fails, it shouldn't be automatically restarted because higher-level systems have to deal with the failure and rescheduling, which is why I didn't use a worker/service. In theory it could probably persist state and pick up the pieces if it died and was automatically restarted, but it's not been a priority.
I've just started experimenting with Singularity, so apologies if I've misunderstood how it all works.
I've created a deploy for an ON_DEMAND request with the following health check fields:
After creating a run I can see the task in the UI, where the health check section says
followed by a dashed box with the text "No healthchecks". The HTTP access logs for the task don't show any hits on the /health endpoint. When querying /api/tasks/ids/request/REQUEST_NAME the task shows up in notYetHealthy. After 10 minutes it's killed with the message "OVERDUE_NEW_TASK - Task did not become healthy after 10:00.000".
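For reference, this is roughly how I'm querying that endpoint (the base URL and request name below are placeholders for my local setup):

```python
# Placeholder base URL and request id for my local docker-compose setup.
import requests

SINGULARITY = "http://localhost:7099/singularity"
REQUEST_NAME = "my-observation-request"

resp = requests.get(f"{SINGULARITY}/api/tasks/ids/request/{REQUEST_NAME}")
resp.raise_for_status()
buckets = resp.json()

# The task only ever appears under "notYetHealthy"; after 10 minutes it is
# killed with OVERDUE_NEW_TASK.
print("notYetHealthy:", buckets.get("notYetHealthy", []))
print("healthy:", buckets.get("healthy", []))
```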
If I click on the "/health" link in the UI it shows a correct health page, which gives me some confidence that I've got the port mapping set up right.
I'm using a local docker-compose setup for testing, with the following images:
I'm using the Docker containerizer with BRIDGE networking and not using the Singularity executor, in case that makes a difference.
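In case the container setup matters, the deploy I'm posting is shaped roughly like this (the image name, ports, request id, endpoint path, and exact field names are written from memory of the docs, so treat it as a sketch rather than my literal config):

```python
# Sketch of the deploy, not a verbatim copy; field names are from memory of
# the Singularity API docs and the endpoint path is assumed.
import requests

SINGULARITY = "http://localhost:7099/singularity"  # placeholder base URL

deploy_request = {
    "deploy": {
        "requestId": "my-observation-request",  # placeholder request id
        "id": "1",
        "containerInfo": {
            "type": "DOCKER",
            "docker": {
                "image": "myorg/observation-manager:latest",  # placeholder image
                "network": "BRIDGE",
                "portMappings": [
                    {
                        "containerPortType": "LITERAL",
                        "containerPort": 8080,
                        "hostPortType": "FROM_OFFER",
                        "hostPort": 0,
                    }
                ],
            },
        },
        "resources": {"cpus": 1, "memoryMb": 512, "numPorts": 1},
        # Healthcheck fields from the deploy mentioned above:
        "healthcheckUri": "/health",
    }
}

requests.post(f"{SINGULARITY}/api/deploys", json=deploy_request).raise_for_status()
```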