Nomad UI shows failed for jobs that are scaled to 0 #23591
@caiodelgadonew in recent versions of the UI we're presenting a sort of "aggregate state" based on the job status and the allocation status. Can you verify that the API is returning `running`?
@tgross no, it's not returning `running`, it's returning `dead`. To reproduce, deploy any job, and then scale the groups to 0:

$ nomad job scale testing-scheduler app 0
$ nomad job scale testing-scheduler app2 0
Ok, thanks. I'd expect a job with all complete allocations to be "dead". We made a lot of changes to provide reasonable "aggregate" statuses in 1.8, but there are an awful lot of corner cases 😁 I'll mark this for roadmapping.
thanks @tgross, you mean …?
Actually, no. 😀 The job status is a pretty coarse view of the world, so "dead" just means all allocations are terminal. I could definitely see an argument that this view of the world isn't ideal because it makes the UI design harder, but it's definitely the intended behavior right now. That said, the UI does want to present a richer "aggregate state" that would account for this case where the job is scaled down.
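To make that "coarse view" concrete, here is a minimal hypothetical sketch (not Nomad's actual implementation; names like `coarseJobStatus` are invented for illustration) of how "dead" follows purely from every allocation being terminal, whether the job failed or was deliberately scaled to 0:

```javascript
// Hypothetical sketch: the server-side job status is coarse — "dead"
// only means every allocation is terminal. It cannot distinguish a
// failed job from one intentionally scaled to zero.
const TERMINAL_STATES = new Set(['complete', 'failed', 'lost']);

function coarseJobStatus(allocations) {
  // A scaled-to-0 job may have no allocations left at all.
  if (allocations.length === 0) return 'dead';
  const allTerminal = allocations.every((a) =>
    TERMINAL_STATES.has(a.clientStatus)
  );
  return allTerminal ? 'dead' : 'running';
}

// Both of these report "dead", which is why the UI needs richer context:
coarseJobStatus([{ clientStatus: 'complete' }]); // 'dead' (scaled down)
coarseJobStatus([{ clientStatus: 'failed' }]);   // 'dead' (actually failed)
```

This is exactly the ambiguity the rest of the thread is about: the coarse status collapses two very different situations into one value.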
I've got a note to make this experience better in the UI. The common way this shows itself is when allocations complete and are garbage-collected, but garbage-collected allocations don't leave the UI with a lot of clues as to whether their absence is a bad thing. In this case, a deliberate scale to 0 seems more detectable. I'll see what I can do to give a status less off-putting than `Failed`.
Any updates on this @philrenaud ? :)
I'm also playing around with job scaling to implement a scale-to-zero strategy and came across this issue.
Hi @caiodelgadonew, I've started thinking about a solution to this problem. The current logic for status has something like:
This generally works pretty well, except here, where … As such, in #23829 I've started putting in a safety valve for exactly this case:
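The idea behind that safety valve can be sketched roughly as follows (a hypothetical illustration only; `aggregateUiStatus` and the field names are invented here, and the real change lives in the Ember UI via #23829):

```javascript
// Hypothetical sketch of the "safety valve": when a dead job was
// deliberately scaled to 0 and nothing actually failed, surface
// "Scaled Down" in the UI instead of "Failed".
function aggregateUiStatus(job) {
  // Sum the desired counts across task groups.
  const desired = job.taskGroups.reduce((sum, g) => sum + g.count, 0);
  const anyFailed = job.allocations.some(
    (a) => a.clientStatus === 'failed'
  );
  if (job.status === 'dead') {
    if (desired === 0 && !anyFailed) return 'Scaled Down';
    return anyFailed ? 'Failed' : 'Complete';
  }
  return job.status;
}

aggregateUiStatus({
  status: 'dead',
  taskGroups: [{ count: 0 }],
  allocations: [{ clientStatus: 'complete' }],
}); // 'Scaled Down'
```

The key design point is that the server-side status stays untouched; only the UI's presentation layer reinterprets "dead + desired count 0 + no failures" as an intentional scale-down.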
This'll show something like this (3rd job shown): [screenshot]

Is this about what you had in mind?

A secondary concern (cc @sevensolutions as well) is that these jobs are nevertheless considered `dead`. This means that these statuses are at most only temporary: "Scaled Down" is what you'd see until garbage collection takes place or a user runs … We have been exploring some concepts that would create garbage-collection-avoidance permanence (see Golden Job Versions, for example) that might mitigate this in the future, but I wanted to open a dialogue here to indicate the temporary nature of this status as implemented in #23829.
@philrenaud That "Scaled Down" looks amazing! |
@caiodelgadonew i also thought about a different color first, but i think "Scaled down" doesn't necessarily need to be a warning. It may be intended.
Yeah, you're correct, I like your idea. :) Any plans for shipping it? |
Yep, let me test some edge cases and get it tagged for the next minor release. Thanks for your patience with this! |
Amazing @philrenaud many thanks!!! |
Hello,
We have this behavior on Nomad when we run
nomad job scale <job> <group> 0
that the job shows as failed. It would be nice to have a better message for this state instead of Failed. I know the job is not healthy since the service checks can't pass because it's scaled to zero, but maybe a flag like "Downscaled" would be a good thing to have.
Is this on track? Are there any plans for it?
Example situation: [screenshot]

Expected: [screenshot]