-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
start: allow users to call job start command to start stopped jobs #24150
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great @martisah. I've left some comments on implementation details but the overall design is solid.
- Don't forget to run
make cl
to add a changelog item. - Given how tricky the reverse iteration and selection of versions is, it'd probably be a good idea to have a test that covers the more complex selection scenarios like having job that's stopped and started multiple times.
- It looks like GitHub was "helpful" and hid some of the comments, be sure to expand them as you work thru the review.
var chosenVersion uint64 | ||
versionAvailable := false | ||
for i := range versions { | ||
if !*versions[i].Stop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In light of some bug fixing associated to selecting the correct version, I realized that we should actually be selecting the previously non-stopped version. Also I re-evaluated whether we should be specifically looking for a running version, as often the previously non stopped version tends to be pending (in testing). Is it the expected behavior to always revert to only running versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as often the previously non stopped version tends to be pending (in testing). Is it the expected behavior to always revert to only running versions?
Oh that's a good call. Yeah we shouldn't revert to pending versions because we don't know that they're stable. That'll complicate testing a bit because you'll need to make sure the job can mark itself running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know of a method I can use to ensure the jobs are running in my tests? I've been trying to wait for the evaluation to succeed after running each command, but that still hasn't proved to work in terms of getting them to be marked running in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The job should be marked running as soon as at lease one allocation is placed. So as long as the allocations can start you should be able to poll api/Jobs.Info
for the running
status. But if the allocations exit then the job will be dead, so you need to make sure they stay running or they could exit before you can poll and you'll get weird test flakes.
The testJob
uses the mock driver, so you may want to look at the configuration of the test job to make sure it'll still be up and running.
Start an existing stopped job. This command is used to start a previously stopped job's | ||
most recent version. Upon successful start, an interactive | ||
monitor session will start to display log lines as the job starts its | ||
allocations based on its most recent version. It is safe to exit the monitor | ||
early using ctrl+c. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: the output of these commands are what get printed literally to the terminal. There's no automatic reflowing in mitchellh/cli
. So we should try to manually reflow these to 80 cols.
|
||
func (c *JobStartCommand) Help() string { | ||
helpText := ` | ||
Usage: nomad job start [options] <job> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we don't have any clue here to the user that they can use more than one job ID, or that if they do they all need to be in the same namespace.
// Find the most recent version for this job that has not been stopped | ||
var chosenVersion uint64 | ||
versionAvailable := false | ||
for i := range versions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do for _, version := range versions
here and then not have to dereference by index.
var chosenVersion uint64 | ||
versionAvailable := false | ||
for i := range versions { | ||
if !*versions[i].Stop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as often the previously non stopped version tends to be pending (in testing). Is it the expected behavior to always revert to only running versions?
Oh that's a good call. Yeah we shouldn't revert to pending versions because we don't know that they're stable. That'll complicate testing a bit because you'll need to make sure the job can mark itself running.
} | ||
|
||
} | ||
c.versionSelected = chosenVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could potentially have multiple jobs we're going to revert, so we'd be overwriting this field from multiple goroutines, which is a race condition. The production code doesn't ever read this value, only tests, so we probably want to get rid of it if we can.
I guess there's no way in the API to tell after the fact which version we reverted?
One way to solve this would be to refactor the body of the goroutine into a method that returns the chosen version, and then have the tests call that method directly. But for this it'd be nice to have the tests exercise the goroutines. Maybe this could be a buffered channel that the goroutine writes into and then the test can pull out the value? That'd be safe from data races.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh I see, I'll give that a shot!
if consulToken == "" { | ||
consulToken = os.Getenv("CONSUL_HTTP_TOKEN") | ||
} | ||
|
||
if vaultToken == "" { | ||
vaultToken = os.Getenv("VAULT_TOKEN") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: this is not going to change between jobs, so we can lift these two assignments above the loop.
must.Sprintf("job start stdout: %s", ui.OutputWriter.String()), | ||
must.Sprintf("job start stderr: %s", ui.ErrorWriter.String()), | ||
) | ||
must.Eq(t, expectedVersions[i], startCmd.versionSelected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you've still got a test failure here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, I keep passing this test when I run it locally, but it seems to keep failing when I push, could this be because of race conditions I haven't accounted for (similarly to what you mentioned above)?
This PR implements the
job start
command, which allows for users to call "job start [options] " with a valid job that has been previously stopped and starts the most recent running and stable version up.Fixes: #23852