start: allow users to call job start command to start stopped jobs #24150

martisah · 2024-10-08T17:47:46Z

This PR implements the job start command, which lets users call "job start [options] <job>" on a valid job that was previously stopped. The command starts up the job's most recent running and stable version.

Fixes: #23852

command/job_start.go

tgross

This is looking great @martisah. I've left some comments on implementation details but the overall design is solid.

Don't forget to run make cl to add a changelog item.
Given how tricky the reverse iteration and selection of versions is, it'd probably be a good idea to have a test that covers the more complex selection scenarios, like having a job that's stopped and started multiple times.
It looks like GitHub was "helpful" and hid some of the comments, be sure to expand them as you work thru the review.

martisah · 2024-11-07T18:13:53Z

command/job_start.go

+			var chosenVersion uint64
+			versionAvailable := false
+			for i := range versions {
+				if !*versions[i].Stop {


While fixing some bugs related to selecting the correct version, I realized that we should actually be selecting the previous non-stopped version. I also re-evaluated whether we should be specifically looking for a running version, as often the previously non-stopped version tends to be pending (in testing). Is it the expected behavior to always revert to only running versions?

as often the previously non stopped version tends to be pending (in testing). Is it the expected behavior to always revert to only running versions?

Oh that's a good call. Yeah we shouldn't revert to pending versions because we don't know that they're stable. That'll complicate testing a bit because you'll need to make sure the job can mark itself running.

Do you know of a method I can use to ensure the jobs are running in my tests? I've been trying to wait for the evaluation to succeed after running each command, but that still hasn't gotten the jobs marked as running in tests.

The job should be marked running as soon as at least one allocation is placed. So as long as the allocations can start, you should be able to poll api/Jobs.Info for the running status. But if the allocations exit then the job will be dead, so you need to make sure they stay running. Otherwise they could exit before you can poll and you'll get weird test flakes.

The testJob uses the mock driver, so you may want to look at the configuration of the test job to make sure it'll still be up and running.

tgross · 2024-11-07T21:14:59Z

command/job_start.go

+  Start an existing stopped job. This command is used to start a previously stopped job's
+  most recent version. Upon successful start, an interactive
+  monitor session will start to display log lines as the job starts its
+  allocations based on its most recent version. It is safe to exit the monitor
+  early using ctrl+c.


Nitpick: the output of these commands is what gets printed literally to the terminal. There's no automatic reflowing in mitchellh/cli. So we should try to manually reflow these to 80 cols.

tgross · 2024-11-07T21:17:24Z

command/job_start.go

+
+func (c *JobStartCommand) Help() string {
+	helpText := `
+Usage: nomad job start [options] <job>


It looks like we don't have any clue here to the user that they can use more than one job ID, or that if they do they all need to be in the same namespace.

tgross · 2024-11-07T21:18:53Z

command/job_start.go

+			// Find the most recent version for this job that has not been stopped
+			var chosenVersion uint64
+			versionAvailable := false
+			for i := range versions {


You can do for _, version := range versions here and then not have to dereference by index.
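
A minimal sketch of the loop shape suggested above, using a simplified stand-in type. `jobVersion` and `latestNonStopped` are hypothetical names, the slice is assumed to be ordered newest first, and a nil Stop is treated as "not stopped"; none of this is confirmed by the PR.

```go
package main

import "fmt"

// jobVersion is a hypothetical, simplified stand-in for the job type
// being iterated in the PR.
type jobVersion struct {
	Version uint64
	Stop    *bool
}

// latestNonStopped returns the first version in the slice that is not stopped.
// Assumption: versions is ordered newest first, so the first match is the
// most recent one. A nil Stop is treated as "not stopped".
func latestNonStopped(versions []jobVersion) (uint64, bool) {
	// Ranging by value avoids indexing into the slice on every access.
	for _, version := range versions {
		if version.Stop == nil || !*version.Stop {
			return version.Version, true
		}
	}
	return 0, false
}

func main() {
	stopped, notStopped := true, false
	versions := []jobVersion{
		{Version: 3, Stop: &stopped},
		{Version: 2, Stop: &notStopped},
		{Version: 1, Stop: &notStopped},
	}
	v, ok := latestNonStopped(versions)
	fmt.Println(v, ok)
}
```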

tgross · 2024-11-07T21:20:05Z

command/job_start.go

+			var chosenVersion uint64
+			versionAvailable := false
+			for i := range versions {
+				if !*versions[i].Stop {


as often the previously non stopped version tends to be pending (in testing). Is it the expected behavior to always revert to only running versions?

Oh that's a good call. Yeah we shouldn't revert to pending versions because we don't know that they're stable. That'll complicate testing a bit because you'll need to make sure the job can mark itself running.

tgross · 2024-11-07T21:22:03Z

command/job_start.go

+				}
+
+			}
+			c.versionSelected = chosenVersion


We could potentially have multiple jobs we're going to revert, so we'd be overwriting this field from multiple goroutines, which is a race condition. The production code doesn't ever read this value, only tests, so we probably want to get rid of it if we can.

I guess there's no way in the API to tell after the fact which version we reverted?

One way to solve this would be to refactor the body of the goroutine into a method that returns the chosen version, and then have the tests call that method directly. But for this it'd be nice to have the tests exercise the goroutines. Maybe this could be a buffered channel that the goroutine writes into and then the test can pull out the value? That'd be safe from data races.

Ooh I see, I'll give that a shot!
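
The buffered channel idea from this thread might be sketched as follows. `startJobs` and `pick` are hypothetical names that stand in for the command's goroutine body and its version selection; the real command's structure may differ.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// startJobs launches one goroutine per job ID and reports each chosen
// version on a buffered channel instead of writing to a shared struct field.
// pick is a hypothetical stand-in for the version selection logic.
func startJobs(jobIDs []string, pick func(string) uint64) <-chan uint64 {
	// Capacity equals the number of writers, so no goroutine blocks on send.
	chosen := make(chan uint64, len(jobIDs))
	var wg sync.WaitGroup
	for _, id := range jobIDs {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			chosen <- pick(id)
		}(id)
	}
	wg.Wait()
	close(chosen)
	return chosen
}

func main() {
	picks := map[string]uint64{"job1": 2, "job2": 5}
	ch := startJobs([]string{"job1", "job2"}, func(id string) uint64 { return picks[id] })

	// A test can drain the channel; arrival order is not deterministic,
	// so sort before comparing against expected versions.
	var got []uint64
	for v := range ch {
		got = append(got, v)
	}
	sort.Slice(got, func(i, j int) bool { return got[i] < got[j] })
	fmt.Println(got)
}
```

Because values arrive in whatever order the goroutines finish, a test that expects a specific version per job would need to send the job ID along with the version, or start one job at a time.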

tgross · 2024-11-07T21:23:22Z

command/job_start.go

+			if consulToken == "" {
+				consulToken = os.Getenv("CONSUL_HTTP_TOKEN")
+			}
+
+			if vaultToken == "" {
+				vaultToken = os.Getenv("VAULT_TOKEN")
+			}


Nitpick: this is not going to change between jobs, so we can lift these two assignments above the loop.

tgross · 2024-11-07T21:25:14Z

command/job_start_test.go

+			must.Sprintf("job start stdout: %s", ui.OutputWriter.String()),
+			must.Sprintf("job start stderr: %s", ui.ErrorWriter.String()),
+		)
+		must.Eq(t, expectedVersions[i], startCmd.versionSelected)


Looks like you've still got a test failure here.

Ah I see. This test keeps passing when I run it locally, but it seems to keep failing when I push. Could this be because of race conditions I haven't accounted for (similar to what you mentioned above)?

martisah added 2 commits October 8, 2024 14:15

start: allow users to call job start command to start up a previously…

bb347a1

… stopped job

quick clean up

a5e5c66

martisah self-assigned this Oct 8, 2024

clean up and remove stable condition

a36a5d8

copyright headers

0d0740f

martisah commented Oct 24, 2024

martisah marked this pull request as ready for review October 24, 2024 16:17

Merge branch 'main' into job-start

6d43620

tidy conditions

f2da6d4

tgross self-requested a review October 24, 2024 17:37

tgross reviewed Oct 24, 2024

tgross mentioned this pull request Oct 30, 2024

Implement nomad start command #18558

Closed

test versions, add changelog

35e15db

selecting version fixes

8f0e67b

doc fix

150ad92

martisah commented Nov 7, 2024

clean up

5126d67

martisah requested a review from tgross November 7, 2024 19:28

tgross reviewed Nov 7, 2024
