Deployment workflow improvements #6882

Merged: dnr merged 6 commits into temporalio:versioning-3 on Nov 25, 2024

Conversation

dnr (Member) commented Nov 23, 2024

What changed?

  • Start series workflow only once instead of trying on every update
  • Use validator to reject update if already present
  • Don't update local state if sync to user data fails
  • Extract anonymous functions to methods
  • Fix escaping

Why?

Reduce overhead, avoid unnecessary calls

How did you test it?

existing tests

dnr requested a review from a team as a code owner on November 23, 2024 at 07:47
@@ -191,7 +151,70 @@ func (d *DeploymentWorkflowRunner) run() error {

d.logger.Debug("Deployment doing continue-as-new")
return workflow.NewContinueAsNewError(d.ctx, DeploymentWorkflow, d.DeploymentWorkflowArgs)
}

func (d *DeploymentWorkflowRunner) validateRegisterWorker(args *deploymentspb.RegisterWorkerInDeploymentArgs) error {
Collaborator

nice!

Member

Just pasting this here since it's something I learned and might help somebody: I was curious what would happen if the validator were called before the function performing the update. At first glance it looks like this could panic, since args.TaskQueue may not have a value present in the map.

Turns out any panic from the validator function is returned as an error, which is sweet since nothing breaks.

Member Author

The validator is called before the update; that's the whole point.

If args.TaskQueueName isn't in the map, the lookup will return nil. GetTaskQueues() on nil returns nil, and a lookup on that nil map will return not found. It will not panic.
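
For reference, a minimal, self-contained sketch of that chain (the types here are stand-ins, not the real deployment state), showing that a missing-key lookup, a nil-receiver getter, and a read from a nil map all return zero values in Go rather than panicking:

```go
package main

import "fmt"

// Stand-in type for illustration only; the real state lives in the
// deployment workflow's local state and generated proto types.
type taskQueueFamilyData struct {
	taskQueues map[int32]struct{}
}

// Getter that is safe to call on a nil receiver, mirroring how generated
// proto getters behave (GetTaskQueues() on nil returns nil).
func (d *taskQueueFamilyData) GetTaskQueues() map[int32]struct{} {
	if d == nil {
		return nil
	}
	return d.taskQueues
}

func main() {
	byName := map[string]*taskQueueFamilyData{} // no entries yet

	// Lookup of a missing key returns the zero value (nil pointer), not a panic.
	family := byName["my-task-queue"]

	// Calling the nil-safe getter on that nil pointer returns a nil map,
	// and reading from a nil map simply reports "not found".
	_, ok := family.GetTaskQueues()[0]
	fmt.Println(ok) // false, with no panic anywhere along the chain
}
```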

Member

> lookup on that nil map will return not found.

I thought we would get a panic there, since we could be doing a lookup on a nil map, right? My apologies if my assumption was wrong...

}

func (d *DeploymentWorkflowRunner) handleRegisterWorker(ctx workflow.Context, args *deploymentspb.RegisterWorkerInDeploymentArgs) error {
// Note: use ctx in here (provided by update) instead of d.ctx
Collaborator

For my understanding, what would be the difference?

Member Author

I don't know! The update handler is passed a context, but I don't know if it's different from the context passed to the runner. Probably not, but it seems better to use it until I confirm it's okay.

Member

I'm curious too - although I do think it's d.ctx being passed

Member Author

Confirmed with the SDK team that update handlers must use the Context passed to them, not the Context of the top-level workflow. This is actually a pretty good argument for not putting the Context in the struct.
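
A rough sketch of how that wiring can look with the Go SDK, mirroring the validator and handler signatures visible in this diff (the args type and update name below are placeholders, not the real deploymentspb types): the validator runs before the handler and can reject the update, and the handler uses the Context provided by the update rather than one captured on the runner struct.

```go
package deployment

import (
	"errors"

	"go.temporal.io/sdk/workflow"
)

// Placeholder args type for illustration; the real code uses
// *deploymentspb.RegisterWorkerInDeploymentArgs.
type registerWorkerArgs struct {
	TaskQueueName string
}

func deploymentWorkflowSketch(ctx workflow.Context) error {
	registered := map[string]bool{}

	// Validator: rejects the update up front if the task queue is already present.
	validate := func(args *registerWorkerArgs) error {
		if registered[args.TaskQueueName] {
			return errors.New("task queue already registered in this deployment")
		}
		return nil
	}

	// Handler: uses the Context the update provides (updateCtx), not the
	// top-level ctx.
	handle := func(updateCtx workflow.Context, args *registerWorkerArgs) error {
		registered[args.TaskQueueName] = true
		return nil
	}

	if err := workflow.SetUpdateHandlerWithOptions(
		ctx,
		"register-task-queue-worker", // hypothetical update name
		handle,
		workflow.UpdateHandlerOptions{Validator: validate},
	); err != nil {
		return err
	}

	// Keep the workflow running so it can receive updates (illustrative only).
	return workflow.Await(ctx, func() bool { return false })
}
```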

// wait until series workflow started
err = workflow.Await(ctx, func() bool { return d.DeploymentLocalState.StartedSeriesWorkflow })
if err != nil {
d.logger.Error("Update canceled before series workflow started")
Collaborator

So if the series fails to create because, say, it exceeds a limit on the number of series, the wf will return on line 126 and here we'd return this error to matching? Not for now, but would it be possible to propagate the "series count limit exceeded" error to matching and eventually to pollers somehow?

Member Author

I don't know how we're implementing those limits... StartWorkflowExecution or SignalWithStart couldn't fail for some application-defined limit; we'd have to change it to update-with-start? In any case, yeah, I'd expect this to change when we add limits.

Member

@ShahabT I might be wrong, but the client kicks off an update-with-start which, thanks to David's changes, starts a series workflow. If we enforce a limit check and return an error in the series workflow definition (per-namespace limit), we reach line 126, at which point the whole multi-operation should fail. In that case the update should not even be executed, right? Moreover, that error should be propagated to pollers too.

Collaborator

I guess my question is whether matching would receive a specific error message such as "series count limit exceeded" instead of a generic message such as "Update canceled before series workflow started", or whatever other generic error the server may send because the multi-operation failed.

Member Author

There is no multi-operation at this level... matching -> deployment wf is an update-with-start, but deployment wf -> deployment series wf is currently just a start. If we want to return an error, that has to change to update-with-start also.

Member Author

Also, if this deployment wf exceeds some limit, what happens to the wf? Does it stick around to tell matching "too many"? If not, then matching will try to re-create it every time, which is bad. But if it does stick around, that's using resources that shouldn't be used. Seems like we need a cache somewhere?

Collaborator

I was thinking matching would remember in memory and not retry registration, at least for a few minutes.
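
Sketched out, that suggestion could look roughly like the following (all names and the TTL are hypothetical; nothing like this is part of this PR): a small in-memory record of recent failures that matching consults before retrying registration.

```go
package matching

import (
	"sync"
	"time"
)

// registrationBackoff is a hypothetical in-memory cache of recent
// registration failures, keyed by deployment series name, so matching
// can skip re-attempting registration for a short window.
type registrationBackoff struct {
	mu      sync.Mutex
	ttl     time.Duration
	blocked map[string]time.Time // series name -> time of last failure
}

func newRegistrationBackoff(ttl time.Duration) *registrationBackoff {
	return &registrationBackoff{ttl: ttl, blocked: make(map[string]time.Time)}
}

// recordFailure notes that registration for this series recently failed
// (e.g. because a series count limit was exceeded).
func (r *registrationBackoff) recordFailure(series string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.blocked[series] = time.Now()
}

// shouldSkip reports whether matching should skip re-attempting
// registration because a failure happened within the TTL window.
func (r *registrationBackoff) shouldSkip(series string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	at, ok := r.blocked[series]
	if !ok {
		return false
	}
	if time.Since(at) > r.ttl {
		delete(r.blocked, series)
		return false
	}
	return true
}
```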

dnr merged commit 6cf162d into temporalio:versioning-3 on Nov 25, 2024
29 of 40 checks passed
dnr deleted the v3workflow branch on November 25, 2024 at 19:11
dnr added a commit that referenced this pull request Nov 26, 2024
## What changed?
- Start series workflow only once instead of trying on every update
- Use validator to reject update if already present
- Don't update local state if sync to user data fails
- Extract anonymous functions to methods
- Fix escaping

## Why?
Reduce overhead, avoid unnecessary calls

## How did you test it?
existing tests
dnr added a commit that referenced this pull request Nov 26, 2024