Deployment workflow improvements #6882
Conversation
@@ -191,7 +151,70 @@ func (d *DeploymentWorkflowRunner) run() error {

	d.logger.Debug("Deployment doing continue-as-new")
	return workflow.NewContinueAsNewError(d.ctx, DeploymentWorkflow, d.DeploymentWorkflowArgs)
}

func (d *DeploymentWorkflowRunner) validateRegisterWorker(args *deploymentspb.RegisterWorkerInDeploymentArgs) error {
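(Not the actual diff body, which is collapsed above - just a quick sketch of what a "reject if already registered" validator could look like, assuming the PR's surrounding types; `GetTaskQueueFamilies` and the error choice are placeholders for illustration.)

```go
// Sketch only, not the PR's code: reject the update up front if the task
// queue is already recorded in the deployment's local state, so nothing is
// written to workflow history for a duplicate registration.
func (d *DeploymentWorkflowRunner) validateRegisterWorker(args *deploymentspb.RegisterWorkerInDeploymentArgs) error {
	// Reads through nil messages/maps are safe here (see the nil-map
	// discussion below), so this works even before any state exists.
	if _, ok := d.DeploymentLocalState.GetTaskQueueFamilies()[args.TaskQueueName]; ok {
		return serviceerror.NewAlreadyExists("task queue already registered in this deployment")
	}
	return nil
}
```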
nice!
just pasting this here since this is something I learned and might help somebody - I was curious to understand what would happen if the validator were called before the function conducting the update - to the naked eye, this would panic since there's a chance `args.TaskQueue` may not have a value present in the map.
turns out all panics from the validator function will be returned as errors, which is sweet since nothing will break
The validator is called before the update; that's the whole point.
If `args.TaskQueueName` isn't in the map, the lookup will return nil. `GetTaskQueues()` on nil returns nil, and a lookup on that nil map will return not found. It will not panic.
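A tiny standalone snippet of the Go behavior being described - plain language semantics, nothing PR-specific (generated proto getters are likewise nil-safe, which is why `GetTaskQueues()` on a nil message just returns nil):

```go
package main

import "fmt"

func main() {
	// Reading from a nil map is safe: it behaves like an empty map and
	// returns the zero value plus ok == false. Only *writing* to a nil
	// map panics.
	var queues map[string]int // nil map
	v, ok := queues["my-task-queue"]
	fmt.Println(v, ok) // prints: 0 false

	// queues["x"] = 1 // this would panic: assignment to entry in nil map
}
```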
> lookup on that nil map will return not found.

I thought over here we would receive a panic since we could be trying to do a `nil` lookup on a `nil` map, right - my apologies if my assumption was wrong...
}

func (d *DeploymentWorkflowRunner) handleRegisterWorker(ctx workflow.Context, args *deploymentspb.RegisterWorkerInDeploymentArgs) error {
	// Note: use ctx in here (provided by update) instead of d.ctx
For my understanding, what would be the difference?
I don't know! The update handler is passed a context, but I don't know if it's different from the context passed to the runner. Probably not, but it seems better to use it until I confirm it's okay.
I'm curious too - although I do think it's `d.ctx` being passed.
Confirmed with the SDK team that update handlers must use the Context passed to them, not the Context of the top-level workflow. This is a pretty good argument for not putting the Context in the struct, actually.
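For anyone following along, a rough sketch of the wiring this implies, assuming the PR's surrounding types, the Go SDK's `workflow.SetUpdateHandlerWithOptions`, and a hypothetical `RegisterWorkerInDeployment` update-name constant - not the PR's actual registration code:

```go
// Sketch: register the update handler + validator; the handler closure
// threads its own update ctx into handleRegisterWorker instead of d.ctx.
err := workflow.SetUpdateHandlerWithOptions(
	d.ctx,                      // registration itself uses the workflow-level ctx
	RegisterWorkerInDeployment, // hypothetical update name constant
	func(ctx workflow.Context, args *deploymentspb.RegisterWorkerInDeploymentArgs) error {
		// Per the SDK team: handlers must use the ctx they are given.
		return d.handleRegisterWorker(ctx, args)
	},
	workflow.UpdateHandlerOptions{Validator: d.validateRegisterWorker},
)
if err != nil {
	return err
}
```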
	// wait until series workflow started
	err = workflow.Await(ctx, func() bool { return d.DeploymentLocalState.StartedSeriesWorkflow })
	if err != nil {
		d.logger.Error("Update canceled before series workflow started")
So if the series fails to create because, say, it exceeds a limit on the number of series, the wf will return on line 126 and here we'd return this error to matching? Not for now, but would it be possible to propagate the "series count limit exceeded" error to matching and eventually to pollers somehow?
I don't know how we're implementing those limits... StartWorkflowExecution or SignalWithStart couldn't fail for some application-defined limit; we'd have to change it to update-with-start? In any case, yeah, I'd expect this to change when we add limits.
@ShahabT - I might be wrong, but the client kickstarts an update-with-start which, thanks to David's changes, shall kickstart a series workflow. If we enforce a limit check and return an error in the series workflow definition (per-namespace limit), we reach line 126, from which the whole multi-operation should fail - in that case, the update should not even be executed, right? Moreover, that error shall be propagated to pollers too.
I guess my question is whether matching would receive a specific error message such as "series count limit exceeded" instead of a generic message such as "Update canceled before series workflow started", or any other generic error the server may send because the multi-operation failed.
There is no multi-operation at this level... matching -> deployment wf is an update-with-start, but deployment wf -> deployment series wf is currently just a start. If we want to return an error, that has to change to update-with-start also.
Also, if this deployment wf exceeds some limit, what happens to the wf? Does it stick around to tell matching "too many"? If not, then matching will try to re-create it every time, which is bad. But if it does stick around, that's using resources that shouldn't be used. Seems like we need a cache somewhere?
I was thinking matching would remember in memory and not retry registration, at least for a few minutes.
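Purely to illustrate that suggestion (all names hypothetical, nothing that exists in matching today), an in-memory suppression cache could look roughly like this:

```go
package matching // hypothetical placement

import (
	"sync"
	"time"
)

// registrationSuppressor is a hypothetical in-memory cache: after a
// "too many" style rejection, matching skips retrying the deployment
// registration for a few minutes instead of re-creating the wf every time.
type registrationSuppressor struct {
	mu      sync.Mutex
	until   map[string]time.Time // deployment key -> suppress-until timestamp
	backoff time.Duration        // e.g. a few minutes
}

func newRegistrationSuppressor(backoff time.Duration) *registrationSuppressor {
	return &registrationSuppressor{until: make(map[string]time.Time), backoff: backoff}
}

func (s *registrationSuppressor) shouldSkip(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return time.Now().Before(s.until[key])
}

func (s *registrationSuppressor) remember(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.until[key] = time.Now().Add(s.backoff)
}
```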
## What changed?
- Start series workflow only once instead of trying on every update
- Use validator to reject update if already present
- Don't update local state if sync to user data fails
- Extract anonymous functions to methods
- Fix escaping

## Why?
Reduce overhead, avoid unnecessary calls

## How did you test it?
existing tests