Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand WF context locking to cover WFT responses #1179

Closed
wants to merge 1 commit into from
Closed

Expand WF context locking to cover WFT responses #1179

wants to merge 1 commit into from

Conversation

mmcshane
Copy link
Contributor

@mmcshane mmcshane commented Jul 27, 2023

What was changed

Lifted the workflow context lock out of WFTaskHandler.ProcessWorkflowTask up into the surrounding Poller. Having the lock managed at this higher level allows us to include the RespondWorkflowTaskCompleted call and the (potential) reset of the expected next event ID into the same atomic block.

Why?

Failure to hold this lock while responding to a workflow task allows in-flight tasks to race past each other which has led to history corruption in the case where responses are not deduplicated. Furthermore the correctness of resetting the event level while not holding this lock is unclear at best.

Checklist

  1. Closes

  2. How was this tested:

  1. Any docs updates needed?

Not holding this lock while responding to a workflow task allows
in-flight tasks to race past each other which has led to history
corruption in the case where responses are not deduplicated. Furthermore
the correctness of resetting the event level while not holding this lock
is unclear at best.
@mmcshane mmcshane marked this pull request as ready for review July 28, 2023 17:07
@mmcshane mmcshane requested a review from a team as a code owner July 28, 2023 17:07
@mmcshane
Copy link
Contributor Author

Still missing is the unit test to verify that a successor task does not begin processing until after the current WFT has been responded (and possibly reset). But tests are passing now so it's not a waste of time to review this.

@mmcshane
Copy link
Contributor Author

This is busted. Allowing the Context to remove itself from cache before it's unlocked means that it doesn't work as a mutex anymore. I'll rework.

@mmcshane mmcshane closed this Jul 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant