Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Reset] Handle completion events from restarted children #7295

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

gow
Copy link
Contributor

@gow gow commented Feb 7, 2025

What changed?

In this PR we are doing 2 things,

  1. Cleaning up the entries in ChildrenInitializedPostResetPoint
  2. Verifying that the results are indeed from the correct restarted child before accepting the completion event.

Why?

  • Cleans up the mutable state and reduces bloat.
  • Avoids a race condition where a old child completed in between the time when new child Init is recorded and when it is actually started.

How did you test it?

Added unit tests + manual testing.

Potential risks

N/A

Documentation

Pending

Is hotfix candidate?

No

@gow gow requested a review from a team as a code owner February 7, 2025 20:00
@gow gow marked this pull request as draft February 7, 2025 20:00
@gow gow requested a review from yycptt February 7, 2025 20:00
@gow gow marked this pull request as ready for review February 7, 2025 22:18
return nil, consts.ErrChildExecutionNotFound
}
initiatedAttr := initiatedEvent.GetStartChildWorkflowExecutionInitiatedEventAttributes()
childID := fmt.Sprintf("%s:%s", initiatedAttr.GetWorkflowType().Name, initiatedAttr.GetWorkflowId())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we really should create a util function for this concatenation logic. I think I've seen it in 3 different places.

childrenInitializedAfterResetPoint := mutableState.GetChildrenInitializedPostResetPoint()
if len(childrenInitializedAfterResetPoint) > 0 {
// This parent was reset and it also has some children that potentially were restarted.
initiatedEvent, err := mutableState.GetChildExecutionInitiatedEvent(ctx, parentInitiatedID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I don't think we need to load the initiatedEvent. Both child workflow type and child workflowID are available in ChildExecutionInfo in mutable state I think.

}
// The results are from the child that this parent initiated. We should stop tracking the child and process the result now.
delete(childrenInitializedAfterResetPoint, childID)
mutableState.SetChildrenInitializedPostResetPoint(childrenInitializedAfterResetPoint)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz update ms.approximateSize when setting the reset point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants