runtime: fix flakey connector deadlock #1188

Merged (1 commit) on Sep 15, 2023

Member

@jgraettinger jgraettinger commented Sep 15, 2023

Image connectors can deadlock at high volumes of requests and responses due to channel stuffing. Specifically, it's not okay to await a send into the container without concurrently polling for container responses.

Re-work the state machine to forward non-error responses using a separate spawned task, which is joined only upon terminal error or completion.
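To illustrate the failure mode and the shape of the fix, here is a minimal sketch in Python's `asyncio`. It is not the code from this PR: bounded `asyncio.Queue`s stand in for the runtime's channels, and `container`, `forward`, `run_naive`, `run_fixed`, `stalls`, and `CAPACITY` are all hypothetical names chosen for the example. Error handling (the "terminal error" path) is omitted.

```python
import asyncio

CAPACITY = 4  # hypothetical small bound, so the stall is easy to trigger


async def container(requests, responses):
    """Stand-in connector: answers each request; None marks end of stream."""
    while (req := await requests.get()) is not None:
        await responses.put(req * 2)  # blocks while the response queue is full
    await responses.put(None)


async def forward(responses, out):
    """Forward responses until the container signals completion."""
    while (resp := await responses.get()) is not None:
        out.append(resp)


async def run_naive(n):
    """Awaits sends without polling responses: stalls once both queues fill."""
    requests, responses = asyncio.Queue(CAPACITY), asyncio.Queue(CAPACITY)
    task = asyncio.create_task(container(requests, responses))
    out = []
    try:
        for i in range(n):
            await requests.put(i)  # nothing reads `responses` during this await
        await requests.put(None)
        await forward(responses, out)
        return out
    finally:
        task.cancel()  # no-op if the container already finished


async def run_fixed(n):
    """Forwards responses from a separately spawned task, joined at completion."""
    requests, responses = asyncio.Queue(CAPACITY), asyncio.Queue(CAPACITY)
    out = []
    container_task = asyncio.create_task(container(requests, responses))
    forwarder = asyncio.create_task(forward(responses, out))
    for i in range(n):
        await requests.put(i)  # safe: the forwarder keeps draining responses
    await requests.put(None)
    await forwarder  # joined only once the stream is complete
    await container_task
    return out


async def stalls(n, timeout=0.2):
    """Report whether run_naive(n) fails to finish within `timeout` seconds."""
    try:
        await asyncio.wait_for(run_naive(n), timeout)
        return False
    except asyncio.TimeoutError:
        return True


if __name__ == "__main__":
    print("naive stalls:", asyncio.run(stalls(1000)))
    print("fixed forwarded:", len(asyncio.run(run_fixed(1000))), "responses")
```

In the naive variant, the container blocks on a full response queue, the sender then blocks on a full request queue, and neither side can make progress. Moving response forwarding into its own task removes that dependency between the two directions.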



@jgraettinger jgraettinger marked this pull request as ready for review September 15, 2023 02:18
@jgraettinger
Member Author

Testing:

  • catalog test ✅
  • Started a local stack, and verified stat task connectors restart cleanly after re-publishing ops collections several times.
  • Confirmed that I can no longer locally reproduce a deadlock with stats data (I had been able to consistently before).

Member

@travjenkins travjenkins left a comment

Approving to allow quick merging if needed. Did not truly review this.

@jgraettinger jgraettinger merged commit 4bbf665 into master Sep 15, 2023
3 checks passed
@jgraettinger jgraettinger deleted the johnny/connector-deadlock branch September 15, 2023 02:59
Member

@psFried psFried left a comment

LGTM

3 participants