[WRK-200] memory snapshot causes clientclosed error for webapp #2367


@thundergolfer thundergolfer commented Oct 21, 2024

Describe your changes

  • WRK-200

#2178 introduced a regression in snapshots that wasn't caught by our tests. PR 16550 in the monorepo adds a regression integration test.

UnaryUnaryWrapper and UnaryStreamWrapper began capturing references to clients, which became stale on snapshot. When these stale, closed clients were used on restore, exceptions were thrown.
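
To illustrate the failure mode described above, here is a minimal sketch. It does not use the real client code: `FakeClient`, `ClientClosed`, `NaiveWrapper`, `RefreshingWrapper`, and `close_for_snapshot` are hypothetical stand-ins invented for this example. Only the `_snapshotted` flag and the `from_env` name come from the diff in this PR. A wrapper that captures a client at construction time keeps pointing at the closed client after a snapshot, while a wrapper that checks the flag before each call can swap in a fresh one.

```python
import asyncio


class ClientClosed(Exception):
    """Hypothetical stand-in for the error a closed client raises."""


class FakeClient:
    """Hypothetical, minimal stand-in for the real client object."""

    def __init__(self):
        self._closed = False
        self._snapshotted = False

    def close_for_snapshot(self):
        # Assumption: snapshotting closes the client and marks it as stale.
        self._closed = True
        self._snapshotted = True

    async def call(self, name):
        if self._closed:
            raise ClientClosed(name)
        return f"ok:{name}"

    @classmethod
    async def from_env(cls):
        # Assumption: returns a fresh, usable client.
        return cls()


class NaiveWrapper:
    """Captures the client once and never looks at it again."""

    def __init__(self, client, name):
        self.client = client
        self.name = name

    async def __call__(self):
        return await self.client.call(self.name)


class RefreshingWrapper(NaiveWrapper):
    """Checks the stale flag before each call and replaces the client."""

    async def __call__(self):
        if self.client._snapshotted:
            self.client = await FakeClient.from_env()
        return await self.client.call(self.name)


async def demo():
    client = FakeClient()
    naive = NaiveWrapper(client, "Hello")
    fixed = RefreshingWrapper(client, "Hello")
    client.close_for_snapshot()  # both wrappers now hold a stale client
    try:
        await naive()
        naive_result = "ok"
    except ClientClosed:
        naive_result = "ClientClosed"
    return naive_result, await fixed()


print(asyncio.run(demo()))
```

In this sketch the naive wrapper fails because nothing ever tells it that the object it captured has been closed, which is the shape of the regression this PR addresses.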

Backward/forward compatibility checks

Check these boxes or delete any item (or this section) if not relevant for this PR.

  • Client+Server: this change is compatible with old servers
  • Client forward compatibility: this change ensures client can accept data intended for later versions of itself

Note on protobuf: protobuf message changes in one place may have impact to
multiple entities (client, server, worker, database). See points above.


@@ -161,7 +162,7 @@ async def _close(self, prep_for_restore: bool = False):

     async def _init(self):
         """Connect to server and retrieve version information; raise appropriate error for various failures."""
-        logger.debug("Client: Starting")
+        logger.debug(f"Client ({id(self)}): Starting")
Contributor Author

Having the object ID logged helps track the usage of fresh vs. stale clients when debugging snapshot issues.

Instead of having to catch stale client objects and refresh them, it'd be better if the client object itself could detect that it was stale and refresh itself. If this works, we could remove all current (and future) `if self.client._snapshotted` checks.

Contributor

Yeah, I agree. We should make the client itself detect this. That should be doable through its _call_unary and _call_stream methods, which all RPC methods should be going through now.
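
A rough sketch of what that suggestion could look like, under stated assumptions: `SketchClient`, `_connect`, `_ensure_fresh`, `mark_snapshotted`, and `connect_count` are hypothetical names made up for this example, and the real `_call_unary` / `_call_stream` signatures may differ. The idea is only that every RPC path funnels through one staleness check, so callers never need their own `_snapshotted` test.

```python
import asyncio


class SketchClient:
    """Hypothetical client that repairs itself before every RPC."""

    def __init__(self):
        self._snapshotted = False
        self._connected = False
        self.connect_count = 0  # only here so the sketch is observable

    async def _connect(self):
        # Assumption: stands in for whatever opens a fresh connection.
        self._connected = True
        self.connect_count += 1

    def mark_snapshotted(self):
        # Assumption: a snapshot closes the connection and sets the flag.
        self._connected = False
        self._snapshotted = True

    async def _ensure_fresh(self):
        # Single place where staleness is detected and fixed.
        if self._snapshotted or not self._connected:
            await self._connect()
            self._snapshotted = False

    async def _call_unary(self, method_name, request):
        await self._ensure_fresh()
        return (method_name, request)  # placeholder for a real response

    async def _call_stream(self, method_name, request):
        await self._ensure_fresh()
        for i in range(2):  # placeholder for streamed responses
            yield (method_name, request, i)


async def demo():
    client = SketchClient()
    first = await client._call_unary("A", 1)
    client.mark_snapshotted()
    second = await client._call_unary("A", 2)  # reconnects transparently
    streamed = [item async for item in client._call_stream("B", 3)]
    return first, second, streamed, client.connect_count
```

With this shape, wrappers could hold a client reference indefinitely, since the client refreshes its own connection rather than being replaced.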

@thundergolfer thundergolfer requested a review from freider October 21, 2024 15:24
@thundergolfer
Contributor Author

@prbot approve

@modal-pr-review-automation modal-pr-review-automation bot left a comment

Approved 👍. @freider will follow-up review this.

@thundergolfer thundergolfer merged commit 80a9a57 into main Oct 22, 2024
21 checks passed
@thundergolfer thundergolfer deleted the jonathon/wrk-200-memory_snapshot-causes-clientclosed-error-for-webapp branch October 22, 2024 18:58
@@ -426,5 +430,8 @@ async def unary_stream(
         request,
         metadata: Optional[Any] = None,
     ):
+        if self.client._snapshotted:
+            logger.debug(f"refreshing client after snapshot for {self._wrapped_method_name}")
+            self.client = await _Client.from_env
Contributor

from_env is a method that needs to be called (()) so this probably crashes 😬 - can we add a test that covers this case?
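
A small, self-contained check of why the missing parentheses matter. `SketchClient` is a hypothetical stand-in, and treating `from_env` as an async classmethod is an assumption for this example. In Python, awaiting a method object (rather than the coroutine returned by calling it) raises `TypeError`, because a bound method is not awaitable.

```python
import asyncio


class SketchClient:
    """Hypothetical stand-in with an async factory method."""

    @classmethod
    async def from_env(cls):
        return cls()


async def without_call():
    # Missing (): this awaits the bound method itself, not a coroutine.
    return await SketchClient.from_env


async def with_call():
    # Calling the method produces a coroutine, which can be awaited.
    return await SketchClient.from_env()


def run_both():
    try:
        asyncio.run(without_call())
        outcome = "no error"
    except TypeError:
        outcome = "TypeError"
    client = asyncio.run(with_call())
    return outcome, isinstance(client, SketchClient)
```

A regression test along these lines would need to exercise the post-snapshot branch, since the faulty line only runs when `_snapshotted` is true.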

Contributor

@freider freider left a comment

found a bug 💥
