[WRK-200] memory snapshot causes clientclosed error for webapp #2367
@@ -161,7 +162,7 @@ async def _close(self, prep_for_restore: bool = False):

     async def _init(self):
         """Connect to server and retrieve version information; raise appropriate error for various failures."""
-        logger.debug("Client: Starting")
+        logger.debug(f"Client ({id(self)}): Starting")
Having the object ID logged helps track the usage of fresh vs. stale clients when debugging snapshot issues.
Instead of having to catch stale client objects and refresh them, it'd be better if the client object itself could catch that it was stale and refresh itself. If this could work, then we could remove all current (and future) if self.client._snapshotted type checks.
Yeah, I agree. We should make the client itself detect this, which should be doable through the _call_unary and _call_stream methods on it, which all RPC methods should be going through now.
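A minimal sketch of the idea discussed above, where the client notices it is stale and refreshes itself inside its own call path. Everything here is hypothetical: SketchClient, _refresh_if_stale, and the _generation counter are stand-ins for illustration, and only the _snapshotted flag and the _call_unary name come from this thread. The real client's connection handling will differ.

```python
import asyncio


class SketchClient:
    """Hypothetical stand-in for the real client; not the project's API."""

    def __init__(self):
        self._snapshotted = False
        self._generation = 0
        self._open_connection()

    def _open_connection(self):
        # Placeholder for opening a real connection (assumption).
        self._generation += 1

    async def _refresh_if_stale(self):
        # The staleness check lives on the client, so callers never
        # need their own `if self.client._snapshotted` branch.
        if self._snapshotted:
            self._open_connection()
            self._snapshotted = False

    async def _call_unary(self, method_name, request):
        await self._refresh_if_stale()
        # Placeholder for the actual RPC; returns which connection was used.
        return method_name, request, self._generation


async def demo():
    client = SketchClient()
    _, _, before = await client._call_unary("Ping", {})
    client._snapshotted = True  # simulate a memory snapshot
    _, _, after = await client._call_unary("Ping", {})
    return before, after, client._snapshotted
```

With a check like this in one place, wrappers that hold a reference to the client would not need to replace it after a restore.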
…ntclosed-error-for-webapp
@prbot approve
Approved 👍. @freider will follow-up review this.
@@ -426,5 +430,8 @@ async def unary_stream(
         request,
         metadata: Optional[Any] = None,
     ):
+        if self.client._snapshotted:
+            logger.debug(f"refreshing client after snapshot for {self._wrapped_method_name}")
+            self.client = await _Client.from_env
from_env is a method that needs to be called (with ()), so this probably crashes 😬 - can we add a test that covers this case?
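To illustrate the failure mode flagged here, a small sketch, assuming from_env is an async class method (FakeClient is a hypothetical stand-in for _Client). Awaiting the method object itself, rather than the coroutine produced by calling it, raises a TypeError at the await.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for _Client; only mimics an async factory."""

    @classmethod
    async def from_env(cls):
        return cls()


async def refresh_without_call():
    # Mirrors the diff: the method is referenced but never called.
    return await FakeClient.from_env


async def refresh_with_call():
    # Calling the method yields a coroutine, which can be awaited.
    return await FakeClient.from_env()


def check():
    try:
        asyncio.run(refresh_without_call())
    except TypeError:
        failed = True
    else:
        failed = False
    client = asyncio.run(refresh_with_call())
    return failed, isinstance(client, FakeClient)
```

A regression test along these lines would only catch the bug if it actually exercises the post-snapshot branch, since the line is never reached otherwise.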
found a bug 💥
Describe your changes
#2178 introduced a regression in snapshots that wasn't caught by our tests. PR 16550 in the monorepo adds a regression integration test.
UnaryUnaryWrapper and UnaryStreamWrapper began capturing references to clients, which became stale on snapshot. When these stale, closed clients were used on restore, exceptions were thrown.
Backward/forward compatibility checks
Check these boxes or delete any item (or this section) if not relevant for this PR.
Note on protobuf: protobuf message changes in one place may have an impact on
multiple entities (client, server, worker, database). See points above.