Fix bug where invalidation messages were getting sent to closing clients #1823
Signed-off-by: Madelyn Olson <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1823     +/-   ##
============================================
+ Coverage     70.87%   71.02%   +0.14%
============================================
  Files           123      123
  Lines         65651    65665      +14
============================================
+ Hits          46529    46636     +107
+ Misses        19122    19029      -93
Fixes #1647
Great, it is new to me...
I will say, that doesn't really explain to me why it was hanging on the final read.
I am not sure how we can say this "Fixes" #1647.
If I understand correctly, you claim that because sync io is used, we might be able to identify the client in sendTrackingMessage, so we will not send the tracking-redir-broken?
I agree this does not seem like the root cause for this specific error.
I would also ask why we are satisfied with only checking that the redirection client exists in sendTrackingMessage, and do not also check that its flags are not close_asap or close_after_reply. That seems like a bug.
I'm not confident there aren't more edge cases. However, it ran a little over 17 million times overnight on my laptop and never hung (although some other tests failed that I've never seen fail), whereas it consistently hung after ~100 iterations before the change. So, at the very least, this is more stable.
I'm not exactly sure what you mean, we're not sending the tracking message to the client that is closing. EDIT: I understand now, let me try this.
Signed-off-by: Madelyn Olson <[email protected]>
Oh, I figured out why it's hanging, the test is not actually checking for the invalidation.
Signed-off-by: Madelyn Olson <[email protected]>
LGTM
So I think we were seeing these timeouts because QUIT behaves differently with and without IO threading. In both cases, QUIT is a close-after-reply command. Once the client has written out its reply, it gets added to the queue that is cleaned up at the end of the event loop. Normally this is fine, because the client is definitely killed before we circle around to the next event loop iteration.

With IO threads, we need to process the pending IO commands before the client is added to the kill queue. This may not happen immediately, which means we might go on to process that SET command before we free the client that has supposedly already quit. This is very sensitive to timing, so it's not very likely, but still possible. Once the SET has been executed, the invariants in the test are off, since it will get a correct invalidation.

The fix is to also mark a client as broken if it's being closed.
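
To illustrate the idea, here is a minimal standalone sketch of the check, not the actual patch. The `client` struct, the `redirect_action` enum, and both function names are hypothetical; the flag names `close_asap` and `close_after_reply` are borrowed from the review discussion, and the real server's client structure and the exact flags the fix consults may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the server's client struct. */
typedef struct client {
    bool close_asap;        /* queued to be freed at the end of the event loop */
    bool close_after_reply; /* e.g. after QUIT: close once the reply is written */
} client;

/* What to do with an invalidation aimed at a redirect target. */
typedef enum {
    REDIRECT_DELIVER, /* send the invalidation to the redirect target */
    REDIRECT_BROKEN   /* treat the redirect as broken (tracking-redir-broken) */
} redirect_action;

/* Behavior as described before the fix: only a missing target is broken. */
redirect_action redirect_action_before(const client *target) {
    return target == NULL ? REDIRECT_BROKEN : REDIRECT_DELIVER;
}

/* Behavior as described after the fix: a target that still exists but is
 * already being closed is treated as broken too. */
redirect_action redirect_action_after(const client *target) {
    if (target == NULL) return REDIRECT_BROKEN;
    if (target->close_asap || target->close_after_reply) return REDIRECT_BROKEN;
    return REDIRECT_DELIVER;
}
```

In this sketch, a client that has issued QUIT but has not yet been freed (for example because the IO thread work is still pending) has `close_after_reply` set, so the old check would still deliver the invalidation to it, while the new check reports the redirect as broken regardless of timing.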
The test was also hanging because of a test issue: the conditional lsearch check was returning 1 or 0 strings, which are both valid exit criteria for the wait_for.

Fixes #1647 (I believe this now!)