Fix topic writer infinite reconnections #1006
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #1006 +/- ##
==========================================
- Coverage 67.64% 67.45% -0.20%
==========================================
Files 261 252 -9
Lines 24686 24513 -173
==========================================
- Hits 16700 16535 -165
+ Misses 7127 7112 -15
- Partials 859 866 +7
@art22m thanks for your PR. I have seen it and will check it in detail soon. Please fix the broken test.
Hello @art22m. Thanks for your PR and for the great work researching and describing the bug. The function CheckResetReconnectionCounters is used to detect when a stream has been established and to reset the attempt counters, for good logging. What do you think about setting some reasonable constant for checking the "connection established" state in the case where the connection timeout is infinite? For example, 1 minute. It would mean:
Hi, thanks for the feedback.
Maybe I did not get your point, but shouldn't we check that the last connection was earlier than the constant, not later? I guess you want to compare the constant with the lastConnectionAttempt variable.
When the reconnect timeout is infinite, we should check whether the duration since the last attempt is more than the constant. I want to detect a situation:
And count the failure as a new attempt for logs and for the retry policy.
I've used a one-minute constant, but I guess the right constant should be found empirically.
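For readers following the discussion, the check being proposed could look roughly like the sketch below. It is not the SDK's code: shouldResetCounters, infiniteDuration, resetCoefficient, and establishedAfter are hypothetical names, the function takes the elapsed time directly instead of two timestamps, and the guard against large finite timeouts is an extra assumption that was not part of the conversation.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	// Assumed representation of an "infinite" connection timeout.
	infiniteDuration = time.Duration(math.MaxInt64)
	// Empirical coefficient (=10) mentioned in the PR description.
	resetCoefficient = 10
	// The one-minute constant from the discussion; the right value
	// would have to be found empirically.
	establishedAfter = time.Minute
)

// shouldResetCounters is a hypothetical stand-in for
// CheckResetReconnectionCounters. It reports whether enough time has passed
// since the last attempt to treat the previous stream as established.
func shouldResetCounters(sinceLastTry, connectionTimeout time.Duration) bool {
	// Any timeout this large would overflow when multiplied by the
	// coefficient, so fall back to the fixed constant instead.
	if connectionTimeout > infiniteDuration/resetCoefficient {
		return sinceLastTry > establishedAfter
	}
	return sinceLastTry > connectionTimeout*resetCoefficient
}

func main() {
	fmt.Println(shouldResetCounters(time.Second, infiniteDuration))   // false
	fmt.Println(shouldResetCounters(2*time.Minute, infiniteDuration)) // true
	fmt.Println(shouldResetCounters(time.Minute, time.Second))        // true
}
```

With this shape, a failure that happens shortly after the previous attempt counts as another attempt in the same retry series, while a failure after a long quiet period starts a new series.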
@art22m Thanks for the fix :)
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Hello!
I've created a topic writer with StartTimeout set to 30 seconds and the Default or Retry RetryPolicy.
Then, with ip6tables, I blocked the YDB port to get a transport error.
The retries are expected to stop after 30 seconds, but this does not happen.
The reason is an overflow in connectionTimeout*resetAttemptEmpiricalCoefficient.
If connectionTimeout is time.Duration(math.MaxInt64) (which is always true, since we have no option to set connectionTimeout), then multiplying it by the empirical coefficient (=10) gives a negative value (see playground https://goplay.space/#qlIeS6o3PCz).
As a result, the function CheckResetReconnectionCounters always returns true. Since CheckResetReconnectionCounters returns true, startOfRetries in the connectionLoop loop method is always set to w.clock.Now(). Thus, in CheckRetryMode there is no chance to get retriesDuration > settings.StartTimeout and stop the reconnections. As a result, we always get infinite reconnections in Retry and Default modes.
Pull request type
Please check the type of change your PR introduces:
What is the current behavior?
Topic writer with StartTimeout set to X and the Default or Retry RetryPolicy.
Topic writer gets any error in Retry mode or a retryable error in Default mode.
After X time, the topic writer continues reconnecting.
Issue Number: N/A
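The overflow behind the current behavior can be reproduced in isolation. The sketch below is a minimal illustration, not the SDK's code: naiveThreshold and naiveCheck are hypothetical helpers that assume the check compares the time since the last attempt with connectionTimeout multiplied by the coefficient, as the PR description states.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	// The "infinite" timeout value named in the PR description.
	infiniteTimeout = time.Duration(math.MaxInt64)
	// Empirical coefficient (=10) named in the PR description.
	resetAttemptEmpiricalCoefficient = 10
)

// naiveThreshold multiplies without guarding against int64 overflow.
// Go integer arithmetic wraps around silently at run time.
func naiveThreshold(connectionTimeout time.Duration) time.Duration {
	return connectionTimeout * resetAttemptEmpiricalCoefficient
}

// naiveCheck reports whether the time since the last attempt exceeds
// the (possibly overflowed) threshold.
func naiveCheck(sinceLastTry, connectionTimeout time.Duration) bool {
	return sinceLastTry > naiveThreshold(connectionTimeout)
}

func main() {
	// The product wraps around to a negative duration.
	fmt.Println(naiveThreshold(infiniteTimeout) < 0) // true
	// Any non-negative elapsed time is greater than a negative threshold,
	// so the check succeeds even immediately after the last attempt.
	fmt.Println(naiveCheck(0, infiniteTimeout)) // true
	// With a finite timeout the check behaves as intended.
	fmt.Println(naiveCheck(time.Second, 30*time.Second)) // false
}
```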
What is the new behavior?
Topic writer with StartTimeout set to X and the Default or Retry RetryPolicy.
Topic writer gets any error in Retry mode or a retryable error in Default mode.
After X time, the topic writer stops reconnecting.
Other information
The condition startOfRetries.IsZero() is needed to set startOfRetries for the first time, since with an infinite connection duration CheckResetReconnectionCounters will return false.
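The effect of the reset check on startOfRetries can be seen in a small simulation. This is only a sketch under simplifying assumptions: simulateRetries is a hypothetical function, every attempt fails after a fixed cost, the clock is faked, and resetCounters stands in for the result of CheckResetReconnectionCounters on every iteration. The real connectionLoop is more involved.

```go
package main

import (
	"fmt"
	"time"
)

// simulateRetries models a simplified reconnection loop in which every
// attempt fails after attemptCost. It returns the number of attempts made
// before the loop gives up, capped at maxAttempts.
func simulateRetries(startTimeout, attemptCost time.Duration, resetCounters bool, maxAttempts int) int {
	now := time.Unix(0, 0) // fake clock, advanced manually
	var startOfRetries time.Time
	attempts := 0
	for attempts < maxAttempts {
		// IsZero covers the first iteration, when nothing has set
		// startOfRetries yet and the reset check returns false.
		if startOfRetries.IsZero() || resetCounters {
			startOfRetries = now
		}
		attempts++
		now = now.Add(attemptCost)
		if now.Sub(startOfRetries) > startTimeout {
			break // retries have lasted longer than the start timeout
		}
	}
	return attempts
}

func main() {
	// Reset check always true: the retry window restarts on every
	// iteration, so only the safety cap stops the loop.
	fmt.Println(simulateRetries(30*time.Second, 5*time.Second, true, 1000)) // 1000
	// Reset check false: the window starts once and the loop stops
	// after the timeout is exceeded.
	fmt.Println(simulateRetries(30*time.Second, 5*time.Second, false, 1000)) // 7
}
```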