-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add backoff strategy interfaces and implementations #252
Conversation
9284b81
to
fda2b50
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - I'm suggesting a big change here to deal with an issue that is completly missed by the current ConnectRetryDelay
. Please feel free to say "no"!
ConnectRetryDelay time.Duration // How long to wait between connection attempts (defaults to 10s) | ||
ConnectTimeout time.Duration // How long to wait for the connection process to complete (defaults to 10s) | ||
WebSocketCfg *WebSocketConfig // Enables customisation of the websocket connection | ||
ReconnectBackoffStrategy BackoffStrategy // How long to wait between connection attempts (defaults to 10s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing ConnectRetryDelay makes this a breaking change. I wonder if it's best to keep ConnectRetryDelay
(perhaps mark as depreciated) and use it (with a NewConstantBackoffStrategy
) if ReconnectBackoffStrategyis
nil`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to have only one way to support backoffs to avoid double code. Using the new interface would mean a one-time effort on the consumer side. But it's your call.
(Acutally I have one more idea for a breaking change: #253)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are still pre-V1 a breaking change like this is possible; however my preference would be to make as many breaking changes as possible in one release (there are a few options that have changed and are now marked as depreciated). I'm happy either way as I we are getting fairly close to V1, but if you do make the breaking change then I'll hold off making a release until the other breaking changes are made (think it's best to put as many of these in one release as possible so users don't need to update their code with every release).
@@ -0,0 +1,51 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that this adds much value? I'm just trying to think what a user would find relevant; they would expect an example like this to show how to use the backup strategy with Autopaho rather than this (which might be more appropriate as a test?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it helped (me) visualize the range of the ExponentialBackoffStrategy
. There was a tiny bug while refactoring, which would have been difficult to catch due to the randomness of values but was easily spotable using this file.
@@ -106,7 +107,7 @@ func establishServerConnection(ctx context.Context, cfg ClientConfig, firstConne | |||
|
|||
// Delay before attempting another connection | |||
select { | |||
case <-time.After(cfg.ConnectRetryDelay): | |||
case <-time.After(backoff.Next()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One potential issue here (not handled by the current solution either) is that the broker might reject the CONNECT packet meaning that establishServerConnection
is called rapidly. I wonder if we can address this issue at the same time - see comments below.
|
||
// The BackoffStrategy is a container for the configuration of a given strategy whereas the returned Backoff is the actual implementation | ||
// As such BackoffStrategy instances are supposed to be thread safe and freely sharable between users. | ||
type BackoffStrategy interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm suggesting fairly major changes to handle an issue you were not aware of so feel free to tell me to get lost!
Consider the following possibility:
- Establish connection and send CONNECT (so
establishServerConnection
has completed) - Broker rejects CONNECT (say bad password) so loop back to step 1
This can result in us hammering the broker with CONNECT packets (not a good look!). It would be nice if we could solve this issue at the same time as dealing with the backoff on the network level connection.
I've had a quick think about options and wonder if the following might work (and provide a solution that's a bit simpler than your approach):
ConnectionBackoff func(ctx Context.Context, establishConnection func(ctx Context.Context) error) error // Passed function that will establish network connection and calls it until a `nil` error is returned or the context is cancelled.
Then in establishServerConnection
you would do something like:
var cli *paho.Client,
var connack *paho.Connack)
err := cfg.ConnectionBackoff( ctx, func(ctx Context.Context) error {
for _, u := range cfg.ServerUrls {
connectionCtx, cancelConnCtx := context.WithTimeout(ctx, cfg.ConnectTimeout)
....
}
}
if err != nil { // this would only happen when context cancelled
}
return cli, connack
The advantage of this is that it means that the ConnectionBackoff
has complete control of the process (including resetting, or not, between connections). ConstantBackoff
could be very simple (pseudo code!):
type ConstantBackoff struct {
Delay time.Duration
lastAttempt time.Time
}
func (cb *ConstantBackoff) Run(ctx context.Context, establishConnection func(ctx Context.Context) error) {
// The previous connection attempt may have been very recent; we want at least delay between attempts
select {
case time.After(cb.Delay - time.Since(cb.lastAttempt)): // Might be negative; i.e. no delay
case <-ctx.Done():
return ctx.Err()
}
for {
err := establishConnection(ctx)
cb.lastAttempt = time.Now()
if err == nil || ctx.Err() != nil {
return err
}
select {
case <-time.After(cb.Delay):
case <-ctx.Done():
return ctx.Err()
}
}
}
// This would be used as follows:
cb := ConstantBackoff{Delay: 10 * time.Second}
pahoCfg.ConnectionBackoff = cb.Run
Other backoffs could be a bit more complex (and take into account whatever factors are relevant).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My approach was to have a simple implementation. Therefore one can re-use / share the BackoffStrategy
while the actual Backoff
is "throwaway". It doesn't need a "reset" or anything fancy.
In my experience these mechanisms, where the server could potentially say: "come back in one hour", sound nice but are not worth the hassle. Often the "one hour" is either to early or to late.
Having a rate limiter on the server side and an exponential backoff on the client side has worked for me with planned and unplanned outages in the past. But that's only my use case... so your mileage might vary.
One aspect which is not covered by the CONNECT approach is mentiond in #253. One needs to get a token first before receiving a CONNECT and a fallback in case the server is just gone and doesn't return a CONNECT, the token provider endpoint should be taken care of, too.
Either way, IMO something like taking care of CONNECT should have a dedicated PR and discussion (as it seems to me way more complex than the backoff itself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO something like taking care of CONNECT should have a dedicated PR and discussion (as it seems to me way more complex than the backoff itself).
Sorry - I made an error here (the issue with working on multiple clients concurrently!). In this client establishServerConnection
only returns after the CONNACK
is received. As such this is not the issue I thought it was (and your current strategy would work).
Note that the reason I raised this (and want to put some time into considering this change) is due to experience with the V3 client where we hit issues because the options became very complicated, so I'm careful with new options in this project (once an option is added it's very hard to remove!).
where the server could potentially say: "come back in one hour",
The one thing that may be a little bit like the scenario you mention ("come back later") is duplicate client id connect loops. This is very common for users new to the protocol where they have two clients with the same client ID and end up with a connect loop (two connections "fighting"). With v3.1 this was difficult to detect (the server just dropped the connection) but with V5 a specific Disconnect would be sent (Reason Code 0x8E - Session taken over) so there is a potential that we could handle it in the future. I only mention this as being able to prevent connection storms when this is happening would be beneficial longer term (so would be nice to have the option of adding support for this).
My approach was to have a simple implementation. Therefore one can re-use / share the BackoffStrategy while the actual Backoff is "throwaway". It doesn't need a "reset" of anything fancy.
I understand that; however part of my reaction here might have been because your code does look a bit Java'ish. I avoided this in the last set of comments because it's highly subjective; but as you specifically asked "So if I hit any no-Go-s" here are a few thoughts (I'll add a few more comments in the code):
- The structure made me think of the Java factory pattern; obviously this pattern is used in Go but it's a lot less common than in Java.
- Ive not seen use of underscore prefixes (i.e.
_Duration
) other than in unexported globals. I'd expect this variable to be calleddelay
or similar (with the lower cased
indicating it's private).
My suggestion would be to simplify this to (using the Reset
option you felt complicated things):
// backoff represents a configured instance of the strategy which computes the next backoff duration.
type backoff interface {
Reset()
NextBackOff() time.Duration
}
There are a few reasons for this:
It's simple.
It's compatible with cenkalti/backoff
which is the most well known back-off library I'm aware of (so would provide a range of options users could simply plug in). Note: I would prefer to avoid a dependency so the basic strategies would be implemented in this library (which you have done).
It allows authors to consider previous state (e.g. never attempt to connect twice in two minutes) which is doable, but more difficult, with the current structure.
We can extend it in the future without breaking users code. For example to handle the connection fight situation mentioned above we could add:
type backoffWithDisconnected interface {
backoff
Disconnected(*package.Disconnect) // called when Disconnect packet received
}
Then, when a disconnect packet is received, we would do something like:
if on, dwd := cfg.Backoff.(backoffWithDisconnected) ; ok {
dwd.Disconnected(pkt)
}
This is just an off the cuff thought; so treat as pseudo code but aims to show why this provided more flexibility.
Note: Once more I'm not saying that this is the right approach but do want to be careful with this because once an interface is selected I think we need to consider it locked in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that; however part of my reaction here might have been because your code does look a bit Java'ish.
You are totally right. Funny how it shows. I've been doing Java for many years and as you correctly guessed this code originates from Java. In Java it was very simple:
public interface BackoffStrategy {
Iterator<Duration> backoff();
}
As the custom Iterator
was a nested class in the implementation it could easily/safely resuse final fields from the outer class.
But somehow I was not able to implement something similar simple in Go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few of those style/possible java influences I mentioned.
autopaho/backoff.go
Outdated
|
||
// The ConstantBackoffStrategy implements the BackoffStrategy interface and provides instances a constant duration. | ||
type ConstantBackoffStrategy struct { | ||
BackoffStrategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why embed BackoffStrategy
? I suspect this may be a fallback to Java where you would need to declare what interface you are implementing but it's not needed here (and is harmful because it would prevent the compiler from warning you if your implementation of GetBackoff()
has the wrong signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for pointintg this out. I wasn't aware this would be actuall harmful. This was definitely my Java way of thinking...
autopaho/backoff.go
Outdated
|
||
// The ConstantBackoff implements the Backoff interface and provides a constant duration. | ||
type ConstantBackoff struct { | ||
Backoff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to embed the interface
here. Doing this will prevent the compiler from identifying issues (and could lead to a runtime crash see here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again thanks for pointing it out (and for the example code showing the issue in action)
autopaho/backoff.go
Outdated
// The ConstantBackoff implements the Backoff interface and provides a constant duration. | ||
type ConstantBackoff struct { | ||
Backoff | ||
_Duration time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An underscore prefix is not a go standard; just use lower case for the first letter (looks a bit weird for a while but you get used to it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out. Seems I didn't get far with reading on Go style. I read only about the globals / export part.
Will try to make it be more natural Go.
fda2b50
to
0100371
Compare
@MattBrittan The PR now has been updated. Hopefully it looks more Go-ish now. Things covered in this update:
|
ac3a681
to
5230ed3
Compare
Thanks - I'll try to have a look next week (sorry, have limited time to spend on this currently). |
Superseded by #258. |
This PR introduces the ability to use a backoff strategy for reconnect attempts.
As default a backoff strategy is used whichs return the same backoff value mimicking the previous behaviour.
In addition there is an exponential backoff available which will increase the max value on each attempt up to a configurable limit.
The created backoff is a random value between a configured non-zero minimum and the increasing max (up to the configured limit). Using a random value from an increasing range helps spread client reconnect attempts.
Testing
Existing code has been changed to use the same duration as before but now uses the constant backoff implementation instead. All tests pass.
Additionally there are dedicated tests for:
Closing issues
closes #249 `