Add backoff strategy interfaces and implementations #252

ViToni · 2024-04-19T14:34:09Z

This PR introduces the ability to use a backoff strategy for reconnect attempts.

By default, a backoff strategy is used which returns the same backoff value on every attempt, mimicking the previous behaviour.
In addition, an exponential backoff is available which increases the max value on each attempt up to a configurable limit.
The resulting backoff is a random value between a configured non-zero minimum and the increasing max (up to the configured limit). Using a random value from an increasing range helps spread client reconnect attempts.
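
For readers who want to see the idea in isolation, here is a minimal sketch of an exponential backoff with a randomised delay, as described above. It is not the code from this PR: the type `exponentialBackoff`, its fields, and the doubling growth factor are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// exponentialBackoff is a hypothetical type for illustration only; it is not
// the implementation added by this PR.
type exponentialBackoff struct {
	minDelay time.Duration // lower bound of every delay (expected to be non-zero)
	maxLimit time.Duration // the upper bound never grows beyond this value
	curMax   time.Duration // current upper bound; grows after each attempt
}

func newExponentialBackoff(minDelay, maxLimit time.Duration) *exponentialBackoff {
	if maxLimit < minDelay {
		maxLimit = minDelay // keep the range well-formed
	}
	return &exponentialBackoff{minDelay: minDelay, maxLimit: maxLimit, curMax: minDelay}
}

// Next returns a random delay in [minDelay, curMax] and then doubles curMax
// (an assumed growth factor), capping it at maxLimit.
func (b *exponentialBackoff) Next() time.Duration {
	span := int64(b.curMax - b.minDelay)
	delay := b.minDelay + time.Duration(rand.Int63n(span+1))
	if b.curMax > b.maxLimit/2 {
		b.curMax = b.maxLimit
	} else {
		b.curMax *= 2
	}
	return delay
}

func main() {
	b := newExponentialBackoff(500*time.Millisecond, 30*time.Second)
	for attempt := 1; attempt <= 8; attempt++ {
		fmt.Printf("attempt %d: wait %v\n", attempt, b.Next())
	}
}
```

Because each client draws from the same growing range independently, clients that lost their connection at the same moment are unlikely to retry at the same moment.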

Testing

Existing code has been changed to use the same duration as before but now uses the constant backoff implementation instead. All tests pass.

Additionally there are dedicated tests for:

Constant backoff
Exponential backoff

Closing issues

closes #249

MattBrittan

Sorry - I'm suggesting a big change here to deal with an issue that is completely missed by the current ConnectRetryDelay. Please feel free to say "no"!

MattBrittan · 2024-04-22T02:32:32Z

autopaho/auto.go

-	ConnectRetryDelay time.Duration    // How long to wait between connection attempts (defaults to 10s)
-	ConnectTimeout    time.Duration    // How long to wait for the connection process to complete (defaults to 10s)
-	WebSocketCfg      *WebSocketConfig // Enables customisation of the websocket connection
+	ReconnectBackoffStrategy BackoffStrategy  // How long to wait between connection attempts (defaults to 10s)


Removing ConnectRetryDelay makes this a breaking change. I wonder if it's best to keep ConnectRetryDelay (perhaps mark it as deprecated) and use it (with a NewConstantBackoffStrategy) if ReconnectBackoffStrategy is nil.

I'd prefer to have only one way to support backoffs to avoid double code. Using the new interface would mean a one-time effort on the consumer side. But it's your call.
(Actually I have one more idea for a breaking change: #253)

As we are still pre-V1 a breaking change like this is possible; however my preference would be to make as many breaking changes as possible in one release (there are a few options that have changed and are now marked as deprecated). I'm happy either way as we are getting fairly close to V1, but if you do make the breaking change then I'll hold off making a release until the other breaking changes are made (I think it's best to put as many of these in one release as possible so users don't need to update their code with every release).
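
The non-breaking option discussed in this thread (keep ConnectRetryDelay and wrap it in a constant backoff when no strategy is configured) could look roughly like the sketch below. The types here are simplified stand-ins, not the autopaho definitions; `ReconnectBackoff`, `effectiveBackoff` and `defaultRetryDelay` are hypothetical names, and the 10s default is taken from the field comment in the diff above.

```go
package main

import (
	"fmt"
	"time"
)

// Backoff is a simplified stand-in interface used only for this sketch.
type Backoff interface {
	Next() time.Duration
}

// constantBackoff always returns the same delay.
type constantBackoff struct{ delay time.Duration }

func (c constantBackoff) Next() time.Duration { return c.delay }

// ClientConfig is a cut-down, hypothetical version of the real config struct.
type ClientConfig struct {
	ConnectRetryDelay time.Duration // Deprecated: only used when ReconnectBackoff is nil
	ReconnectBackoff  Backoff       // takes precedence when set
}

// defaultRetryDelay mirrors the documented default of 10s.
const defaultRetryDelay = 10 * time.Second

// effectiveBackoff picks the configured backoff, falling back to a constant
// backoff built from the legacy ConnectRetryDelay field.
func effectiveBackoff(cfg ClientConfig) Backoff {
	if cfg.ReconnectBackoff != nil {
		return cfg.ReconnectBackoff
	}
	delay := cfg.ConnectRetryDelay
	if delay <= 0 {
		delay = defaultRetryDelay
	}
	return constantBackoff{delay: delay}
}

func main() {
	legacy := ClientConfig{ConnectRetryDelay: 3 * time.Second}
	fmt.Println(effectiveBackoff(legacy).Next())         // legacy field is honoured
	fmt.Println(effectiveBackoff(ClientConfig{}).Next()) // falls back to the default
}
```

Existing users who only set ConnectRetryDelay would see no change in behaviour, while new code can opt in to a different backoff.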

autopaho/auto.go

MattBrittan · 2024-04-22T02:40:35Z

autopaho/examples/backoff/backoff.go

@@ -0,0 +1,51 @@
+/*


I'm not sure that this adds much value? I'm just trying to think what a user would find relevant; they would expect an example like this to show how to use the backoff strategy with Autopaho rather than this (which might be more appropriate as a test?)

Actually it helped (me) visualize the range of the ExponentialBackoffStrategy. There was a tiny bug while refactoring, which would have been difficult to catch due to the randomness of the values but was easy to spot using this file.

MattBrittan · 2024-04-22T02:52:53Z

autopaho/net.go

@@ -106,7 +107,7 @@ func establishServerConnection(ctx context.Context, cfg ClientConfig, firstConne

 		// Delay before attempting another connection
 		select {
-		case <-time.After(cfg.ConnectRetryDelay):
+		case <-time.After(backoff.Next()):


One potential issue here (not handled by the current solution either) is that the broker might reject the CONNECT packet meaning that establishServerConnection is called rapidly. I wonder if we can address this issue at the same time - see comments below.
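
To make the pattern in the diff above concrete, here is a small self-contained sketch of a retry loop that waits for a backoff-supplied delay but stops waiting as soon as the context is cancelled. `connectWithRetry`, `runDemo` and `runCancelledDemo` are hypothetical names; the real establishServerConnection does considerably more.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// connectWithRetry calls attempt until it succeeds or ctx is done. Between
// attempts it waits for the duration returned by next.
func connectWithRetry(ctx context.Context, attempt func(context.Context) error, next func() time.Duration) error {
	for {
		if err := attempt(ctx); err == nil {
			return nil
		}
		// Delay before attempting another connection
		select {
		case <-time.After(next()):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// runDemo simulates a connection that succeeds on the third attempt.
func runDemo() (int, error) {
	calls := 0
	attempt := func(context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("connection refused")
		}
		return nil
	}
	next := func() time.Duration { return 10 * time.Millisecond }
	err := connectWithRetry(context.Background(), attempt, next)
	return calls, err
}

// runCancelledDemo shows that a long delay does not block cancellation.
func runCancelledDemo() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	alwaysFail := func(context.Context) error { return errors.New("connection refused") }
	next := func() time.Duration { return time.Hour }
	err := connectWithRetry(ctx, alwaysFail, next)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	calls, err := runDemo()
	fmt.Println(calls, err)
	fmt.Println(runCancelledDemo())
}
```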

MattBrittan · 2024-04-22T03:39:14Z

autopaho/backoff.go

+
+// The BackoffStrategy is a container for the configuration of a given strategy whereas the returned Backoff is the actual implementation
+// As such BackoffStrategy instances are supposed to be thread safe and freely sharable between users.
+type BackoffStrategy interface {


I'm suggesting fairly major changes to handle an issue you were not aware of so feel free to tell me to get lost!

Consider the following possibility:

Establish connection and send CONNECT (so establishServerConnection has completed)

Broker rejects CONNECT (say bad password) so loop back to step 1

This can result in us hammering the broker with CONNECT packets (not a good look!). It would be nice if we could solve this issue at the same time as dealing with the backoff on the network level connection.

I've had a quick think about options and wonder if the following might work (and provide a solution that's a bit simpler than your approach):

ConnectionBackoff func(ctx context.Context, establishConnection func(ctx context.Context) error) error // Passed a function that establishes the network connection; calls it until a `nil` error is returned or the context is cancelled.

Then in establishServerConnection you would do something like:

var cli *paho.Client
var connack *paho.Connack
err := cfg.ConnectionBackoff(ctx, func(ctx context.Context) error {
	for _, u := range cfg.ServerUrls {
		connectionCtx, cancelConnCtx := context.WithTimeout(ctx, cfg.ConnectTimeout)
		....
	}
})
if err != nil {
	// this would only happen when context cancelled
}
return cli, connack

The advantage of this is that it means that the ConnectionBackoff has complete control of the process (including resetting, or not, between connections). ConstantBackoff could be very simple (pseudo code!):

type ConstantBackoff struct {
	Delay       time.Duration
	lastAttempt time.Time
}

func (cb *ConstantBackoff) Run(ctx context.Context, establishConnection func(ctx context.Context) error) error {
	// The previous connection attempt may have been very recent; we want at least Delay between attempts
	select {
	case <-time.After(cb.Delay - time.Since(cb.lastAttempt)): // Might be negative; i.e. no delay
	case <-ctx.Done():
		return ctx.Err()
	}
	for {
		err := establishConnection(ctx)
		cb.lastAttempt = time.Now()
		if err == nil || ctx.Err() != nil {
			return err
		}
		select {
		case <-time.After(cb.Delay):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// This would be used as follows:
cb := ConstantBackoff{Delay: 10 * time.Second}
pahoCfg.ConnectionBackoff = cb.Run

Other backoffs could be a bit more complex (and take into account whatever factors are relevant).

My approach was to have a simple implementation. Therefore one can re-use / share the BackoffStrategy while the actual Backoff is "throwaway". It doesn't need a "reset" or anything fancy.
In my experience these mechanisms, where the server could potentially say: "come back in one hour", sound nice but are not worth the hassle. Often the "one hour" is either too early or too late.
Having a rate limiter on the server side and an exponential backoff on the client side has worked for me with planned and unplanned outages in the past. But that's only my use case... so your mileage might vary.

One aspect which is not covered by the CONNECT approach is mentioned in #253. One needs to get a token first before receiving a CONNECT and, as a fallback in case the server is just gone and doesn't return a CONNECT, the token provider endpoint should be taken care of, too.

Either way, IMO something like taking care of CONNECT should have a dedicated PR and discussion (as it seems to me way more complex than the backoff itself).

IMO something like taking care of CONNECT should have a dedicated PR and discussion (as it seems to me way more complex than the backoff itself).

Sorry - I made an error here (the issue with working on multiple clients concurrently!). In this client establishServerConnection only returns after the CONNACK is received. As such this is not the issue I thought it was (and your current strategy would work).

Note that the reason I raised this (and want to put some time into considering this change) is due to experience with the V3 client where we hit issues because the options became very complicated, so I'm careful with new options in this project (once an option is added it's very hard to remove!).

where the server could potentially say: "come back in one hour",

The one thing that may be a little bit like the scenario you mention ("come back later") is duplicate client id connect loops. This is very common for users new to the protocol where they have two clients with the same client ID and end up with a connect loop (two connections "fighting"). With v3.1 this was difficult to detect (the server just dropped the connection) but with V5 a specific Disconnect would be sent (Reason Code 0x8E - Session taken over) so there is a potential that we could handle it in the future. I only mention this as being able to prevent connection storms when this is happening would be beneficial longer term (so would be nice to have the option of adding support for this).

My approach was to have a simple implementation. Therefore one can re-use / share the BackoffStrategy while the actual Backoff is "throwaway". It doesn't need a "reset" or anything fancy.

I understand that; however part of my reaction here might have been because your code does look a bit Java'ish. I avoided this in the last set of comments because it's highly subjective; but as you specifically asked "So if I hit any no-Go-s" here are a few thoughts (I'll add a few more comments in the code):

The structure made me think of the Java factory pattern; obviously this pattern is used in Go but it's a lot less common than in Java.

I've not seen use of underscore prefixes (i.e. _Duration) other than in unexported globals. I'd expect this variable to be called delay or similar (with the lower case d indicating it's private).

My suggestion would be to simplify this to (using the Reset option you felt complicated things):

// backoff represents a configured instance of the strategy which computes the next backoff duration.
type backoff interface {
	Reset()
	NextBackOff() time.Duration
}

There are a few reasons for this:

It's simple.

It's compatible with cenkalti/backoff which is the most well known back-off library I'm aware of (so would provide a range of options users could simply plug in). Note: I would prefer to avoid a dependency so the basic strategies would be implemented in this library (which you have done).

It allows authors to consider previous state (e.g. never attempt to connect twice in two minutes) which is doable, but more difficult, with the current structure.

We can extend it in the future without breaking users code. For example to handle the connection fight situation mentioned above we could add:

type backoffWithDisconnected interface {
	backoff
	Disconnected(*package.Disconnect) // called when Disconnect packet received
}

Then, when a disconnect packet is received, we would do something like:

if dwd, ok := cfg.Backoff.(backoffWithDisconnected); ok {
	dwd.Disconnected(pkt)
}

This is just an off the cuff thought, so treat it as pseudo code, but it aims to show why this provides more flexibility.
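
The optional-interface idea above can be tried out in a self-contained form. In this sketch the `Disconnect` struct is a stand-in for the client's packet type, and `constantBackoff`, `takeoverAwareBackoff` and `notifyDisconnected` are hypothetical names; only the two interfaces and the reason code 0x8E (Session taken over) come from the discussion.

```go
package main

import (
	"fmt"
	"time"
)

// Disconnect is a stand-in for the client's Disconnect packet type.
type Disconnect struct{ ReasonCode byte }

const sessionTakenOver = 0x8E // MQTT v5 reason code mentioned above

type backoff interface {
	Reset()
	NextBackOff() time.Duration
}

// backoffWithDisconnected is the optional extension; implementing it is opt-in.
type backoffWithDisconnected interface {
	backoff
	Disconnected(*Disconnect) // called when Disconnect packet received
}

// constantBackoff implements only the basic interface.
type constantBackoff struct{ delay time.Duration }

func (c *constantBackoff) Reset()                     {}
func (c *constantBackoff) NextBackOff() time.Duration { return c.delay }

// takeoverAwareBackoff adds a penalty after a "session taken over" disconnect.
type takeoverAwareBackoff struct {
	delay, penalty time.Duration
	takenOver      bool
}

func (t *takeoverAwareBackoff) Reset() { t.takenOver = false }
func (t *takeoverAwareBackoff) NextBackOff() time.Duration {
	if t.takenOver {
		return t.delay + t.penalty
	}
	return t.delay
}
func (t *takeoverAwareBackoff) Disconnected(d *Disconnect) {
	t.takenOver = d != nil && d.ReasonCode == sessionTakenOver
}

// notifyDisconnected forwards the packet only if the backoff opted in.
func notifyDisconnected(b backoff, pkt *Disconnect) bool {
	if dwd, ok := b.(backoffWithDisconnected); ok {
		dwd.Disconnected(pkt)
		return true
	}
	return false
}

func main() {
	pkt := &Disconnect{ReasonCode: sessionTakenOver}
	plain := &constantBackoff{delay: time.Second}
	aware := &takeoverAwareBackoff{delay: time.Second, penalty: time.Minute}
	fmt.Println(notifyDisconnected(plain, pkt), plain.NextBackOff())
	fmt.Println(notifyDisconnected(aware, pkt), aware.NextBackOff())
}
```

Existing implementations of the basic interface keep compiling unchanged, which is the point of adding the extension as a separate interface rather than a new method.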

Note: Once more I'm not saying that this is the right approach but do want to be careful with this because once an interface is selected I think we need to consider it locked in.

I understand that; however part of my reaction here might have been because your code does look a bit Java'ish.

You are totally right. Funny how it shows. I've been doing Java for many years and as you correctly guessed this code originates from Java. In Java it was very simple:

public interface BackoffStrategy { Iterator<Duration> backoff(); }

As the custom Iterator was a nested class in the implementation it could easily/safely reuse final fields from the outer class.

But somehow I was not able to implement something similarly simple in Go.
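
For comparison, one way the Java shape above (a shareable strategy handing out throwaway iterators) is sometimes expressed in Go is with closures. This is only a sketch of that idea, not something proposed in this PR; `BackoffStrategy` as a function type, `Constant` and `Doubling` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// BackoffStrategy holds immutable configuration; every call returns a fresh,
// independent sequence of delays (the counterpart of Iterator<Duration>).
type BackoffStrategy func() func() time.Duration

// Constant returns a strategy whose sequences always yield delay.
func Constant(delay time.Duration) BackoffStrategy {
	return func() func() time.Duration {
		return func() time.Duration { return delay }
	}
}

// Doubling returns a strategy whose sequences start at start and double on
// each call until they reach limit.
func Doubling(start, limit time.Duration) BackoffStrategy {
	if start > limit {
		start = limit
	}
	return func() func() time.Duration {
		cur := start // per-sequence state, captured by the closure below
		return func() time.Duration {
			d := cur
			if cur > limit/2 {
				cur = limit
			} else {
				cur *= 2
			}
			return d
		}
	}
}

func main() {
	strategy := Doubling(time.Second, 8*time.Second)
	next := strategy() // a throwaway sequence for one reconnect cycle
	for i := 0; i < 5; i++ {
		fmt.Print(next(), " ")
	}
	fmt.Println() // prints: 1s 2s 4s 8s 8s
}
```

The captured variables play the role of the final fields of the outer class: the strategy value can be shared freely, while each returned closure carries its own state.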

MattBrittan

A few of those style/possible java influences I mentioned.

MattBrittan · 2024-04-23T22:29:04Z

autopaho/backoff.go

+
+// The ConstantBackoffStrategy implements the BackoffStrategy interface and provides instances a constant duration.
+type ConstantBackoffStrategy struct {
+	BackoffStrategy


Why embed BackoffStrategy? I suspect this may be a fallback to Java where you would need to declare what interface you are implementing but it's not needed here (and is harmful because it would prevent the compiler from warning you if your implementation of GetBackoff() has the wrong signature).

Many thanks for pointing this out. I wasn't aware this would be actually harmful. This was definitely my Java way of thinking...

MattBrittan · 2024-04-23T22:33:47Z

autopaho/backoff.go

+
+// The ConstantBackoff implements the Backoff interface and provides a constant duration.
+type ConstantBackoff struct {
+	Backoff


No need to embed the interface here. Doing this will prevent the compiler from identifying issues (and could lead to a runtime crash, see here).

Again thanks for pointing it out (and for the example code showing the issue in action)
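
The example linked in the review is not reproduced in this thread, so the following is an independent sketch of the same pitfall. All type names are hypothetical. A struct that embeds an interface satisfies that interface even when it implements none of its methods, so a misspelled method compiles and the call panics at runtime.

```go
package main

import (
	"fmt"
	"time"
)

type Backoff interface {
	Next() time.Duration
}

// embedded satisfies Backoff only through the embedded (nil) interface value.
type embedded struct {
	Backoff
	delay time.Duration
}

// Nxt is a deliberate misspelling of Next. The compiler cannot flag it,
// because the embedded interface already supplies a Next method.
func (e embedded) Nxt() time.Duration { return e.delay }

// plain implements the interface directly, without embedding.
type plain struct{ delay time.Duration }

func (p plain) Next() time.Duration { return p.delay }

// Compile-time check: this line fails to build if plain stops implementing Backoff.
var _ Backoff = plain{}

// callNext reports whether calling Next panicked.
func callNext(b Backoff) (d time.Duration, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return b.Next(), false
}

func main() {
	_, crashed := callNext(embedded{delay: time.Second})
	fmt.Println("embedded panicked:", crashed)

	d, ok := callNext(plain{delay: time.Second})
	fmt.Println("plain:", d, "panicked:", ok)
}
```

The blank-identifier assignment is a common Go idiom for getting the "declares what it implements" guarantee from the compiler without embedding.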

MattBrittan · 2024-04-23T22:34:43Z

autopaho/backoff.go

+// The ConstantBackoff implements the Backoff interface and provides a constant duration.
+type ConstantBackoff struct {
+	Backoff
+	_Duration time.Duration


An underscore prefix is not a Go standard; just use lower case for the first letter (looks a bit weird for a while but you get used to it).

Thanks for pointing it out. Seems I didn't get far with reading on Go style. I read only about the globals / export part.
Will try to make it more natural Go.

ViToni · 2024-05-07T16:48:08Z

@MattBrittan The PR now has been updated. Hopefully it looks more Go-ish now.
I'd be more than happy for any feedback.

Things covered in this update:

backwards compatibility and fallback to ConnectRetryDelay if it is set and no strategy was configured
changes to the code making it (hopefully) simpler and less Java-ish

MattBrittan · 2024-05-08T23:10:06Z

@MattBrittan The PR now has been updated. Hopefully it looks more Go-ish now. I'd be more than happy for any feedback.

Thanks - I'll try to have a look next week (sorry, have limited time to spend on this currently).
Update 20th May - still struggling for time; took a quick look but really need to pull the repo and work through it in detail. I am really keen to see this address the situation where the broker drops the connection soon after it comes up (as this is quite a common issue).

ViToni · 2024-08-11T20:28:49Z

Superseded by #258.

ViToni added 4 commits April 18, 2024 20:53

Add generic backoff interface and constant backoff implementation

dad6ecc

Change autopaho to use backoff strategy instead of configurable value

84be3e1

Add exponential backoff implementation and tests

543e5de

Add example for exponential backoff

5230ed3

ViToni force-pushed the feature/add_backoff_strategy branch from 9284b81 to fda2b50 Compare April 20, 2024 17:37

MattBrittan reviewed Apr 22, 2024


MattBrittan reviewed Apr 23, 2024


ViToni force-pushed the feature/add_backoff_strategy branch from fda2b50 to 0100371 Compare May 7, 2024 16:46

ViToni force-pushed the feature/add_backoff_strategy branch 2 times, most recently from ac3a681 to 5230ed3 Compare May 8, 2024 07:43

ViToni mentioned this pull request Jul 5, 2024

Add option for dynamic backoff #258

Merged

ViToni closed this Aug 11, 2024
