Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve retry policy for requests #85

Open
tobiaskamenicky opened this issue Sep 25, 2019 · 3 comments
Open

Improve retry policy for requests #85

tobiaskamenicky opened this issue Sep 25, 2019 · 3 comments
Labels
enhancement groomed The issue has been groomed and should be in a good shape. hacktoberfest help wanted

Comments

@tobiaskamenicky
Copy link

tobiaskamenicky commented Sep 25, 2019

Motivation

From February 1st, 2020 Delivery API will start enforcing rate limits to ensure fair usage by as many clients as possible. These rate limits should be high enough to work for most clients by default. However, in particular cases Delivery API might respond with 429 Too many requests. These responses will also contain the Retry-After header indicating how long the client should wait before making a follow-up request.

Proposed solution

Implement a retry policy that handles 429 Too many requests and retries the HTTP call using information from an optional Retry-After header. It can contain a date after which to retry or the seconds to delay after the response is received. We recommend a retry policy with exponential backoff and a jitter strategy to overcome peaks of similar retries coming from many clients. The retry policy should handle the following status codes:

  • 408 Request Timeout
  • 429 Too many requests
  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout

Please note that both 429 Too many requests and 503 Service Unavailable responses might contain an optional Retry-After header.

Additional context

@aweigold
Copy link
Contributor

aweigold commented Oct 1, 2019

The current implementation performs a binary exponential backoff:
https://github.com/Kentico/delivery-sdk-java/blob/1d4e94f8a6cabc6cb4275e370700b9db98b1c278/delivery/src/main/java/kentico/kontent/delivery/AsyncDeliveryClient.java#L507

Where counter is based on the number of attempts. The max number of retry attempts is configurable and defaults to 3.

This certainly could be updated to using a random with a backoff if that is the standard for all the SDKs.

As far as a MaxCumulativeWaitTime set to default to 30 seconds... I'm open to discussing that, but in my experience with my previous employers use cases, this would be very detrimental and could hinder consumers from failing fast.

I concur that if a Retry-After header is sent, that it should be respected. The current implementation does not read that header.

@tobiaskamenicky
Copy link
Author

I updated the description of the issue. The main motivation is to prepare SDKs for a rate limitation that is planned in Delivery API. Therefore, retry policy (regardless the implementation) needs to handle the mentioned status codes and respect "Retry-After" header for 429 and 503.

@alesk-kentico
Copy link

Adam, thank you for your feedback. As Tobias already mentioned, the main goal is to prepare clients for changes in Delivery API that will take effect on February 1st, 2020. To do so each SDK should implement a retry policy that supports the defined range of HTTP status codes and the Retry-After header. SDK authors are free to choose the implementation or handle additional error conditions as the .NET SDK already does.

Regarding the waiting time, it is a difficult to make as different users have different expectations. We see two major use cases; integrations and web applications. While integrations can usually take as much time as they need, web applications might fail fast because they can display an error message. And as you pointed out, it might be better to display an error message (or take another action) rather than provide no response at all. However, if we had decided to favor web applications, what would be the definition of failing fast? Would it be 1 second or 1 minute? It seems there is no right value and the result is a compromise.

We believe that developers should customize the retry policy for each application and we could only provide them with a few hints. Also, we are currently working on the best practices for developers who use the Delivery API, and this is definitely a topic that will be included.

Could you please share more information about your experience with previous employers' use cases?

@Simply007 Simply007 added enhancement groomed The issue has been groomed and should be in a good shape. hacktoberfest help wanted labels Sep 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement groomed The issue has been groomed and should be in a good shape. hacktoberfest help wanted
Projects
None yet
Development

No branches or pull requests

4 participants