Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PECO-729] Improve retry behavior #230

Merged
merged 12 commits into from
Mar 18, 2024
Merged

Conversation

kravets-levko
Copy link
Contributor

@kravets-levko kravets-levko commented Feb 14, 2024

PECO-729

  • Respect Retry-After header and update backoff algorithm
  • Update list of HTTP status code that could be retried
  • Retry only idempotent requests (a restricted set of Thrift operations)
  • Add/update tests

Should be easier (hopefully) to review commit by commit

Note: this PR doesn't include logic for retry on network errors. That part will be covered in a follow-up

Signed-off-by: Levko Kravets <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Feb 19, 2024

Codecov Report

Attention: Patch coverage is 96.66667% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 93.19%. Comparing base (17112c7) to head (7303b9a).

Files Patch % Lines
lib/connection/connections/ThriftHttpConnection.ts 95.23% 1 Missing ⚠️
lib/hive/Commands/BaseCommand.ts 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #230      +/-   ##
==========================================
+ Coverage   93.09%   93.19%   +0.09%     
==========================================
  Files          62       63       +1     
  Lines        1478     1513      +35     
  Branches      256      262       +6     
==========================================
+ Hits         1376     1410      +34     
- Misses         40       41       +1     
  Partials       62       62              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

expect(result.retryAfter).to.equal(200);
});

it('should use backoff when `Retry-After` header is missing', async () => {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benc-db , was there an issue in serverless when the retry-after was too short and we ended up running out of retries waiting for cluster to start up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the issue we were hitting was that Retry-After is always either 1s for serverless or 5s for classic. For pysql, we moved back to always using this Retry-After as a floor, and otherwise using exponential backoff, because 2.5 minutes (or 30s for serverless!) is often not enough time for compute to become available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrefurlan-db @benc-db I updated code so now Retry-After is used as a lower bound for backoff algorithm (not instead of backoff) - eac7d77 Please take one more look. Thank you!

@kravets-levko kravets-levko merged commit 6673660 into main Mar 18, 2024
8 checks passed
@kravets-levko kravets-levko deleted the PECO-729-improve-retry-strategy branch March 18, 2024 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants