Exponential backoff with jitter of the retry middleware V2 often produces a wait time of 0 ms #2991

Open
fredericgboutin-yapla opened this issue Aug 28, 2024 · 0 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue queued This issue is on the AWS team's backlog

fredericgboutin-yapla commented Aug 28, 2024

Describe the bug

I have used the retry feature of the AWS clients extensively over the last few years, so much so that I borrowed the very same logic for my own Guzzle client (as a custom Guzzle middleware) when doing my other API calls. It is literally the same logic.

I wrote a few unit tests around my middleware to ensure that the delay kicks in properly. These tests run in our CI/CD on almost every commit to our repository (which adds up to a lot of runs), and they have done so for many months now.

Last week, the unit tests failed because I was asserting that the delay value should be GREATER than 0. Looking at the logic, 0 is indeed a valid value that I should have accounted for, but I hadn't. I did not have time to adjust the tests, so I simply re-ran them and they passed.

Yesterday, the same unit test failed again. What are the odds? Not impossible, but still... Well, this is where it gets interesting.

The distribution of mt_rand() return values is biased towards even numbers on 64-bit builds of PHP when max is beyond 2^32. This is because if max is greater than the value returned by mt_getrandmax(), the output of the random number generator must be scaled up.

Source: https://stackoverflow.com/questions/30060396/mt-rand-returns-0

One major difference between Mersenne Twister RNG and a hybrid/hardware approach is that the entire sequence of the random numbers depends on the seed. PHP's mt_srand is used to seed the RNG. Mersenne Twister yields an identical random number sequence for the same seed.

This deterministic nature is useful when an application/game needs to generate values based on a single seed (such as Minecraft). However, Mersenne Twister is a bad choice for applications that need to generate random numbers for cryptographic purposes because the internal state of the RNG can be derived by observing the sequences of random numbers the RNG yields.

Source: https://php.watch/articles/testing-php-rand-functions#mt
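
To make the two quoted points concrete, here is a small, self-contained PHP sketch (illustrative only, nothing to do with the SDK's code). Whether the second part actually shows a skew depends on the PHP build and seeding mode described in the quotes above.

<?php

// 1. Determinism: seeding with mt_srand() makes the sequence repeatable.
mt_srand(42);
$first = [mt_rand(), mt_rand(), mt_rand()];
mt_srand(42);
$second = [mt_rand(), mt_rand(), mt_rand()];
var_dump($first === $second); // bool(true)

// 2. Scaling: count the parity of results when the requested range exceeds
//    mt_getrandmax(). The quoted Stack Overflow answer reports that, on
//    affected 64-bit builds, this count skews heavily towards even numbers.
$even = 0;
$samples = 100000;
for ($i = 0; $i < $samples; $i++) {
    if (mt_rand(0, PHP_INT_MAX) % 2 === 0) {
        $even++;
    }
}
printf("even results: %d/%d\n", $even, $samples);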

Bug

Long story short, exponentialDelayWithJitter() relies on the mt_rand() function, and under certain conditions it can return 0 (or, more generally, even values) far more often than expected.
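
For context, here is a rough, hypothetical sketch of the kind of calculation involved (paraphrased, not the SDK's actual source; the function name, base delay, and backoff formula are placeholders):

// Hypothetical sketch, not the SDK's actual implementation.
function exponentialDelayWithJitterSketch(int $attempt): float
{
    // The jitter factor comes straight from mt_rand(); whenever that factor
    // happens to be 0, the whole delay collapses to 0 ms.
    $rand = mt_rand() / mt_getrandmax();

    return $rand * 100 * (2 ** $attempt); // 100 ms base delay is a placeholder
}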

Expected Behavior

A more evenly distributed delay value on retry, in particular without an excessive number of 0 ms delays.

Current Behavior

I feel I would just be repeating myself over and over here; see the description above.

Reproduction Steps

Again, I feel I would just be repeating myself; see the description above.

Possible Solution

I personally fixed it by using the random_int() function instead: https://www.php.net/manual/en/function.random-int.php.

For example,

$rand = random_int(1, mt_getrandmax()) / mt_getrandmax(); // uniform float in (0, 1], never exactly 0
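
And, applied to a backoff calculation like the hypothetical sketch in the Bug section above (again, the function name and constants are placeholders, not SDK code):

// Hypothetical sketch of the proposed change, not a patch against the SDK.
function exponentialDelayWithJitterFixed(int $attempt): float
{
    // random_int() draws from a CSPRNG with a uniform distribution, and
    // starting the range at 1 keeps the jitter factor strictly above 0.
    $rand = random_int(1, mt_getrandmax()) / mt_getrandmax();

    return $rand * 100 * (2 ** $attempt); // same placeholder base delay
}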

But I'm no expert here, and I'm curious to know what you think we should do.

Additional Information/Context

No response

SDK version used

3.320.9

Environment details (Version of PHP (php -v)? OS name and version, etc.)

7.4 and 8.2 based on the official 64-bit PHP docker images (apache-bullseye)

@fredericgboutin-yapla fredericgboutin-yapla added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 28, 2024
@fredericgboutin-yapla fredericgboutin-yapla changed the title Exponential backoff with jitter of the retry middleware V2 often produces a wait time of 0 seconds Exponential backoff with jitter of the retry middleware V2 often produces a wait time of 0 ms Aug 28, 2024
@yenfryherrerafeliz yenfryherrerafeliz added investigating This issue is being investigated and/or work is in progress to resolve the issue. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Aug 28, 2024
@yenfryherrerafeliz yenfryherrerafeliz self-assigned this Sep 2, 2024
@yenfryherrerafeliz yenfryherrerafeliz added queued This issue is on the AWS team's backlog and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Sep 2, 2024