Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Waiters do not have a configurable retry / timeout strategy #1431

Open
1 task
cloudshiftchris opened this issue Oct 4, 2024 · 0 comments
Open
1 task

Waiters do not have a configurable retry / timeout strategy #1431

cloudshiftchris opened this issue Oct 4, 2024 · 0 comments
Labels
bug This issue is a bug. needs-triage This issue or PR still needs to be triaged.

Comments

@cloudshiftchris
Copy link

cloudshiftchris commented Oct 4, 2024

Describe the bug

When using waiters the underlying operation may take considerable time that exceeds the hard-coded attempts/timeouts. This results in the waiter throwing a TooManyAttemptsException.

For example, RdsClient.waitUntilDBSnapshotAvailable has a hard-coded retry strategy:

val strategy = StandardRetryStrategy {
        maxAttempts = 20
        tokenBucket = InfiniteTokenBucket
        delayProvider {
            initialDelay = 30_000.milliseconds
            scaleFactor = 1.5
            jitter = 1.0
            maxBackoff = 120_000.milliseconds
        }
    }

In a situation where there is a large snapshot that takes greater than the ~17m the retry strategy allows for it will fail; this is unexpected as a) the operation is still in progress (in our case, it completed successfully after ~30m) and b) waitUntilDBSnapshotAvailable isn't documented as having a timeout / doesn't offer any configurability on timeouts.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected behavior

Expecting that waiters either wait indefinitely (which is the implied behaviour, though not ideal) or take a default-value parameter with a retry strategy (or any alternate approach that makes it clear there are timeouts and how to configure them).

The AWS Java SDK allow waiters to be configurable: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/waiters.html

Current behavior

As described above RdsClient.waitUntilDBSnapshotAvailable (and likely other Waiters) times out unexpectedly (well, unexpected if you haven't read the code).

Steps to Reproduce

Execute a Waiter whose operation takes more than the hard-coded retry/timeout limit.

Possible Solution

Ideally the API would be more up-front and configurable for waiters to allow configurability on the timeouts and clearly document that it isn't solely waiting for the primary condition (there's also timeouts in play).

As a workaround have cobbled together the below to allow coarse-grained configurability:

    private suspend fun waitForSnapshot(snapshotId: String) {
        var retries = 0
        while (true) {
            try {
                rdsClient.waitUntilDBSnapshotAvailable(
                    DescribeDbSnapshotsRequest { this.dbSnapshotIdentifier = snapshotId }
                )
                return
            } catch (e: TooManyAttemptsException) {
                if (retries >= 3) {
                    throw e
                }
                retries++
            }
        }
    }

Context

Using the SDK to initiate an RDS snapshot, and wait for it to complete before continuing on with upgrading the database.

AWS SDK for Kotlin version

1.3.42

Platform (JVM/JS/Native)

JVM

Operating system and version

MacOS / Linux

@cloudshiftchris cloudshiftchris added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. needs-triage This issue or PR still needs to be triaged.
Projects
None yet
Development

No branches or pull requests

1 participant