Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test the opportunity to not abort timed out get_slice, but instead run a race with retried attempt #5468

Open
fulmicoton opened this issue Oct 2, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@fulmicoton
Copy link
Contributor

fulmicoton commented Oct 2, 2024

In #5466 we introduced aggressive timeout/retries to improve latency.
We could avoid abort attempts on timeout and instead run a race with the new attempts.

As suggested by @trinity-1686a

/// Downloads a slice of a file from the storage, and returns an in memory buffer
    async fn get_slice(&self, path: &Path, range: Range<usize>) -> StorageResult<OwnedBytes> {
        let mut futures_unordered = FuturesUnordered::new();
        let num_bytes = range.len();

        for (attempt_id, timeout_duration) in self
            .storage_timeout_policy
            .compute_timeout(num_bytes)
            .enumerate()
        {
            let get_slice_fut = self.underlying.get_slice(path, range.clone());
            futures_unordered.push(get_slice_fut);
            match tokio::time::timeout(timeout_duration, futures_unordered.next()).await {
                Ok(Some(result)) => {
                    crate::STORAGE_METRICS
                        .get_slice_timeout_successes
                        .get(attempt_id)
                        .or(crate::STORAGE_METRICS.get_slice_timeout_successes.last())
                        .unwrap()
                        .inc();
                    return result;
                }
                Ok(None) => {
                    // ..
                }
                Err(_elapsed) => {
                    rate_limited_info!(limit_per_min=60, num_bytes=num_bytes, path=%path.display(), timeout_secs=timeout_duration.as_secs_f32(), "get timeout elapsed");
                    continue;
                }
            }
        }
        rate_limited_warn!(limit_per_min=60, num_bytes=num_bytes, path=%path.display(), "all get_slice attempts timeouted");
        crate::STORAGE_METRICS.get_slice_timeout_all_timeouts.inc();
        return Err(
            StorageErrorKind::Timeout.with_error(anyhow::anyhow!("internal timeout on get_slice"))
        );
    }
@fulmicoton fulmicoton added the enhancement New feature or request label Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant