calculate http error_timeout based upon capacity option #145

@jpittis jpittis commented May 25, 2017

This PR proposes an intuitive way to configure :error_timeout for Semian HTTP configurations. The user configures a :capacity option as a fraction between 0 and 1, and the :error_timeout is calculated from the request's :read_timeout.

Reasoning

The following diagram assumes the circuit starts open and the requested endpoint is not recovering. This means the worker will alternate between the open and half open states.

t=0        t=1         t=2        t=3         t=4        t=5
 |----------|-----------|----------|-----------|----------|
open       half        open       half        open       half

free       busy        free       busy        free       busy

Whenever the circuit is in an open state, the worker is able to do work for other resources. But when the circuit is in a half open state, the worker cannot do other work because it's stuck hanging until the request times out.

We're calling the fraction of time the worker spends free, rather than busy, the worker's :capacity.

The High School Math

For Semian HTTP requests we can calculate capacity based on this equation:

capacity = error_timeout / (error_timeout + request_timeout)

In words, the capacity of a given worker is the fraction of time that is not spent hanging on a single request.
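As a rough sketch, the equation above can be written as a small Ruby helper. The method name `capacity_for` is hypothetical and not part of Semian, and it assumes both timeouts are given in seconds.

```ruby
# Hypothetical helper, not part of Semian: the fraction of time a worker is
# free while the circuit alternates between open (free for error_timeout)
# and half open (busy for request_timeout).
def capacity_for(error_timeout, request_timeout)
  unless error_timeout > 0 && request_timeout > 0
    raise ArgumentError, "timeouts must be positive"
  end

  error_timeout.to_f / (error_timeout + request_timeout)
end

capacity_for(60, 60)   # equal timeouts: the worker is free half the time
capacity_for(180, 60)  # a longer error_timeout raises the capacity
```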

Examples

  1. A :capacity of 0.5 would set the :error_timeout equal to the request timeout.
  2. As :capacity approaches 1, :error_timeout approaches infinity.
  3. With a :capacity of 0.75 and a 60 second request timeout, the :error_timeout would be 180 seconds.
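These examples follow from solving the capacity equation for :error_timeout, which gives error_timeout = capacity * request_timeout / (1 - capacity). A minimal Ruby sketch of that calculation is below; the method name `error_timeout_for` is hypothetical and is not taken from this PR's diff, and rejecting capacities outside the open interval (0, 1) is an assumption made here so that the division stays well defined.

```ruby
# Hypothetical helper, not part of Semian: solve
#   capacity = error_timeout / (error_timeout + request_timeout)
# for error_timeout.
def error_timeout_for(capacity, request_timeout)
  # Assumption: capacity is a fraction strictly between 0 and 1.
  unless capacity > 0 && capacity < 1
    raise ArgumentError, "capacity must be strictly between 0 and 1"
  end

  capacity * request_timeout / (1.0 - capacity)
end

error_timeout_for(0.5, 60)   # => 60.0
error_timeout_for(0.75, 60)  # => 180.0
```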

Isn't this capacity stuff meant to be handled by bulkheads?

  • This PR addresses the capacity of a lone worker and doesn't require shared state between workers.
  • Bulkheads require a semaphore per resource which is expensive when dealing with a large number of resources. (For example a large number of HTTP requests.)

Concerns

  • This idea should be verified by a number of trained experts in high school math.
  • Just because the idea makes sense does not mean it's worth adding to Semian.
  • Do we care that the default :read_timeout being 60 seconds will lead to values of :error_timeout greater than a minute when :capacity > 0.5?
