Provide adaptive concurrency limits #8897

Closed
arouel opened this issue Jun 19, 2024 · 9 comments · Fixed by #9295
Labels: 4.x, enhancement, P2, webserver

Comments

@arouel (Contributor)

arouel commented Jun 19, 2024

Why

There are several reasons why adaptive concurrency limiting may be preferred over a fixed limit:

  1. Dynamic System Conditions: In a distributed system, conditions such as load, resource availability, and topology can change frequently due to factors like auto-scaling, partial outages, code deployments, or fluctuations in traffic patterns. A fixed concurrency limit cannot adapt to these dynamic conditions, leading to either under-utilization of resources or overwhelmed services.

  2. Latency Sensitivity: Different services or use cases may have varying sensitivity to latency. A fixed concurrency limit cannot account for these differences, potentially leading to either excessive queuing and high latency or under-utilization of resources. An adaptive approach can adjust the limit based on observed latencies, maintaining desired performance characteristics.

  3. Simplicity and Autonomy: Manually determining and configuring fixed concurrency limits for every service or instance can be a complex and error-prone process, especially in large-scale distributed systems. An adaptive approach can autonomously and continuously adjust the limit without manual intervention, simplifying operations and reducing the risk of misconfiguration.

  4. Resilience and Self-Healing: By automatically adjusting the concurrency limit based on observed conditions, an adaptive approach promotes resilience and self-healing capabilities. It allows services to shed excessive load during periods of high demand or resource constraints, preventing cascading failures and promoting graceful degradation.

While a fixed concurrency limit may be easier to reason about and configure initially, it lacks the flexibility and adaptability required in modern, dynamic distributed systems. An adaptive approach provides the ability to continuously optimize performance, resource utilization, and resilience in the face of changing conditions, ultimately leading to a more robust and efficient system.

Suggestion

Ideally, a user would be able to describe, in the ListenerConfig, the limiting algorithm that fits their needs instead of a fixed number for maxConcurrentRequests. The Limit and Limiter interfaces from Netflix's concurrency limits library are a good starting point. In the first iteration we should provide the following implementations (an AIMD sketch follows the list):

  • no limit,
  • fixed limit,
  • AIMD limit.
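
For context, AIMD stands for additive-increase/multiplicative-decrease. Below is a minimal, hypothetical sketch of such a limit; the field names, defaults, and increase-on-every-healthy-sample policy are illustrative simplifications (Netflix's library ships a fuller AIMDLimit):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical AIMD limit sketch; names and defaults are illustrative.
final class AimdLimitSketch {
    private final int maxLimit = 200;
    private final double backoffRatio = 0.9;
    private final long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(250);
    private int limit = 20; // current concurrency limit

    // Feed the observed round-trip time of each completed request.
    synchronized void onSample(long rttNanos, boolean dropped) {
        if (dropped || rttNanos > timeoutNanos) {
            // multiplicative decrease when overloaded
            limit = Math.max(1, (int) (limit * backoffRatio));
        } else {
            // additive increase while healthy
            limit = Math.min(maxLimit, limit + 1);
        }
    }

    synchronized int currentLimit() {
        return limit;
    }
}
```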

Instead of passing a Semaphore for requests from the ServerListener to the ConnectionHandler, we would pass a Limiter implementation that holds the configured Limit algorithm. The Limiter would be used instead of the Semaphore to acquire a token per request; if no token can be acquired, the limit has been exceeded and the request can be rejected.
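
A sketch of that flow, modeled on the shape of Netflix's Limiter API (an Optional-returning acquire with a per-request listener); the surrounding handler code and the rejection/ignore semantics are illustrative assumptions, not the actual Helidon integration:

```java
import java.util.Optional;

// Interface shapes modeled on Netflix's concurrency-limits library.
interface Limiter<C> {
    Optional<Listener> acquire(C context);

    interface Listener {
        void onSuccess();   // request completed normally
        void onIgnore();    // do not feed this sample into the limit
        void onDropped();   // request timed out / was shed downstream
    }
}

final class ConnectionHandlerSketch {
    private final Limiter<Void> limiter;

    ConnectionHandlerSketch(Limiter<Void> limiter) {
        this.limiter = limiter;
    }

    void handle(Runnable request, Runnable reject) {
        Optional<Limiter.Listener> token = limiter.acquire(null);
        if (token.isEmpty()) {
            reject.run();           // limit exceeded, e.g. respond with 503
            return;
        }
        try {
            request.run();
            token.get().onSuccess();
        } catch (RuntimeException e) {
            token.get().onIgnore(); // failure unrelated to load; do not skew the limit
            throw e;
        }
    }
}
```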

While implementing a proof of concept (PoC), I asked myself where we want to place the limiting API. I suspect we need a new submodule, concurrency-limits, which holds the Limit and Limiter interfaces and a standard set of implementations. The webserver module would then depend on concurrency-limits.

Another question is how we want to make the various limiting algorithms configurable. Today we have just the single property maxConcurrentRequests, but in the future we want to choose from a set of different implementations, e.g. no limit, fixed limit, AIMD limit, Vegas limit, etc.

When testing the PoC, I noticed that when the access log feature is activated, rejected requests are not logged in the access log file. Is this behavior intentional or is this a bug?

Additionally, extending the metrics (see KeyPerformanceIndicatorMetricsImpls) would be helpful for observing how a service is doing. I'm thinking of the following request-limiting metrics (a sketch of how they could map to limiter callbacks follows the list):

  • success counter
  • ignored counter
  • rejected counter
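
One hedged way these counters could be wired up, assuming the Limiter/Listener shape from the sketch above; the counters here are plain LongAdders for illustration, where a real integration would use the Helidon metrics API:

```java
import java.util.Optional;
import java.util.concurrent.atomic.LongAdder;

// Illustrative counters only; names and placement are assumptions.
final class LimiterMetrics {
    final LongAdder success = new LongAdder();   // tokens released via onSuccess
    final LongAdder ignored = new LongAdder();   // samples excluded via onIgnore
    final LongAdder rejected = new LongAdder();  // acquire() returned empty

    <C> Optional<Limiter.Listener> acquireAndCount(Limiter<C> limiter, C ctx) {
        Optional<Limiter.Listener> token = limiter.acquire(ctx);
        if (token.isEmpty()) {
            rejected.increment();
            return token;
        }
        Limiter.Listener inner = token.get();
        return Optional.of(new Limiter.Listener() {
            public void onSuccess() { success.increment(); inner.onSuccess(); }
            public void onIgnore()  { ignored.increment(); inner.onIgnore(); }
            public void onDropped() { inner.onDropped(); }
        });
    }
}
```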
@barchetta added the 4.x, enhancement, and webserver labels on Jun 20, 2024
@romain-grecourt (Contributor)

@spericas @tomas-langer @danielkec FYI.

@tomas-langer (Member)

Hello,
this sounds like a great idea. I will provide a few answers to the questions you posted:

  • Location of the module you called concurrency-limits: it seems to me that this should be agnostic of its usage, so it would fit under our common module (i.e. common/concurrency-limits). I think the implementations can be part of the same module (unless one of them would require additional libraries). Please follow our flat package structure (only an spi package is allowed, for provider interfaces).
  • Access log and rejected requests: you are right, these are currently not logged, as the access log is implemented as a filter, and to reach it we must accept the request. The intention of the limit is to avoid using resources on the server (i.e. to prevent issues from attacks). I think we may be missing data in these cases; maybe we could use metrics for this (incrementing a counter does not really consume additional resources).
  • Metrics: we currently do not depend on the metrics API in webserver; we could add that dependency (or the rate-limiter implementations could), just be aware that request handling is a critical performance hotspot, and metrics may have an impact when a metric implementation is in place.
  • Configuration: our blueprints have support for providers, which are loaded using the service loader (and will support our service registry as well). This allows for nested configuration that is not fixed, but defined by the limiter implementation. The config could be something like:
```yaml
server:
  limiter: # limiter() method in the blueprint
    fixed: # identification of the service type
      limit: 444
```
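
As a purely hypothetical illustration of the provider approach described above (the interface and class names here are invented, not Helidon's actual blueprint API), a service-loaded provider could map its config key to a limit implementation:

```java
// Hypothetical SPI sketch; all names are illustrative assumptions.
import java.util.function.IntSupplier;

interface Limit extends IntSupplier { }   // getAsInt() returns the current limit

interface LimitProvider {
    // key under "limiter" that selects this provider, e.g. "fixed"
    String configKey();

    // build a Limit from the nested config node for that key
    Limit create(io.helidon.common.config.Config config);
}

final class FixedLimitProvider implements LimitProvider {
    @Override
    public String configKey() {
        return "fixed";
    }

    @Override
    public Limit create(io.helidon.common.config.Config config) {
        int limit = config.get("limit").asInt().orElse(Integer.MAX_VALUE);
        return () -> limit;
    }
}
```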

Some other thoughts:

  • this is quite a critical part of Helidon WebServer, so any change in this area would most likely require a few cycles of reviews/design decisions. We also require adherence to Helidon code style, configuration style etc.
  • if you have a POC, we can maybe have a look at that

@m0mus removed the Monday label on Jul 1, 2024
@m0mus added this to Backlog on Aug 12, 2024
@m0mus moved this to In Progress in Backlog on Aug 12, 2024
@LateshDulani

+1 for this change.
After the Helidon 4 upgrade we are facing a similar issue under load. If we keep max concurrent requests unlimited, then under high sustained load the server fails to process requests: it tries to process more requests than it can, requests cannot get DB connections, and eventually too many requests fail. To mitigate this we limited the concurrent request limit to X-5, because we have at most X DB connections in the pool. This resolved the issue with sustained load, but it introduced another problem: if there is a sudden burst of requests, we can process X requests and the others fail with 503.
Ideally we would want to limit concurrent request processing based on our resource limitations (DB connections, memory, etc.) but not have additional requests fail (at least not right away) in case of a burst.

@vasanth-bhat

vasanth-bhat commented Aug 30, 2024

For the fixed-limit scenario (max-concurrent-requests), can there be an option to also enable queueing with a configurable queue size? The default behavior can stay the same; on a need basis, services can enable queueing with limits on queue size whenever a fixed limit is used. That would avoid requests failing with 503 when there is an occasional burst, and it would keep the larger behavior compatible with Helidon 2 and 3, where requests were queued while waiting for threads to become available in the server thread pool.

While services can do this with the Bulkhead API, it would be good to have some support in Helidon itself, which may work for many services.

Created #9229 for providing an option to enable queueing when max-concurrent-requests is configured; this can be a near-term solution to avoid requests failing during a surge (a minimal sketch of the idea follows).
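
A minimal sketch of the fixed-limit-plus-queue idea using a plain java.util.concurrent.Semaphore; the limit, queue bound, and timeout parameters are illustrative, and this is not the API proposed in #9229:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Fixed concurrency limit with a bounded wait queue: bursts wait
// (up to queueLimit waiters and waitTimeout) instead of failing fast.
final class QueuedFixedLimit {
    private final Semaphore permits;
    private final AtomicInteger waiting = new AtomicInteger();
    private final int queueLimit;
    private final long waitTimeoutMillis;

    QueuedFixedLimit(int limit, int queueLimit, long waitTimeoutMillis) {
        this.permits = new Semaphore(limit);
        this.queueLimit = queueLimit;
        this.waitTimeoutMillis = waitTimeoutMillis;
    }

    boolean tryAcquire() throws InterruptedException {
        if (permits.tryAcquire()) {
            return true;                       // under the limit, proceed
        }
        if (waiting.incrementAndGet() > queueLimit) {
            waiting.decrementAndGet();
            return false;                      // queue full: reject (503)
        }
        try {
            return permits.tryAcquire(waitTimeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            waiting.decrementAndGet();
        }
    }

    void release() {
        permits.release();
    }
}
```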

@lettrung99

> For the fixed-limit scenario (max-concurrent-requests), can there be an option to also enable queueing with a configurable queue size? […]

+1, can we get some support for a short-term solution on #9229?

@barchetta (Member)

Just dropping this here for reference:

The Fault Tolerance Bulkhead feature (SE, MP) provides a mechanism for (non-adaptive) rate-limiting access to specific tasks. You control both parallelism and wait-queue length.

See the Helidon SE Rate Limiting example for examples of using a Bulkhead as well as a Java Semaphore for doing rate limiting.
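
For instance, a Bulkhead along these lines (API shape as in Helidon 4 SE fault tolerance; the limit and queue values are illustrative) caps parallel executions and queues the overflow:

```java
import io.helidon.faulttolerance.Bulkhead;

// Caps concurrent executions at 10 and queues up to 20 more;
// further calls fail fast with a BulkheadException.
public class BulkheadExample {
    public static void main(String[] args) {
        Bulkhead bulkhead = Bulkhead.builder()
                .limit(10)
                .queueLength(20)
                .build();

        String result = bulkhead.invoke(() -> "work done");
        System.out.println(result);
    }
}
```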

@tomas-langer moved this from In Progress to Triage in Backlog on Sep 10, 2024
@m0mus added the P2 label on Sep 12, 2024
@m0mus moved this from Triage to High priority in Backlog on Sep 12, 2024
@lettrung99

Hello,

I am also interested in the status of this ticket. Could you please provide an ETA for the implementation of this fix?

Thank you for your assistance.

- David

@tomas-langer moved this from High priority to Sprint Scope in Backlog on Sep 19, 2024
@arouel (Contributor, Author)

arouel commented Sep 23, 2024

Just FYI, I made a proof of concept of adaptive concurrency limits for Helidon 4 (see https://github.com/arouel/helidon-contrib). Maybe this is helpful to you. Any feedback is welcome.

@tomas-langer (Member)

I have created a PR based on @arouel's proposal, refactored a bit to use an approach aligned with Helidon Fault Tolerance.
See #9295 for details, both on how it would be configured and how it is implemented.
Please provide feedback!
