Provide adaptive concurrency limits #8897
Hello,
```yaml
server:
  limiter: # limiter() method in the blueprint
    fixed: # identification of the service type
      limit: 444
```

Some other thoughts:
+1 for this change.
For the fixed-limit scenario (`max-concurrent-requests`), can there be an option to also enable queueing with a configurable queue size? The default behavior can stay the same; on a need basis, services can enable queueing with a limit on queue size whenever a fixed limit is used. That would avoid requests failing with 503 when there is an occasional burst. It would also keep the larger behavior compatible with Helidon 2 and 3, where requests got queued while waiting for threads to become available in the server thread pool. While services can do this with the Bulkhead API, it would be good to have some support in Helidon itself, which may work for many services. Created #9229 for providing an option to enable queueing when `max-concurrent-requests` is configured. This can be a near-term solution to avoid requests failing during a surge.
+1. Can we get some support for a short-term solution on #9229?
Just dropping this here for reference: The Fault Tolerance Bulkhead feature (SE, MP) provides a mechanism for (non-adaptive) rate-limiting access to specific tasks. You control both parallelism and wait-queue length. See the Helidon SE Rate Limiting example for examples of using a Bulkhead as well as a Java Semaphore for doing rate limiting.
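As a stopgap, the Bulkhead approach above already gives fixed-limit queueing. A minimal sketch, assuming the Helidon 4 SE fault-tolerance API (builder options `limit` and `queueLength` as documented for Fault Tolerance; the workload is a placeholder):

```java
import io.helidon.faulttolerance.Bulkhead;
import io.helidon.faulttolerance.BulkheadException;

public class BulkheadSketch {
    public static void main(String[] args) {
        // At most 10 concurrent executions; up to 20 further callers wait
        // in a queue instead of failing immediately.
        Bulkhead bulkhead = Bulkhead.builder()
                .limit(10)
                .queueLength(20)
                .build();

        try {
            // Runs inside the bulkhead; queues when the limit is reached.
            String result = bulkhead.invoke(() -> "expensive work");
            System.out.println(result);
        } catch (BulkheadException e) {
            // Thrown once both the limit and the queue are exhausted,
            // analogous to the webserver answering with 503.
            System.err.println("rejected: " + e.getMessage());
        }
    }
}
```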
Hello, I am also interested in the status of this ticket. Could you please provide an ETA for the implementation of this feature? Thank you for your assistance.
Just FYI, I made a proof of concept of adaptive concurrency limits for Helidon 4 (see https://github.com/arouel/helidon-contrib). Maybe this is helpful to you. Any feedback is welcome.
Why
There may be several reasons why adaptive concurrency limiting is preferred over using a fixed limit:
- **Dynamic System Conditions:** In a distributed system, conditions such as load, resource availability, and topology can change frequently due to factors like auto-scaling, partial outages, code deployments, or fluctuations in traffic patterns. A fixed concurrency limit cannot adapt to these dynamic conditions, leading to either under-utilization of resources or overwhelmed services.
- **Latency Sensitivity:** Different services or use cases may have varying sensitivity to latency. A fixed concurrency limit cannot account for these differences, potentially leading to either excessive queuing and high latency or under-utilization of resources. An adaptive approach can adjust the limit based on observed latencies, maintaining desired performance characteristics.
- **Simplicity and Autonomy:** Manually determining and configuring fixed concurrency limits for every service or instance can be a complex and error-prone process, especially in large-scale distributed systems. An adaptive approach can autonomously and continuously adjust the limit without manual intervention, simplifying operations and reducing the risk of misconfiguration.
- **Resilience and Self-Healing:** By automatically adjusting the concurrency limit based on observed conditions, an adaptive approach promotes resilience and self-healing capabilities. It allows services to shed excessive load during periods of high demand or resource constraints, preventing cascading failures and promoting graceful degradation.
While a fixed concurrency limit may be easier to reason about and configure initially, it lacks the flexibility and adaptability required in modern, dynamic distributed systems. An adaptive approach provides the ability to continuously optimize performance, resource utilization, and resilience in the face of changing conditions, ultimately leading to a more robust and efficient system.
Suggestion
Ideally, a user would be able to describe the limiting algorithm that fits their needs in the `ListenerConfig` instead of a fixed number for `maxConcurrentRequests`. The `Limit` and `Limiter` interfaces from Netflix's concurrency-limits library are a good starting point. In the first iteration we should provide a standard set of implementations, e.g. a fixed limit plus adaptive limits such as AIMD and Vegas.

Instead of passing a `Semaphore` for requests from the `ServerListener` to the `ConnectionHandler`, we would pass a `Limiter` implementation that holds the configured `Limit` algorithm. The `Limiter` would be used instead of the `Semaphore` to acquire a token per request. If no token can be acquired, the limit is exceeded and the request can be rejected (see the sketch at the end of this section).

While implementing a Proof of Concept (PoC), I asked myself where we want to place the limiting API. I suppose we need a new submodule, `concurrency-limits`, which holds the `Limit` and `Limiter` interfaces and a standard set of implementations. The `webserver` module would then depend on `concurrency-limits`.

Another question is how we want to make the various limiting algorithms configurable. Today we have just the single property `maxConcurrentRequests`, but in the future we want to choose from a set of different implementations, e.g. no limit, fixed limit, AIMD limit, Vegas limit, etc.

When testing the PoC, I noticed that when the access log feature is activated, rejected requests are not logged in the access log file. Is this behavior intentional, or is this a bug?
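Here is a minimal sketch of what the token-acquisition contract could look like, loosely following the shape of Netflix's `Limiter`/`Limit` interfaces. All names are illustrative, not actual Helidon API; the fixed-limit class only shows how today's `maxConcurrentRequests` semantics would fit the abstraction:

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;

// Illustrative limiter contract: one token per request.
interface Limiter {
    /** Try to acquire a token; empty means the request should be rejected. */
    Optional<Token> tryAcquire();

    /** Handle used to feed the request outcome back into the Limit algorithm. */
    interface Token {
        void onSuccess(); // request completed normally
        void onDropped(); // request timed out or was shed downstream
    }
}

// A fixed-limit implementation, equivalent to today's maxConcurrentRequests.
final class FixedLimiter implements Limiter {
    private final Semaphore semaphore;

    FixedLimiter(int limit) {
        this.semaphore = new Semaphore(limit);
    }

    @Override
    public Optional<Token> tryAcquire() {
        if (!semaphore.tryAcquire()) {
            return Optional.empty(); // limit exceeded, reject (e.g. with 503)
        }
        return Optional.of(new Token() {
            @Override
            public void onSuccess() { semaphore.release(); }
            @Override
            public void onDropped() { semaphore.release(); }
        });
    }
}
```

An adaptive implementation (AIMD, Vegas, etc.) would keep the same contract but resize the underlying limit based on the latency and drop signals reported through the token callbacks.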
Additionally, extending the metrics (looking at `KeyPerformanceIndicatorMetricsImpls`) would be helpful, to be able to observe how a service is doing. I'm thinking here about the following request-limiting metrics:
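For illustration only (the metric names below are hypothetical, not taken from this issue), the data to expose could be as simple as a few counters maintained next to the limiter and bridged into the existing KPI metrics:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical request-limiting metrics; names are illustrative only.
final class LimiterMetrics {
    final LongAdder accepted = new LongAdder(); // tokens granted
    final LongAdder rejected = new LongAdder(); // tokens denied (requests answered with 503)
    volatile long currentLimit;                 // current value of the (possibly adaptive) limit
}
```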