Fix application qps quota stalls. #14859
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #14859 +/- ##
============================================
+ Coverage 61.75% 63.74% +1.98%
- Complexity 207 1472 +1265
============================================
Files 2436 2709 +273
Lines 133233 151889 +18656
Branches 20636 23456 +2820
============================================
+ Hits 82274 96816 +14542
- Misses 44911 47806 +2895
- Partials 6048 7267 +1219
Do we have any existing tests that already exercise this path? Since we are changing code on the critical path, I suggest adding tests if there are none.
@@ -319,7 +319,7 @@ private void verifyQuotaUpdate(float quotaQps) {
 } catch (IOException e) {
 throw new RuntimeException(e);
 }
-}, 5000, "Failed to reflect query quota on rate limiter in 5s.");
+}, 10000, "Failed to reflect query quota on rate limiter in 5s.");
Is this change expected?
This change is not strictly necessary, so I'll revert it.
Quotas are tested mainly in HelixExternalViewBasedQueryQuotaManagerTest and QueryQuotaClusterIntegrationTest. I'll have a look at line coverage today.
I feel the general logic introduced in #14226 needs to be improved. All quota updates should be done via the processApplicationQueryRateLimitingClusterConfigChange() callback, and the query path should call a read-only method that doesn't do any update logic.
@@ -74,6 +74,12 @@
  * - broker added or removed from cluster
  */
 public class HelixExternalViewBasedQueryQuotaManager implements ClusterChangeHandler, QueryQuotaManager {
+
+  // Minimum 'working' value for app quota. If actual value is less than this (e.g. 0.0), it is considered as disabled.
+  private static final double MIN_APP_QUOTA = Math.nextUp(0.0);
I think the intention here is to treat 0 as disabled. Having this minimum double is not very readable; can we change the comparison sign (e.g. < to <=) instead?
I changed the value and hid logic behind isDisabled(), isEnabled() methods.
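For readers following the thread, here is a minimal sketch of what such helpers might look like, assuming that any non-positive quota counts as disabled, as the PR description states. The class name `AppQuotaChecks` and the signatures are hypothetical; the actual methods in the PR may differ.

```java
// Hypothetical helper for illustration only, not the code from the PR.
final class AppQuotaChecks {
  private AppQuotaChecks() {
  }

  // Assumption: a quota of 0 or below (including the old -1 sentinel) means "no throttling".
  static boolean isDisabled(double appQpsQuota) {
    return appQpsQuota <= 0;
  }

  // The exact opposite of isDisabled(), so call sites never compare against a magic constant.
  static boolean isEnabled(double appQpsQuota) {
    return !isDisabled(appQpsQuota);
  }
}
```

With predicates like these, call sites no longer depend on whether the threshold is expressed as `Math.nextUp(0.0)` with `<` or as `0` with `<=`.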
@@ -130,9 +136,10 @@ private void initializeApplicationQpsQuotas() {

 String appName = entry.getKey();
 double appQpsQuota =
-entry.getValue() != null && entry.getValue() != -1.0d ? entry.getValue() : _defaultQpsQuotaForApplication;
+entry.getValue() != null && entry.getValue() >= MIN_APP_QUOTA ? entry.getValue()
Not introduced in this PR, but we might want to allow overriding default quota to disable throttling
Suggested change:
-entry.getValue() != null && entry.getValue() >= MIN_APP_QUOTA ? entry.getValue()
+entry.getValue() != null ? entry.getValue() : _defaultQpsQuotaForApplication;
Applied.
-if (appQpsQuota < 0) {
+if (appQpsQuota < MIN_APP_QUOTA) {
Same for other places
Suggested change:
-if (appQpsQuota < MIN_APP_QUOTA) {
+if (appQpsQuota <= 0) {
See comment above.
 }

 // Caller method need not worry about getting lock on _applicationRateLimiterMap
 // as this method will do idempotent updates to the application rate limiters
-private synchronized void createOrUpdateApplicationRateLimiter(List<String> applicationNames) {
+private synchronized void createOrUpdateApplicationRateLimiter(List<String> applicationNames, double override) {
Consider making override @Nullable, and use null to represent no override.
I added another method to hide the override when it's not needed, but kept it as a primitive because null is not necessarily more readable and would trigger boxing/unboxing.
That method is applied only when the default value changes. There's one exception to the 'read-only-ness': when an unknown app name is detected and the default app quota is enabled. In that case we have to create the rate limiter on the spot, but without querying ZK.
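The split described here (updates arrive through a background callback, the query path only reads, with one local-only exception for unknown apps) could be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the PR's implementation: the class `AppQuotaRegistry`, its methods, and the use of plain quota values in place of real rate limiters are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: names and structure are illustrative, not taken from the PR.
final class AppQuotaRegistry {
  // Written by the config-change callback, read by the query path.
  private final Map<String, Double> _appQuotas = new ConcurrentHashMap<>();
  private volatile double _defaultQuota;

  AppQuotaRegistry(double defaultQuota) {
    _defaultQuota = defaultQuota;
  }

  // Background path: stands in for the cluster-config callback. Not atomic; good enough for a sketch.
  void onConfigChange(Map<String, Double> explicitQuotas, double newDefaultQuota) {
    _defaultQuota = newDefaultQuota;
    // Drop entries that are no longer configured, including ones created on the spot from an old default.
    _appQuotas.keySet().retainAll(explicitQuotas.keySet());
    _appQuotas.putAll(explicitQuotas);
  }

  // Query path: no remote calls. Returns the effective quota; a non-positive value means "not throttled".
  double quotaFor(String appName) {
    Double quota = _appQuotas.get(appName);
    if (quota != null) {
      return quota;
    }
    double defaultQuota = _defaultQuota;
    if (defaultQuota <= 0) {
      return defaultQuota; // default quota disabled: nothing to create
    }
    // The one exception to read-only-ness: register the unknown app locally with the default quota.
    return _appQuotas.computeIfAbsent(appName, name -> defaultQuota);
  }
}
```

In this sketch the only write on the query path is a local `computeIfAbsent`, so a slow or locked remote store cannot stall query execution.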
PR fixes #14852.
It removes slow & locking ZK queries from the hot path (query execution) and depends on background messaging to keep quotas in sync.
It changes application quota logic slightly so that non-positive quota values mean that quota is disabled and can be acquired anytime.
While checking the logic I also found that: