Dynamically configure SemaphoreBackPressureHandler with BackPressureLimiter (#1251) #1308
Conversation
@loicrouchon If I read this correctly: if I wanted to build #481 and, for example, process 5 messages every second, I would need to implement the BackPressureLimiter with logic to set/increase the limit to/by 5 every second?
@jeroenvandevelde, the goal of the `BackPressureLimiter#limit()` method is to tell the handler, before each poll, how many messages may currently be fetched from the queue. However, it does not consider how many messages are currently being processed by the queue consumer, nor does it check the current limit again between two polls. So what you would need to do to implement a rate-limiter is a 2-step approach: 1. decrease the limit as messages are fetched, so in-flight messages are accounted for; 2. increase the limit back to 5 every second.
In case the rate limit is not a hard limit (i.e. you can temporarily have short bursts over the limit), then 1. is not necessary. So rate-limiting can somehow be performed, but it requires extra care if there are hard constraints regarding the rate limit. The solution implemented in this PR primarily aims at use cases where the pressure is measured from an external system, in which case there is by design a delay between the measure of the pressure on the downstream system and reacting to it. A good example would be when pushing messages to another system (which could be another queue, for example). In this case, the `BackPressureLimiter` implementation would measure the pressure on the downstream system and adjust the limit accordingly. For example, in the case the downstream system is an SQS queue, the limit could be derived from the number of messages already waiting in that queue.
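A minimal sketch of such a limiter, assuming the `BackPressureLimiter` interface from this PR and the AWS SDK v2 `SqsClient`; the class name, backlog threshold, and queue-depth heuristic are illustrative assumptions, not part of the PR:

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.GetQueueAttributesRequest;
import software.amazon.awssdk.services.sqs.model.QueueAttributeName;

// Illustrative only: limits consumption based on the backlog of a downstream queue.
public class DownstreamQueueBackPressureLimiter implements BackPressureLimiter {

    private static final int MAX_DOWNSTREAM_BACKLOG = 100; // assumed threshold

    private final SqsClient sqsClient;
    private final String downstreamQueueUrl;

    public DownstreamQueueBackPressureLimiter(SqsClient sqsClient, String downstreamQueueUrl) {
        this.sqsClient = sqsClient;
        this.downstreamQueueUrl = downstreamQueueUrl;
    }

    @Override
    public int limit() {
        // The deeper the downstream backlog, the fewer messages we allow the
        // container to fetch; 0 (or less) puts consumption on standby.
        String count = sqsClient.getQueueAttributes(GetQueueAttributesRequest.builder()
                        .queueUrl(downstreamQueueUrl)
                        .attributeNames(QueueAttributeName.APPROXIMATE_NUMBER_OF_MESSAGES)
                        .build())
                .attributes()
                .get(QueueAttributeName.APPROXIMATE_NUMBER_OF_MESSAGES);
        return MAX_DOWNSTREAM_BACKLOG - Integer.parseInt(count);
    }
}
```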
PS: the advantages of the example of pacing message publication to a downstream queue might not be obvious, but there are two.
@loicrouchon I missed the fact that if it reads a message, it doesn't subtract it from the limit. In my case, where I would like a max rate of x messages/second, your approach seems a bit odd, as all the information for this is available in the library: how fast we are processing and the rate at which we would like to go (a configurable value). I can follow if it is based on parameters outside of the library, which your case sounds like.
@jeroenvandevelde I think this information is only fully available in a very limited number of use cases. Most of the queue consumer implementations I had to implement in the past were doing a variable number of API calls. It usually depended on the content of the message being consumed, some persistent state, or the result of previous API calls. So limiting consumption from the queue at a fixed rate per second does not account for cases where the rate-limited API we're trying to protect would be called a variable number of times (0 times or more). That is why, in the solution presented here, the `BackPressureLimiter` leaves it to the application to decide how the limit is computed. I totally agree with you, it is more complicated than the use case you described. But it would avoid the issue of blocking the queue consumption for messages that did not trigger the API call that should be rate-limited.
Please do so, and let me know about your findings.
Whether these situations are very limited or a normal use case will depend a lot on your context. We have designed our system so that every HTTP call to an (external) endpoint has a queue in front of it to handle the overflow. This indeed doesn't cover the situation where system A is not going at full speed and therefore system B could go faster against the same endpoint.
Hi @loicrouchon and @jeroenvandevelde. @loicrouchon, overall this looks very promising - thanks! I wonder why you chose to change the current `SemaphoreBackPressureHandler` directly instead of wrapping it. Also, would it make sense maybe to have a list of `BackPressureHandler`s, so different limiting strategies could be composed? It also seems like the changes might fix #1187, which is an outstanding issue, so that's great. Sorry I couldn't discuss this earlier with you in the issue, and thanks again to you both for bringing these enhancements up!
Hi @tomazfernandes and Happy New Year!
I thought about doing so, but gave up because of the release method, which was somehow complicated to wrap. But that was at the beginning of my attempts. I now have a much better understanding of the `SemaphoreBackPressureHandler` internals, so I'll give the wrapper approach another try. I'll keep you posted about my progress.
@tomazfernandes I pushed a version using a wrapper over the SemaphoreBackPressureHandler. I'm quite happy with how it simplifies things now, and I'm looking for feedback before continuing with the PR (maybe renaming a few things to improve clarity and updating the reference documentation).
I'm not sure; I think it can get tricky very quickly when it comes to the throughput mode switching logic. However, if you would like to limit the number of permits via different `BackPressureLimiter` implementations, you could already combine them into a single one that returns the minimum of their limits.
Regarding this, I'm not 100% sure it would fix it, so I would need to look more into it.
Thanks for the update @loicrouchon, I think we're definitely moving in the right direction here.
I'll share what's in my mind for your consideration - I think it's similar to what you have but with a slightly different approach that might simplify things a bit while bringing more flexibility. I might be missing something though.

I think each `BackPressureHandler` should be a self-contained unit, responsible for a single limiting concern. So, we could have a `CompositeBackPressureHandler` holding a list of `BackPressureHandler`s. The batch methods themselves from `BatchAwareBackPressureHandler` could be handled at the composite level. On a `request`, we'd ask each handler for permits and use the minimum of the returned values. We'd then call `release` on each handler for any permits above that minimum. The benefit I see for this approach is that we can keep each `BackPressureHandler` simple and focused on its own logic. We could also in the future separate Low / High Throughput logic to its own `BackPressureHandler`.

Example Scenario

I'll illustrate with an example just as a sanity check: let's say we have 3 BPH implementations in the composite - the current semaphore-based one, a rate limiter, and one driven by downstream load.
On the first poll, each handler is asked for a batch of permits and the composite uses the smallest number returned, releasing the surplus back to the other handlers. On the second poll, the same happens, with in-flight messages now reducing what the semaphore handler can grant.

Let's say at some point the downstream API is holding requests with a 10 second poorly configured timeout. The load-driven handler would then grant 0 permits and polling would effectively pause. As the consumers release permits, the handlers recover capacity and consumption resumes.

I don't think this is too different from what you are proposing, in that we're effectively limiting the amount of permits returned, and the logic for each concern stays contained in its own handler. Of course, there's a lot of complexity involved and I might be missing something. Please let me know your thoughts. Thanks.
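A rough sketch of the request flow described above, assuming `BackPressureHandler` exposes `int request(int amount)` and `void release(int amount)` (other members, e.g. `drain`, are omitted); this is a sketch of the idea, not the final implementation:

```java
import java.util.List;

// Illustrative composite: asks each delegate for permits, keeps the minimum,
// and returns the surplus to delegates that granted more than the minimum.
public class CompositeBackPressureHandler implements BackPressureHandler {

    private final List<BackPressureHandler> handlers;

    public CompositeBackPressureHandler(List<BackPressureHandler> handlers) {
        this.handlers = handlers;
    }

    @Override
    public int request(int amount) throws InterruptedException {
        int[] granted = new int[handlers.size()];
        int permits = amount;
        // Ask every handler, capping each request at what previous handlers allowed;
        // the effective number of permits is the minimum granted.
        for (int i = 0; i < handlers.size(); i++) {
            granted[i] = handlers.get(i).request(permits);
            permits = Math.min(permits, granted[i]);
        }
        // Return the surplus to handlers that granted more than the minimum.
        for (int i = 0; i < handlers.size(); i++) {
            if (granted[i] > permits) {
                handlers.get(i).release(granted[i] - permits);
            }
        }
        return permits;
    }

    @Override
    public void release(int amount) {
        // Completion releases are propagated to every handler.
        for (BackPressureHandler handler : handlers) {
            handler.release(amount);
        }
    }
}
```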
I agree with your reasoning, only two points of feedback.
@tomazfernandes, this is a way bigger scope than I initially intended, but I do see the value in it. I'll work on something along those lines.
I think this is not a problem (there is a subtlety, though). For example, let's say the RateLimiterBPH says 20, but the SemaphoreBPH says 5. A few moments later, in a new round (but still in the same rate-limiting time window), the RateLimiterBPH should say 15 and not 0, as only 5 of the 20 were used. Hence I believe for the RateLimiterBPH, the release method should be implemented.
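For illustration, a simplified sketch of such a RateLimiterBPH, assuming the same `request`/`release` signatures as above; window handling is kept minimal and a real implementation would need to tell "unused" releases apart from "message processed" releases, which is what the release-reason discussion later in this thread is about:

```java
// Simplified sketch: a fixed permit budget per one-second window, where
// releasing unused permits restores the budget (5 used out of 20 granted
// leaves 15 available, not 0). Other interface members omitted for brevity.
public class RateLimitingBackPressureHandler implements BackPressureHandler {

    private final int permitsPerWindow;
    private int remaining;
    private long windowStartMillis = System.currentTimeMillis();

    public RateLimitingBackPressureHandler(int permitsPerWindow) {
        this.permitsPerWindow = permitsPerWindow;
        this.remaining = permitsPerWindow;
    }

    @Override
    public synchronized int request(int amount) {
        resetWindowIfExpired();
        int granted = Math.min(amount, remaining);
        remaining -= granted;
        return granted;
    }

    @Override
    public synchronized void release(int amount) {
        resetWindowIfExpired();
        // Give unused permits back to the current window's budget.
        remaining = Math.min(permitsPerWindow, remaining + amount);
    }

    private void resetWindowIfExpired() {
        long now = System.currentTimeMillis();
        if (now - windowStartMillis >= 1000) { // 1-second window, illustrative
            windowStartMillis = now;
            remaining = permitsPerWindow;
        }
    }
}
```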
I considered this approach. It could in theory allow removing all semaphore logic from the individual BPHs and only keeping it in the composite one. But what I struggle with is the current high/low throughput mode of the SemaphoreBPH, as it has logic, in high throughput mode, to acquire all available permits in case it timed out when trying to acquire the full amount requested. On top of that, the SemaphoreBPH and RateLimiterBPH would still need a kind of release method to keep track of how many permits were "not consumed" and are still potentially usable for the next round.

With that being said, I think at the current moment implementing a composite BPH containing individual BPHs is the best approach. The main downside I see is that it might wait too long in case it waits for more permits than necessary because another BPH reduces this number. But to fix this, I can only think of a 2-step approach: first figure out how much can be requested, then call the request methods with the lowest number. Not sure we want to do that.

@tomazfernandes I have a question: if we introduce the CompositeBPH, do we need to keep the BatchAwareBackPressureHandler interface? I see it is only used in the AbstractPollingMessageSource, so we could implement this batching logic at the CompositeBPH level, which would translate it to request(amount)/release(amount) calls on all other BPHs. This could simplify their implementation, no?
More questions around the BatchAwareBackPressureHandler. The behavior of the SemaphoreBPH#requestBatch method is not equivalent to calling request(batchSize). This means that, in order to keep the current behavior of the SemaphoreBPH when wrapped in the CompositeBPH, the CompositeBPH needs to map requestBatch calls to each contained BPH's requestBatch calls. But if one of the BPHs limits to something smaller than the batch size, then we have a problem. There are two cases:
We could maybe fix this by requiring that requestBatch/releaseBatch be equivalent to request/release(batchSize). This can be done for individual BPHs by applying the batch behavior in the request(amount)/release(amount) methods when the amount is equal to the batch size.
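For instance, a hypothetical shape of that equivalence (the method names follow the existing interface; the default implementations are the suggestion):

```java
// Batch methods default to their amount-based counterparts, so a handler can
// detect "full batch" inside request(amount) by comparing it to getBatchSize().
public interface BatchAwareBackPressureHandler extends BackPressureHandler {

    int getBatchSize();

    default int requestBatch() throws InterruptedException {
        return request(getBatchSize());
    }

    default void releaseBatch() {
        release(getBatchSize());
    }
}
```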
Hey @loicrouchon and @jeroenvandevelde, lots of interesting points.
I think we still need the interface. What I'm thinking is - we can implement the batch methods as defaults that delegate to request(batchSize)/release(batchSize). What this means is - if knowing whether we're requesting the whole batch or a smaller set is relevant to the implementation, we should use the batch methods for that. We might look into adding a way to distinguish the two cases later if we need it.
Yup, that's what I'm thinking.
Yeah, there are a lot of subtleties in this logic. Perhaps we could do something similar to what's done in Lines 254 to 265 in 425b6c8.
Yeah, I thought of this, but I'm not sure how much more complexity this would introduce. Perhaps always requesting a batch would be simpler.
Yeah, I think this is an open question. Overall, I think if we can't release unused permits from the other handlers, the composite approach loses much of its value. Introducing a 2-stage acquiring process would solve that, but at the cost of extra complexity. Overall, IMO, unless we see a more compelling reason to introduce this 2-stage acquiring process, we might want to go with the simpler 1-stage one.
Yeah, the point here is that we should always try to fetch the whole batch since it's more efficient, so if we need to wait e.g. 1 second for it that's ok. Overall, I understand there are some behaviors in the current SemaphoreBPH that we should take care to preserve.
I think we should be careful here. Ideally, we should not need to change anything in the current SemaphoreBPH behavior. I think we have a few open questions to consider:
IMO so far we should keep it simple - not introduce the 2-stage acquiring process for now.
Hi again @tomazfernandes. A few observations I made while digging into it today (please let me know if I got one wrong):
Looking at the SemaphoreBPH, the throughput mode switching is tightly coupled to how permits are acquired and released. So far my attempts to make it work are failing, and I'm 90% sure at this step this is coming from there.
I'm trying to do without this at the moment; I think it can work.
I'm trying with the downgrade at the moment, but then I faced the issue that the batch-oriented methods assume the full batch size is always requested. I'm mostly experimenting at this stage and not committing to this. Yet even if we do not go for the downgrade, I think it's an important question to address. If the limit is smaller than the batch size, we need to decide whether to keep waiting for a full batch or to poll for fewer messages. At the moment, I feel we could get away by:
Let me know what you think. I'll resume digging into this next week.
Hey @loicrouchon, excellent analysis, thanks!
Yeah, for 1 it gets trickier. For the current logic I don't think it makes a difference - if it's not returning a whole batch, it should not switch to low throughput, whether the permits were unused or the messages processed. And in case batch size is equal to 1, the TP mode shouldn't matter anyway. But for other use cases I agree it might.
This logic changed a few times as it evolved. The first version had only a plain `request`/`release` pair of methods. The Batch interface was introduced to try to isolate responsibilities - the `MessageSource` shouldn't need to know about permit amounts. This kind of leaked when I had to add the `getBatchSize()` method.

I don't have any strong opinions on some of the open questions, but here are some other thoughts for us to consider.

CompositeBPH

I think the `CompositeBackPressureHandler` is the right direction. Ideally, we should be able to keep the `SemaphoreBackPressureHandler` as close to untouched as possible and add new behaviors through composition.

Release Unused Method

My initial idea with the `release` method was for it to signal that message processing has completed. For instance, it could also have a counterpart for permits that were acquired but never used. We could have default implementations so existing custom handlers keep working. I'm not sure what's the best way to introduce these methods. We might want to introduce a new interface extending `BackPressureHandler`. OTOH, I think the interface hierarchy is already getting complex.

What I think we should keep in mind is - since this is a rather complex and sensitive part of the integration, ideally we should be able to make changes incrementally, keeping this PR as simple as possible, and following up with changes as required. Otherwise we might end up needing to make these changes as part of a Milestone version and it'd take longer to release it.

Let me know your thoughts, thanks!
One more thought.

Permits Downgrade

I think it's a good idea to try to include this. One use-case I can think of is - let's say we have a rate limiter and a batch size of 10. If the rate limiter limits our permits to 5, it doesn't make sense for the source to keep waiting for 10 permits instead of polling for the 5 it can get. The only thing for us to consider is if it would be simpler to first work only with batches and introduce this feature in a separate PR as an enhancement, but no strong opinions on this.
Hey @tomazfernandes, thanks for your inputs. I was able to resume and accommodate most of your comments (I think). I managed to implement the composite approach, with releases now distinguishing why permits are being returned.
Without those distinctions, the throughput mode change was done in a way that was not compatible with the integration tests. All of this was fixed by removing the batch notion for releases and replacing it with a release reason. Here are the changes compared to the previous state.
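Something along these lines for the release reasons (the set of values is inferred from the discussion above and may differ from the final code):

```java
// Sketch: the reason passed along with released permits, so the SemaphoreBPH
// can decide on throughput-mode switches and a rate limiter can tell unused
// permits apart from processed messages.
public enum ReleaseReason {
    NONE_FETCHED,   // polling returned no messages
    PARTIAL_FETCH,  // polling returned fewer messages than permits acquired
    LIMITED,        // permits reclaimed because another handler lowered the limit
    PROCESSED       // message processing completed
}
```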
It's still experimental (I did not invest much time into the configuration side of it in the `ContainerOptions` yet). I'll be waiting for your feedback.
Hi @loicrouchon, sorry, I still haven't had time to look too deeply into the new implementation, but I have a few comments to keep things moving:
Please let me know your thoughts, thanks!
Hey @tomazfernandes, thanks for the feedback
You're totally right, I overlooked this aspect. I'll add the methods back to the interface(s) and mark them as deprecated.
I thought about the use cases for such a system (without a reducible semaphore in the picture), and in most of them the application needs to measure something external to the library in order to compute the limit. For these reasons I thought the `BackPressureLimiter` was worth having in the library itself. Let me know if this makes sense or if you think such an implementation is too specific and should be provided by user code. If it is to be provided by user code, how do we handle integration testing of the feature?
I started with the approach you described. Because of the introduction of the release reasons, I initially removed the releaseBatch method.
But then, I realized that "nothing fetched" is an important notion for switching the throughput mode. Then again, I realized that releasing permits because of being limited by another BPH is different from a partial fetch, and this could be a useful piece of information. At this point, I decided against introducing one more method (a dedicated release-unused one) and went with a release reason instead.

Now, regarding backward compatibility, we could have:

```java
// BPH = BackPressureHandler, BABPH = BatchAwareBackPressureHandler
@Deprecated // will not be called
default void BPH#release(int amount) {
    release(amount, ReleaseReason.PROCESSED); // or empty if you prefer
}

@Deprecated // will not be called
default void BABPH#releaseBatch() {
    release(getBatchSize(), ReleaseReason.NONE_FETCHED); // or empty if you prefer
}
```
You mean one BPH for high throughput and another for low throughput? I'm struggling to see a solution that would not break the existing behavior provided by the SemaphoreBPH's throughput mode switching. Let me know what you think with those additional details, thanks!
📢 Type of change
📜 Description
This change enhances the `SemaphoreBackPressureHandler` with the support of a new `BackPressureLimiter` interface.

This `BackPressureLimiter` interface is to be implemented by applications. It has a single method `int limit()` that returns the number of permits that can be consumed by the `SemaphoreBackPressureHandler`.

Before each polling, the limit will be checked by the `SemaphoreBackPressureHandler`, which adjusts the number of permits that can be requested (in the range `[0, totalPermits]`). The limit returned by the `BackPressureLimiter#limit()` method is to be understood as the number of messages that can be consumed from the queue at the current instant. If it is `0` (or less), the queue consumption is considered to be on standby.

When a polling is attempted while the consumption is on standby, the `SemaphoreBackPressureHandler` will sleep for the `standbyLimitPollingInterval` before allowing the next polling attempt (we cannot rely on the semaphore acquire timeouts here, hence the need for `standbyLimitPollingInterval`).

Both the `BackPressureLimiter` and `standbyLimitPollingInterval` can be configured via the `ContainerOptions`.
💡 Motivation and Context
The goal of this change is to address #1251.
#1251 aims to provide a more general solution to issues like #481 by giving users control over how they would like to dynamically limit message consumption from an SQS queue. Typical use cases could be rate limiters (like #481) or more complicated setups involving measuring the load of a downstream system and adjusting or stopping the message consumption.
💚 How did you test it?
So far this was only tested via integration tests covering various scenarios:
📝 Checklist
🔮 Next steps