Optimize pool warmup #171
Conversation
…, if any are configured.
@violetagg, can you take a look? Thanks.
@OlegDokuka PTAL
There was still something wrong; I just did a commit, and now I think the PR is finalized and ready to be reviewed.
Updated the PR with the following:
…is can be done in a separate PR.
- resources allocated during warmup are subscribed to eagerly, but now you can configure the level of concurrency using `PoolBuilder.warmupConcurrency(int concurrency)`
- improved the warmupTest
In the last commit: …lBuilder.parallelizeWarmup(boolean). Warmups are not parallel by default (as before). Reverted the CommonPoolTest.recordsAllocationLatenciesInWarmup test, because warmup behaves as before by default: no concurrency. Improved PoolWarmupTest.
Did a new attempt: this time, let's not introduce a new
…ween(min, max, warmupParallelism).
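For readers following along, here is a minimal sketch of how warmup parallelism could be configured, assuming the truncated fragment above refers to a three-argument `PoolBuilder.sizeBetween(min, max, warmupParallelism)` overload (the allocator and the sizes below are made up for illustration):

```java
import java.time.Duration;
import reactor.core.publisher.Mono;
import reactor.pool.InstrumentedPool;
import reactor.pool.PoolBuilder;

public class WarmupParallelismSketch {
    public static void main(String[] args) {
        // Hypothetical allocator: logs the thread it runs on, then emits a fake connection.
        Mono<String> allocator = Mono.fromCallable(
                () -> "connection allocated on " + Thread.currentThread().getName());

        // Assumption: sizeBetween(min, max, warmupParallelism) is the overload that
        // controls how many allocations warmup() may subscribe to concurrently.
        InstrumentedPool<String> pool = PoolBuilder
                .from(allocator)
                .sizeBetween(4, 10, 4) // min=4, max=10, warmupParallelism=4
                .buildPool();

        // warmup() eagerly allocates up to `min` resources.
        Integer warmed = pool.warmup().block(Duration.ofSeconds(5));
        System.out.println("warmed up " + warmed + " resources");
    }
}
```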
LGTM with some comments
…ggestion in drainloop.
@OlegDokuka, thanks for the review.
…le (#181) This fixes a regression that comes from #171, present since version 1.0.1: since the pool warmup method can be called at any time, if no more permits are available from the allocation strategy, Flux.merge ends up being called with a mergeConcurrency parameter of 0, causing the following exception:
maxConcurrency > 0 required but it was 0
java.lang.IllegalArgumentException: maxConcurrency > 0 required but it was 0
at reactor.core.publisher.FluxFlatMap.<init>(FluxFlatMap.java:74)
at reactor.core.publisher.Flux.merge(Flux.java:1406)
at reactor.core.publisher.Flux.merge(Flux.java:1375)
Added a check in warmup to return immediately when no permits are available. Also added a test case. Fixes #180
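As an illustration of why that guard is needed (this is a standalone sketch, not the actual `SimpleDequePool` code; the permit count and allocator list are stand-ins):

```java
import java.util.Collections;
import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class WarmupGuardSketch {
    // Stand-in for the warmup logic: merge the allocators with the number of
    // available permits as the merge concurrency.
    static Mono<Void> warmup(int permits, List<Mono<String>> allocators) {
        if (permits <= 0) {
            // Without this check, Flux.merge(allocators, permits) would throw
            // "maxConcurrency > 0 required but it was 0".
            return Mono.empty();
        }
        return Flux.merge(Flux.fromIterable(allocators), permits).then();
    }

    public static void main(String[] args) {
        // No permits left: warmup completes immediately instead of failing.
        warmup(0, Collections.emptyList()).block();
        System.out.println("warmup with 0 permits completed without error");
    }
}
```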
Motivations:
As reported in reactor/reactor-netty#2781, there is a performance problem when using R2DBC drivers configured with spring.r2dbc.pool.initial-size > 1.
For example, with spring.r2dbc.pool.initial-size=4 and spring.r2dbc.pool.max-size=10, the first four DB connections acquired during warmup will actually use the same TcpResource event loop thread. The worst situation is when initial-size == max-size: in that case, all DB connections will use the same TcpResource event loop thread.
This is related to Reactor Netty TcpResources, which are by default colocated.
While it's possible to create custom LoopResources in Reactor Netty without colocation, the #2781 issue requests the ability to reuse the shared TcpResource EventLoopGroup without colocation, instead of creating a custom LoopResources without colocation for each driver (MariaDB, Postgres, ...).
While this change request makes sense, I tend to think that before doing this in reactor-netty, it is worth addressing the problem in reactor-pool: indeed, the primary root cause of the problem is that during pool warmup (either when calling `pool.warmup()`, or when `pool.initial-size` is > 1), reactor-pool uses Reactor operators like `Flux.concat`, which sequentially subscribes to the first source and waits for it to complete before subscribing to the next. This leads to a situation where, during warmup, a first DB connection is created from a random TcpResource EventLoop thread, and once this first DB connection completes (from the TcpResource EventLoop thread), the pool acquires another DB connection ... but from that TcpResource thread; colocation will then reuse the same TcpResource event loop thread for all remaining DB connections acquired during the warmup process.
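To make the subscription-thread difference concrete, here is a small standalone sketch (illustrative only, not the pool's code) that shows on which threads `Flux.concat` and `Flux.merge` subscribe to their sources:

```java
import java.time.Duration;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class ConcatVsMergeSketch {
    // Simulates a connection allocator whose completion signal arrives on a timer
    // thread, similar to an allocation completing on a TcpResource event loop.
    static Mono<String> allocator(int id) {
        return Mono.fromCallable(() -> "conn-" + id)
                .doOnSubscribe(s -> System.out.println(
                        "allocator " + id + " subscribed on " + Thread.currentThread().getName()))
                .delayElement(Duration.ofMillis(50));
    }

    public static void main(String[] args) {
        Flux<Mono<String>> sources = Flux.range(1, 4).map(ConcatVsMergeSketch::allocator);

        System.out.println("-- Flux.concat: each next source is subscribed from the thread that completed the previous one --");
        Flux.concat(sources).blockLast();

        System.out.println("-- Flux.merge: all sources are subscribed eagerly from the current (main) thread --");
        Flux.merge(sources).blockLast();
    }
}
```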
Modifications:
In `SimpleDequePool.warmup()`, at the end of the method, replace `Flux.concat` with `Flux.merge`: unlike `Flux.concat`, `Flux.merge` eagerly subscribes to all sources from the current thread, which ensures that we won't allocate a DB connection from one of the TcpResource event loops, so colocation will not happen.
In `SimpleDequePool.drainLoop()`, at the end of the method, replace:
by:
Same thing here: `Flux.mergeSequential` is used in order to eagerly subscribe to the `primary` as well as to the `warmupFlux` from the current thread. This avoids any subscription being done from within a TcpResource event loop thread, and hence avoids any colocation taking place. Note that `mergeSequential` is used here because we want to ensure that it is the `primary` that completes first.
Note that `poolable -> drain()` has been added in the subscribe: this fixes a performance issue seen when running the sample from r2dbc/r2dbc-pool#190 (comment). Indeed, even when using a custom LoopResources without colocation, only two event loop threads were actually used. Explanation: before, the `primary` first delivered a connection to one of the borrowers, and then only one drain() call happened after the `warmupFlux` completed, so only two borrowers were delivered.
Also, if the MariaDB driver is configured with an acquisition scheduler, this ensures that pool acquisition is never done from any TcpResource event loop thread, hence enforcing avoidance of colocation.
The `PoolWarmupTest` test reproduces the same issue described in reactor-netty #2781 (see "Use Schedulers.single() to avoid accidental thread co-location", r2dbc/r2dbc-pool#190 (comment)). When running the test, you will see that all DB connection threads are used:
Notice that if you remove `poolable -> drain()` from the first parameter of the subscribe at the end of the `SimpleDequePool.drainLoop()` method, like this:
then only two threads will actually be used when you run the PoolWarmupTest:
This test previously expected only two total allocations, because the second allocation was failing. But now all resource allocations are attempted, so the test has been slightly adapted to expect at least 6 allocations (5 successful acquisitions, and at least one failed acquisition).
While doing the PR, I got some concurrency issues related to `TestUtils.InMemoryPoolMetrics`, and I think its methods need to be synchronized.
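A minimal sketch of that kind of fix, assuming a simple list-backed recorder (the class below is illustrative and not the actual `TestUtils.InMemoryPoolMetrics`):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a tiny in-memory latency recorder whose methods are synchronized
// so that concurrent warmup allocations can record metrics safely.
public class InMemoryMetricsSketch {
    private final List<Long> allocationLatencies = new ArrayList<>();

    // synchronized: ArrayList is not thread-safe, and warmup may now allocate concurrently.
    public synchronized void recordAllocationLatency(long millis) {
        allocationLatencies.add(millis);
    }

    public synchronized long maxAllocationLatency() {
        return allocationLatencies.stream().mapToLong(Long::longValue).max().orElse(0L);
    }
}
```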
Results:
So, using this patch, I could verify that all threads are used when running the scenario from r2dbc/r2dbc-pool#190 (comment).
I also did some Gatling tests with a WebFlux/Reactor Netty application retrieving 1000 entries from a locally installed MariaDB database.
Here are the results with spring.r2dbc.pool.initial-size=10 and spring.r2dbc.pool.max-size=10.
Results are very bad (only the reactor-tcp-kqueue-2 thread is used):
Performance is good, and all "custom" threads are used: