Concurrent Reactor Pools #179
base: main
Conversation
(force-pushed from 2ef6a19 to 4b27763)
Updated the PR with the latest version, which is now based on an AtomicBitSet.
(force-pushed from 4b27763 to def1bc0)
Rebased the PR on top of the latest 1.0.6-SNAPSHOT version.
Rebased with the latest main branch.
Rebased with the latest main branch.
Maybe this PR will be closed at some point; in the meantime, rebased it on top of the latest main.
(force-pushed from caf933f to 918487d)
This PR introduces a new feature for the Reactor Pool, aimed at preventing resource starvation in highly concurrent scenarios. Currently, when multiple threads simultaneously acquire and release resources, the SimpleDequePool's drainLoop method can monopolize a CPU core, causing a bottleneck while other threads remain under-utilized.

There are of course existing ways to prevent one thread from executing the drainLoop method for too long: one can for instance use an acquisition scheduler in order to offload the delivery of borrowers to a configured scheduler, at the cost of using extra threads.
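For reference, here is a minimal sketch of that existing acquisition-scheduler approach, using only the released reactor-pool API. The PiCalculator resource, pool sizing and scheduler choice are illustrative assumptions; PoolBuilder.acquisitionScheduler is the existing configuration hook.

```java
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;
import reactor.pool.InstrumentedPool;
import reactor.pool.PoolBuilder;

public class AcquisitionSchedulerExample {

    public static void main(String[] args) {
        // Illustrative allocator: any Mono producing the poolable resource.
        Mono<PiCalculator> allocator = Mono.fromSupplier(PiCalculator::new);

        // Deliveries of borrowers are offloaded to the given Scheduler instead of
        // being drained by whichever thread happens to run the drainLoop.
        InstrumentedPool<PiCalculator> pool = PoolBuilder.from(allocator)
                .sizeBetween(0, 16)
                .acquisitionScheduler(Schedulers.boundedElastic()) // extra threads used here
                .buildPool();

        pool.acquire()
            .flatMap(ref -> Mono.fromRunnable(ref.poolable()::computePi)
                                .then(ref.release()))
            .block();
    }

    // Illustrative poolable resource, borrowed from the benchmark's naming.
    static class PiCalculator {
        void computePi() { /* compute an approximation of PI */ }
    }
}
```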
This PR experiments with another way to offload the delivery of borrowers. Instead of using an acquisition scheduler, a concurrent InstrumentedPoolDecorator can be used. It allows creating a pool composed of multiple sub-pools, each managing a portion of the resources. Application executors (for example Netty event loops) can then be reused and assigned to each sub-pool, where acquisition tasks will be scheduled; no extra threads are created. This design enables concurrent distribution of resource acquisitions across the sub-pools, using a work-stealing approach where a busy sub-pool can be helped by a free sub-pool, which can steal acquisition tasks from it. For instance, in Reactor Netty, each Netty event loop thread would have its own sub-pool with its assigned HTTP/2 connection resources.
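As a rough illustration only: the API is still open, so the signature of the InstrumentedPoolDecorators.concurrentPools factory used below (a list of executors plus a supplier of sub-pools) is an assumption, not the actual PR API. Usage could look roughly like this:

```java
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import reactor.core.publisher.Mono;
import reactor.pool.InstrumentedPool;
import reactor.pool.PoolBuilder;
import reactor.pool.decorators.InstrumentedPoolDecorators;

public class ConcurrentPoolSketch {

    public static void main(String[] args) {
        // Stand-ins for the application's existing executors (e.g. Netty event loops);
        // in a real application these would be reused, not created here.
        ExecutorService loop1 = Executors.newSingleThreadExecutor();
        ExecutorService loop2 = Executors.newSingleThreadExecutor();
        List<Executor> executors = List.of(loop1, loop2);

        // HYPOTHETICAL SIGNATURE: the PR adds a concurrentPools factory to
        // InstrumentedPoolDecorators, but the parameters shown here (executors +
        // a supplier of sub-pools) are assumed for illustration purposes only.
        InstrumentedPool<PiCalculator> pool = InstrumentedPoolDecorators.concurrentPools(
                executors,
                () -> PoolBuilder.from(Mono.fromSupplier(PiCalculator::new))
                                 .sizeBetween(0, 16)
                                 .buildPool());

        // Acquisitions are scheduled on the sub-pool tied to the calling executor;
        // a free sub-pool can steal pending acquisition tasks from a busy one.
        pool.acquire()
            .flatMap(ref -> Mono.fromRunnable(ref.poolable()::computePi)
                                .then(ref.release()))
            .block();

        loop1.shutdown();
        loop2.shutdown();
    }

    static class PiCalculator {
        void computePi() { /* compute an approximation of PI */ }
    }
}
```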
The work is still in progress (I'm not even sure about the API), some issues remain, and this PR could be merged into a temporary branch in order to make it easier to continue the work on it. I haven't created a branch for the moment; to be confirmed by the Reactor team.
Attached to this PR is a JMH project which compares the different approaches. The WorkStealingPoolBenchmark simulates a Netty application which runs hundreds of thousands of tasks within an EventLoopGroup. Each task then acquires and releases some resources. The benchmark needs this PR to be compiled and installed in the local Maven repository. See the attached reactor-pool-jmh.tgz.
In the WorkStealingPoolBenchmark class:
- benchWithSimplePool: this method simulates activities running within a Netty EventLoopGroup. Tasks are scheduled on event loops, and each one then acquires and releases a PiCalculator resource. When a task has acquired a PiCalculator, PI is computed and the PiCalculator is returned to the pool. When running this method there is actually a single busy thread: the one running the drainLoop method, which spends its life delivering PiCalculator resources to all borrowers from all event loop threads. Since no acquisition scheduler is used, the drainLoop consumes one core forever, and the other tasks merely schedule acquisition tasks to the pool (a sketch of this per-task acquire/compute/release pattern is shown after this list). Checking with top, the process only consumes about 125% of the total CPUs. The test runs in about 18 seconds on a Mac M1 with 10 CPUs.
- benchWithAcquisitionSchedulerEventLoop: this benchmark tries to avoid the starvation problem by using an acquisition scheduler, so all borrowers are delivered using the EventLoopGroup of the simulated application. This significantly improves performance (see the results below: about 10.78 seconds).
- benchWithAcquisitionSchedulerFJP: this time the common system ForkJoinPool is used to offload deliveries of borrowers. This improves performance even more, at the cost of using extra threads (ideally, in Reactor Netty for example, we would like to avoid extra threads and use only the event loop threads): 5.11 seconds.
- benchWithConcurrentPools: this method benchmarks this PR, using a "concurrent" InstrumentedPool composed of multiple sub-pools, each one assigned to one of the event loop executors: 2.75 seconds.
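For context, here is a minimal sketch of the per-task pattern the benchmark simulates with the plain pool (no acquisition scheduler). The executor, task count and PiCalculator class are illustrative assumptions, not the actual benchmark code; only the released reactor-pool API is used.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import reactor.core.publisher.Mono;
import reactor.pool.InstrumentedPool;
import reactor.pool.PoolBuilder;

public class BenchmarkTaskSketch {

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the Netty event loop group of the simulated application.
        ExecutorService eventLoops = Executors.newFixedThreadPool(4);

        // Plain pool: borrowers are delivered by whichever thread runs the drainLoop.
        InstrumentedPool<PiCalculator> pool = PoolBuilder
                .from(Mono.fromSupplier(PiCalculator::new))
                .sizeBetween(0, 16)
                .buildPool();

        int tasks = 100_000;
        CountDownLatch done = new CountDownLatch(tasks);

        // Each simulated task acquires a PiCalculator, computes PI, then releases it.
        for (int i = 0; i < tasks; i++) {
            eventLoops.execute(() ->
                pool.acquire()
                    .flatMap(ref -> Mono.fromRunnable(ref.poolable()::computePi)
                                        .then(ref.release()))
                    .doFinally(signal -> done.countDown())
                    .subscribe());
        }

        done.await();
        eventLoops.shutdown();
    }

    static class PiCalculator {
        void computePi() { /* compute an approximation of PI */ }
    }
}
```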
The results are the following (tested with JDK 21; a lower score is better):
Remaining issues:
- The InstrumentedPoolDecorators.concurrentPools factory method.