Concurrent Reactor Pools #179

Draft: pderop wants to merge 1 commit into main from workstealing-pool
Conversation

pderop (Contributor) commented Nov 26, 2023

This PR introduces a new feature for the Reactor Pool, aimed at preventing resource starvation in highly concurrent scenarios. Currently, when multiple threads simultaneously acquire and release resources, the SimpleDequeuePool drainLoop method can monopolize a single CPU core, creating a bottleneck while the other threads remain under-utilized.

There are of course existing ways to prevent one thread from executing the drainLoop method for too long: one can for instance configure an acquisition scheduler, so that the delivery of borrowers is offloaded to a dedicated scheduler, at the cost of using extra threads.
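
For reference, the existing acquisition scheduler approach looks roughly like the sketch below (a minimal sketch: the PiCalculator resource, the pool bounds and the parallel scheduler are placeholders, not the benchmark's actual settings):

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;
import reactor.pool.InstrumentedPool;
import reactor.pool.PoolBuilder;

public class AcquisitionSchedulerSketch {

    // Placeholder resource standing in for the benchmark's PiCalculator.
    static class PiCalculator {
        void computePi() { /* CPU-bound work */ }
    }

    public static void main(String[] args) {
        InstrumentedPool<PiCalculator> pool = PoolBuilder
                .from(Mono.fromSupplier(PiCalculator::new))   // resource allocator
                .sizeBetween(0, 10)                           // arbitrary pool bounds
                .acquisitionScheduler(Schedulers.parallel())  // borrowers are delivered on this scheduler
                .buildPool();

        pool.acquire()
            .flatMap(ref -> Mono.fromRunnable(() -> ref.poolable().computePi())
                                .then(ref.release()))         // give the resource back to the pool
            .block();
    }
}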

This PR experiments with another way to offload the delivery of borrowers. Instead of using an acquisition scheduler, a concurrent InstrumentedPoolDecorator can be used. It creates a pool composed of multiple sub-pools, each managing a portion of the resources. Application executors (for example Netty event loops) can then be reused and assigned to each sub-pool, where acquisition tasks will be scheduled, so no extra threads are created. This design enables concurrent distribution of resource acquisitions across the sub-pools, using a work-stealing approach: a free sub-pool can help a busy one by stealing acquisition tasks from it.

For instance, in Reactor Netty, each Netty event loop thread would have its own sub-pool with its assigned HTTP/2 connection resources.
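
As a rough illustration of the intended usage: the concurrentPools method name comes from this PR, but its exact signature, the Connection resource and the pool sizes below are assumptions, not the final API.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import io.netty.channel.nio.NioEventLoopGroup;
import reactor.core.publisher.Mono;
import reactor.pool.InstrumentedPool;
import reactor.pool.PoolBuilder;
import reactor.pool.decorators.InstrumentedPoolDecorators;

public class ConcurrentPoolSketch {

    // Placeholder resource, standing in for an HTTP/2 connection.
    static class Connection { }

    public static void main(String[] args) {
        NioEventLoopGroup group = new NioEventLoopGroup(8);    // the application's event loops

        // Reuse the application's event loop executors, one per sub-pool: no extra threads.
        List<Executor> executors = new ArrayList<>();
        group.forEach(executors::add);

        // Hypothetical factory shape: the concurrentPools method exists in this PR,
        // but its parameters are assumed here and may differ from the final API.
        InstrumentedPool<Connection> pool = InstrumentedPoolDecorators.concurrentPools(
                executors,                                      // one sub-pool per executor (assumed)
                () -> PoolBuilder.from(Mono.fromSupplier(Connection::new))
                                 .sizeBetween(0, 16)
                                 .buildPool());                 // how each sub-pool is built (assumed)

        // Acquisitions are dispatched to the sub-pool of the calling executor, and a free
        // sub-pool may steal pending acquisition tasks from a busy one.
        pool.acquire().flatMap(ref -> ref.release()).block();

        group.shutdownGracefully();
    }
}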

The work is still in progress (I'm not even sure about the API), some issues remain, and this PR could be merged into a temporary branch to make it easier to continue working on it. I haven't created such a branch yet; to be confirmed by the Reactor team.

Attached to this PR is a JMH project which compares the different approaches. The WorkStealingPoolBenchmark simulates a Netty application which runs hundreds of thousands of tasks within an EventLoopGroup. Each task then acquires and releases some resources. The benchmark needs this PR to be compiled and installed in the local Maven repository.

See the attached reactor-pool-jmh.tgz.

In the WorkStealingPoolBenchmark class:

  • benchWithSimplePool: this method simulates activities running within a Netty EventLoopGroup. Tasks are scheduled on the event loops, and each one acquires and releases a PiCalculator resource: once a task has acquired a PiCalculator, the PI number is computed and the PiCalculator is returned to the pool (the per-task acquire/release pattern is sketched after this list). In practice there is a single busy thread: the one running the drainLoop method, which spends its life delivering PiCalculator resources to all borrowers from all event loop threads. Since no acquisition scheduler is used, the drainLoop consumes one core forever while the other tasks merely schedule acquisitions on the pool. Checking with top, the process only consumes about 125% of the total CPU. The test runs in about 18 seconds on a Mac M1 with 10 CPUs.

  • benchWithAcquisitionSchedulerEventLoop: this benchmark tries to avoid the starvation problem by using an acquisition scheduler, so that all borrowers are delivered using the EventLoopGroup of the simulated application. This significantly improves performance (see the results below: about 10.78 sec).

  • benchWithAcquisitionSchedulerFJP: this time, the common system ForkJoinPool is used to offload the delivery of borrowers. This improves performance even more, at the cost of extra threads (ideally, in Reactor Netty for example, we would like to avoid using extra threads and rely only on the event loop threads): 5.11 sec.

  • finally, the benchWithConcurrentPools method benchmarks this PR: a "concurrent" InstrumentedPool, composed of multiple sub-pools, each one assigned to one of the event loop executors: 2.75 sec.
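
For clarity, here is a minimal sketch of the per-task acquire/release pattern used by benchWithSimplePool; this is not the attached benchmark code, and the PiCalculator class, pool bounds and task count are placeholders:

import java.util.concurrent.CountDownLatch;
import io.netty.channel.nio.NioEventLoopGroup;
import reactor.core.publisher.Mono;
import reactor.pool.InstrumentedPool;
import reactor.pool.PoolBuilder;

public class SimplePoolTaskSketch {

    // Placeholder for the benchmark's PiCalculator resource.
    static class PiCalculator {
        void computePi() { /* CPU-bound PI computation */ }
    }

    public static void main(String[] args) throws InterruptedException {
        NioEventLoopGroup group = new NioEventLoopGroup(4);          // simulated application event loops
        InstrumentedPool<PiCalculator> pool = PoolBuilder
                .from(Mono.fromSupplier(PiCalculator::new))
                .sizeBetween(0, 4)                                   // arbitrary bounds
                .buildPool();

        int tasks = 100_000;
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            // Each task runs on an event loop, acquires a PiCalculator, computes PI,
            // then releases the resource; with no acquisition scheduler, all borrowers
            // end up being delivered by whichever thread is running the drain loop.
            group.next().execute(() ->
                    pool.acquire()
                        .flatMap(ref -> Mono.fromRunnable(() -> ref.poolable().computePi())
                                            .then(ref.release()))
                        .subscribe(ignored -> { }, err -> done.countDown(), done::countDown));
        }
        done.await();
        group.shutdownGracefully();
    }
}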

The results are the following (tested with JDK 21; lower score is better):

Benchmark                                                         Mode  Cnt  Score   Error  Units
WorkStealingPoolBenchmark.benchWithAcquisitionSchedulerEventLoop  avgt       1.222           s/op
WorkStealingPoolBenchmark.benchWithAcquisitionSchedulerFJP        avgt       1.239           s/op
WorkStealingPoolBenchmark.benchWithConcurrentPools                avgt       0.894           s/op
WorkStealingPoolBenchmark.benchWithSimplePool                     avgt       6.612           s/op

Remaining issues:

  • The WorkStealingPool does not implement the idle resource reuse order (LRU or MRU) configured via PoolBuilder.idleResourceReuseLruOrder or PoolBuilder.idleResourceReuseMruOrder; only the sub-pools implement the configured strategy. I actually wonder how this could be implemented globally, on top of all sub-pools.
  • Metrics are implemented naively, by iterating over all sub-pools. Common metrics, based on some long adders, should instead be shared between all sub-pools (a possible shape is sketched after this list).
  • The WorkStealingPool config() method currently returns the config of the first sub-pool, which is questionable... maybe a concurrent pool should be created using a PoolBuilder, so it can have its own config. Currently, the concurrent pool is created using the InstrumentedPoolDecorators.concurrentPools factory method.
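
Regarding the metrics point above, counters shared between all sub-pools could look like the sketch below (LongAdder-based; the names and granularity are assumptions, not this PR's implementation):

import java.util.concurrent.atomic.LongAdder;

// A minimal sketch of metrics shared by all sub-pools: each sub-pool updates the
// same adders on allocation, release and pending acquisition, instead of every
// sub-pool being iterated when metrics are read.
final class SharedPoolMetrics {

    private final LongAdder allocated       = new LongAdder();
    private final LongAdder idle            = new LongAdder();
    private final LongAdder pendingAcquires = new LongAdder();

    // Write side: called by any sub-pool, from any thread.
    void onAllocated()             { allocated.increment(); }
    void onDestroyed()             { allocated.decrement(); }
    void onIdleDelta(int delta)    { idle.add(delta); }
    void onPendingDelta(int delta) { pendingAcquires.add(delta); }

    // Read side: a single sum instead of a loop over every sub-pool.
    long allocatedSize()      { return allocated.sum(); }
    long idleSize()           { return idle.sum(); }
    long pendingAcquireSize() { return pendingAcquires.sum(); }
}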

@pderop pderop added the type/enhancement (A general enhancement) label Nov 26, 2023
@pderop pderop self-assigned this Nov 26, 2023
@pderop pderop marked this pull request as draft November 26, 2023 20:22
@pderop pderop force-pushed the workstealing-pool branch from 2ef6a19 to 4b27763 on January 11, 2024 at 11:12
pderop (Contributor, Author) commented Jan 11, 2024

Updated the PR with the latest version, which is now based on an AtomicBitSet.
Some other important use cases still need to be designed: for example, when a sub-pool has had all of its idle resources evicted, we should check whether other sub-pools currently have some available idle resources before allocating any new resource...
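
For illustration only, an atomic bit set can be built on top of an AtomicLongArray with CAS loops; the sketch below shows the general idea and is not necessarily the implementation used in this PR:

import java.util.concurrent.atomic.AtomicLongArray;

// Bits are packed into longs and updated with compare-and-set, so several threads
// can mark and clear slots concurrently without locking.
final class AtomicBitSet {

    private final AtomicLongArray words;

    AtomicBitSet(int nbits) {
        words = new AtomicLongArray((nbits + 63) >>> 6);
    }

    boolean get(int bit) {
        // For long shifts, Java only uses the lower 6 bits of the shift distance,
        // so (1L << bit) selects the right bit within the word.
        return (words.get(bit >>> 6) & (1L << bit)) != 0;
    }

    // Sets the bit, returning true if this call changed it from 0 to 1.
    boolean set(int bit) {
        int index = bit >>> 6;
        long mask = 1L << bit;
        for (;;) {
            long current = words.get(index);
            if ((current & mask) != 0) {
                return false; // already set by another thread
            }
            if (words.compareAndSet(index, current, current | mask)) {
                return true;
            }
        }
    }

    void clear(int bit) {
        int index = bit >>> 6;
        long mask = ~(1L << bit);
        for (;;) {
            long current = words.get(index);
            if (words.compareAndSet(index, current, current & mask)) {
                return;
            }
        }
    }
}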

@pderop pderop force-pushed the workstealing-pool branch from 4b27763 to def1bc0 on January 17, 2024 at 11:21
pderop (Contributor, Author) commented Jan 17, 2024

Rebased the PR on top of the latest 1.0.6-SNAPSHOT version.

@pderop pderop force-pushed the workstealing-pool branch from def1bc0 to 1ad9233 on May 15, 2024 at 10:09
pderop (Contributor, Author) commented May 15, 2024

Rebased onto the latest main branch.

@pderop pderop force-pushed the workstealing-pool branch from 1ad9233 to caf933f on June 6, 2024 at 17:24
pderop (Contributor, Author) commented Jun 6, 2024

Rebased onto the latest main branch.

pderop (Contributor, Author) commented Jul 12, 2024

Maybe this PR will be closed at some point. In the meantime, I rebased it on top of the latest main.

@pderop pderop force-pushed the workstealing-pool branch from caf933f to 918487d on July 12, 2024 at 20:59