Possible pool resource release bug #124
Just in case this is a bug with the […]
@simonbasle I tried reproducing the bug only on the pool side here and only on the […]. I am not very familiar with the inner workings of either […]
@simonbasle I have opened a […]
@simonbasle After investigating the reactor-pool implementation more, should […]
It looks like activating background eviction makes the test fail less often, but it still fails sometimes. Edit: background eviction seems useless.
I managed to isolate the reactor-pool bug here. Not even background eviction helps.
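Here is a plausible reason why eviction makes no difference, shown as a toy model. Eviction only ever looks at idle resources, so a resource that was acquired and never released is invisible to it. `EvictionOnlySeesIdle`, its nested `ToyPool`, and `evictIdle()` are hypothetical names used for illustration. They are not reactor-pool's implementation or API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model, NOT reactor-pool's implementation: eviction only
// inspects idle resources, so an acquired-but-never-released one is never seen.
public class EvictionOnlySeesIdle {

    static final class ToyPool {
        private final Deque<String> idle = new ArrayDeque<>();
        private final Set<String> acquired = new HashSet<>();
        private int created = 0;

        // Reuse an idle resource if there is one, otherwise create a new one.
        String acquire() {
            String r = idle.isEmpty() ? "res-" + (created++) : idle.poll();
            acquired.add(r);
            return r;
        }

        // Only a resource that is currently acquired can go back to idle.
        void release(String r) {
            if (acquired.remove(r)) {
                idle.offer(r);
            }
        }

        // Background-eviction analogue: drops every idle resource, nothing else.
        int evictIdle() {
            int n = idle.size();
            idle.clear();
            return n;
        }

        int acquiredSize() { return acquired.size(); }

        int idleSize() { return idle.size(); }
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool();
        String a = pool.acquire();
        pool.acquire(); // simulated leak: this resource is never released
        pool.release(a);

        int evicted = pool.evictIdle();
        System.out.println("evicted=" + evicted + " acquired=" + pool.acquiredSize());
    }
}
```

Running `main` prints `evicted=1 acquired=1`. The idle resource is evicted, but the leaked one stays acquired and keeps occupying a slot.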
Hi there! According to @mp911de and @chemicL, this issue is the same as r2dbc/r2dbc-pool#198, which is another reproducer of this critical bug. Maybe it helps to spot the bug and fix it.
@andreisilviudragnea thanks for the report. I'm sorry to see it didn't get much attention. While looking into some other reports around r2dbc, jOOQ, and reactor-core, I stumbled upon the same problem and discovered your report, so I'll use this issue to try to make some progress on resolving it. Here's a highly reproducible example I created to capture what I observed in the r2dbc-pool issue mentioned by @agorbachenko (best run in a loop / repeat mode in the IDE): https://github.com/reactor/reactor-pool/compare/fix-124-deliver-cancel-race?expand=1. As a result of the race between […]
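The branch name suggests a race between delivering an acquired resource and cancelling the subscriber that asked for it. One generic way to make such a handoff leak-free is to have both sides agree, through a single atomic state, on who must release:

- If cancellation wins, the delivering side gives the resource back.
- If delivery wins, the cancelling side does.

The sketch below models that assumption with the standard library only. `DeliverCancelRace`, `Borrower`, `deliver` and `cancel` are hypothetical names. This is not the change on the linked branch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a deliver/cancel race; NOT reactor-pool's code.
public class DeliverCancelRace {

    // Marker stored in the state once the borrower has been cancelled.
    private static final Object CANCELLED = new Object();

    static final class Borrower {
        // null: nothing happened yet; a String: delivered resource;
        // CANCELLED: the borrower gave up.
        private final AtomicReference<Object> state = new AtomicReference<>();
        private final Runnable releaseToPool;

        Borrower(Runnable releaseToPool) {
            this.releaseToPool = releaseToPool;
        }

        // Pool side: hand the resource over unless the borrower already cancelled.
        void deliver(String resource) {
            if (!state.compareAndSet(null, resource)) {
                // Cancellation won: nobody will consume the resource,
                // so the delivering side is responsible for releasing it.
                releaseToPool.run();
            }
        }

        // Borrower side: cancel, releasing a resource that was already delivered.
        void cancel() {
            Object previous = state.getAndSet(CANCELLED);
            if (previous instanceof String) {
                // Delivery won: the resource is ours, so give it back.
                releaseToPool.run();
            }
        }
    }

    // Runs `rounds` independent races and returns how many releases happened.
    static int runRaces(int rounds) {
        AtomicInteger released = new AtomicInteger();
        for (int i = 0; i < rounds; i++) {
            Borrower borrower = new Borrower(released::incrementAndGet);
            CountDownLatch start = new CountDownLatch(1);
            Thread deliverer = new Thread(() -> {
                awaitQuietly(start);
                borrower.deliver("resource");
            });
            Thread canceller = new Thread(() -> {
                awaitQuietly(start);
                borrower.cancel();
            });
            deliverer.start();
            canceller.start();
            start.countDown(); // let both threads race
            try {
                deliverer.join();
                canceller.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return released.get();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int rounds = 1_000;
        System.out.println("released " + runRaces(rounds) + " of " + rounds);
    }
}
```

In this model, `runRaces(n)` should return `n` whichever thread wins each race. Removing either release branch can make the count fall short in runs where that ordering occurs, which is the leak pattern described in this issue.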
I'll chat with the team about possible solutions.
@andreisilviudragnea I have some news and comments. I do believe it is time to close this particular issue. Here are the details:
```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

import org.junit.jupiter.api.Test;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Hooks;
import reactor.core.publisher.Mono;
import reactor.core.publisher.Sinks;
import reactor.pool.InstrumentedPool;
import reactor.pool.PoolBuilder;
import reactor.pool.PooledRef;

import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

public class ReactorPoolIsolatedBugTest {

    private static final int COUNT = 10_000;

    // <1> Added a sink to ensure all releases are consumed in time.
    private final Sinks.Many<Mono<Void>> asyncReleaseSink =
            Sinks.many().unicast().onBackpressureBuffer();

    private final InstrumentedPool<String> stringReactivePool = PoolBuilder
            .from(Mono.just("value").delayElement(Duration.ofMillis(2)))
            .maxPendingAcquireUnbounded()
            .sizeBetween(0, 3)
            .buildPool();

    // Releases are pushed from many threads, but a Sinks.many() sink fails fast
    // (FAIL_NON_SERIALIZED) on concurrent emissions, so serialize them here.
    private Sinks.EmitResult emitRelease(Mono<Void> release) {
        synchronized (asyncReleaseSink) {
            return asyncReleaseSink.tryEmitNext(release);
        }
    }

    @Test
    public void reactorPoolBug() throws InterruptedException {
        // <2> This actually never gets triggered. Will explain soon.
        Hooks.onNextDropped((Consumer<Object>) dropped -> {
            if (dropped instanceof PooledRef) {
                System.out.println("GOT DROPPED REF");
                Sinks.EmitResult result = emitRelease(((PooledRef<?>) dropped).release());
                if (result.isFailure()) {
                    System.out.println("Failed to release dropped con");
                }
            }
        });

        ExecutorService executorService = Executors.newFixedThreadPool(
                16,
                r -> {
                    Thread t = Executors.defaultThreadFactory().newThread(r);
                    t.setDaemon(true);
                    return t;
                }
        );

        // <1> Make sure all cancellations return connection to the pool before
        // validation happens.
        ExecutorService asyncReleasePool = Executors.newSingleThreadExecutor();
        CountDownLatch asyncReleaseDone = new CountDownLatch(1);
        asyncReleasePool.submit(
                () -> asyncReleaseSink.asFlux()
                        .concatMap(Function.identity())
                        .subscribe()
        );

        CountDownLatch cdl = new CountDownLatch(COUNT);
        for (int i = 0; i < COUNT; i++) {
            executorService.submit(new FlatMapErrorTask(cdl));
        }
        cdl.await();
        System.out.println("All tasks finished. Waiting for connection release.");

        // <1> Follow up all async releases with a latch release to validate the state.
        Sinks.EmitResult result = emitRelease(Mono.fromRunnable(asyncReleaseDone::countDown));
        assertThat(result).matches(Sinks.EmitResult::isSuccess);
        asyncReleaseDone.await();

        // <3> In case there was any asynchronous eviction in place
        await().alias("acquiredSize").atMost(10, TimeUnit.SECONDS)
                .untilAsserted(() -> assertThat(stringReactivePool.metrics().acquiredSize()).isEqualTo(0));
        await().alias("idleSize").atMost(10, TimeUnit.SECONDS)
                .untilAsserted(() -> assertThat(stringReactivePool.metrics().idleSize()).isEqualTo(3));
        await().alias("allocatedSize").atMost(10, TimeUnit.SECONDS)
                .untilAsserted(() -> assertThat(stringReactivePool.metrics().allocatedSize()).isEqualTo(3));

        // Clean up: stop the executors and remove the global hook.
        executorService.shutdown();
        asyncReleasePool.shutdown();
        Hooks.resetOnNextDropped();
    }

    private final class FlatMapErrorTask implements Runnable {

        private final CountDownLatch cdl;

        public FlatMapErrorTask(CountDownLatch cdl) {
            this.cdl = cdl;
        }

        @Override
        public void run() {
            Flux<Void> flux = Flux
                    .range(0, 10)
                    .flatMap(i -> stringReactivePool
                            .acquire()
                            .delayElement(Duration.ofMillis(100))
                            // <2> It can happen that the flatMap over the range is
                            // cancelled. However, the lambda in the flatMap that follows
                            // is not actually exercised. The value is discarded
                            // after cancellation, without a chance to be released. So
                            // we use the discard hook:
                            .doOnDiscard(PooledRef.class, s -> {
                                System.out.println("Discarded after acquire, pushing release to sink");
                                Sinks.EmitResult result = emitRelease(((PooledRef<?>) s).release());
                                if (result.isFailure()) {
                                    System.out.println("Failed to emit async release");
                                    System.exit(1);
                                }
                            })
                            .flatMap(pooledRef -> Mono
                                    .just(pooledRef.poolable())
                                    .delayElement(Duration.ofMillis(10))
                                    // <2> Might never be triggered.
                                    .then(pooledRef.release())
                                    // <2> Might never be triggered.
                                    .onErrorResume(error -> pooledRef.release())
                                    // <2> Might never be triggered.
                                    .doOnCancel(() -> {
                                        System.out.println("Canceled inner, pushing release to sink");
                                        Sinks.EmitResult result = emitRelease(pooledRef.release());
                                        if (result.isFailure()) {
                                            System.out.println("Failed to emit async release");
                                            System.exit(1);
                                        }
                                    })
                            )
                            .switchIfEmpty(Mono.error(new RuntimeException("Empty")))
                    )
                    .doOnComplete(cdl::countDown)
                    .doOnError(error -> cdl.countDown());

            try {
                flux.blockLast();
            } catch (Exception e) {
                System.err.println(e);
            }

            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                // Restore the interrupt flag instead of swallowing it.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

@andreisilviudragnea With these explanations, my understanding is that the pool has no bug. However, combining operators requires care when dealing with connections: cancellations, discards, drops, and termination all need to be handled gracefully at their respective levels of the operator chain. I look forward to your verification of the above evaluation. Thanks in advance!
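The test above wires a release into several termination paths for the same connection: `then`, `onErrorResume`, `doOnCancel`, and `doOnDiscard`. When more than one path can fire for one resource, a generic way to keep them from stepping on each other is to route them all through a single at-most-once release.

The sketch below is stdlib-only. `ReleaseOnce` is a hypothetical helper name, not a reactor-pool type. In a real chain, the wrapped `Runnable` would trigger the actual release, which is an assumption of this sketch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of reactor-pool): however many termination
// paths (completion, error, cancel, discard) call release(), the wrapped
// action runs at most once.
public final class ReleaseOnce {

    private final AtomicBoolean done = new AtomicBoolean();
    private final Runnable release;

    public ReleaseOnce(Runnable release) {
        this.release = release;
    }

    // Returns true only for the single caller that actually performed the release.
    public boolean release() {
        if (done.compareAndSet(false, true)) {
            release.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicInteger releases = new AtomicInteger();
        ReleaseOnce guard = new ReleaseOnce(releases::incrementAndGet);

        // Simulate completion, cancellation and discard each attempting a release.
        guard.release();
        guard.release();
        guard.release();

        System.out.println("releases=" + releases.get()); // prints "releases=1"
    }
}
```

The compare-and-set makes the guard safe even when the paths run on different threads, as cancellation and discard can in the test above.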
@andreisilviudragnea friendly ping :)
OK, I'll close the issue. Please feel free to re-open if you disagree with my assessment.
There are times when the resources are never released back to the pool. I do not know the cause yet, but I was able to reproduce the problem in a separate project here.
Expected Behavior
The resources should be released to the pool.
Actual Behavior
The resources never get released to the pool.
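A bounded-pool model shows how "never released" becomes "the test never finishes". Once every slot is held by a resource that is never returned, a new acquire has nothing to wait for. The test earlier in this thread uses `sizeBetween(0, 3)` with `maxPendingAcquireUnbounded()`. As far as this model goes, pending acquires there would simply wait indefinitely.

The sketch models the pool as a `java.util.concurrent.Semaphore` with a fixed number of permits. It uses a timeout only so that the example terminates. `ExhaustedPool` and `canAcquireAfterLeaking` are hypothetical names, not reactor-pool types.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model (not reactor-pool): a pool capped at maxSize resources
// is represented as a semaphore with maxSize permits.
public class ExhaustedPool {

    // Leaks `leaked` resources (acquired, never released), then reports whether
    // one more acquire succeeds within the timeout.
    static boolean canAcquireAfterLeaking(int maxSize, int leaked, long timeoutMillis) {
        Semaphore permits = new Semaphore(maxSize);
        for (int i = 0; i < leaked; i++) {
            permits.acquireUninterruptibly(); // acquired, never released
        }
        try {
            boolean acquired = permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                permits.release(); // a well-behaved borrower gives it back
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canAcquireAfterLeaking(3, 2, 100)); // true: one slot left
        System.out.println(canAcquireAfterLeaking(3, 3, 100)); // false: every slot leaked
    }
}
```

Without the timeout, the second call would block forever. That matches the symptom of a test that never completes once all pool resources remain acquired.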
Steps to Reproduce
Possible Solution
The test should always finish, but most of the time it never does, because all the pool resources remain acquired. For an unknown reason, the test fails much less often if you remove the Thread.sleep() at the end.
Your Environment
- Reactor version(s) used: reactor-core:3.4.4, reactor-pool:0.2.3
- Other relevant libraries versions (e.g. netty, ...):
- JVM version (java -version):
- OS and version (uname -a): Linux andrei-desktop 5.8.0-44-generic #50-Ubuntu SMP Tue Feb 9 06:29:41 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux