
@CacheResult with method returning Uni makes cache exceed its maximum size #40852

Closed
jolsum opened this issue May 27, 2024 · 33 comments · Fixed by #41064
Labels
area/cache kind/bug Something isn't working

Comments

@jolsum

jolsum commented May 27, 2024

Describe the bug

When a method returning a Uni is annotated with @CacheResult, the result of the Uni is cached as expected, but the cache grows infinitely even when a maximum size is set.

Expected behavior

When the cache reaches the specified maximum size, the number of elements in the cache should stop growing, or at least not grow infinitely.

Actual behavior

No elements are evicted and the cache grows infinitely

How to Reproduce?

In GreetingResource there are two endpoints ("/hello/fast" and "/hello/delayed"). The "delayed" endpoint gets its data from a method returning a Uni<String> whose item is emitted after a short delay, and the method is annotated with @CacheResult. The "fast" endpoint gets its data from a method returning a Uni whose item is emitted without any delay.

import java.time.Duration;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

import io.quarkus.cache.CacheResult;
import io.smallrye.mutiny.Uni;

@Path("/hello")
public class GreetingResource {

  @GET
  @Produces(MediaType.TEXT_PLAIN)
  @Path("/delayed")
  public Uni<String> delayed() {
    return getHelloDelayed(System.currentTimeMillis());
  }

  @GET
  @Produces(MediaType.TEXT_PLAIN)
  @Path("/fast")
  public Uni<String> fast() {
    return getHelloFast(System.currentTimeMillis());
  }

  @CacheResult(cacheName = "delayed")
  Uni<String> getHelloDelayed(long key) {
    return Uni.createFrom().item("Hello with a delay").onItem().delayIt().by(Duration.ofMillis(10));
  }

  @CacheResult(cacheName = "fast")
  Uni<String> getHelloFast(long key) {
    return Uni.createFrom().item("Hello");
  }
}

The cache maximum size is defined as quarkus.cache.caffeine.maximum-size = 10
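
For reference, this setting would typically live in application.properties. A minimal sketch of the two forms the Quarkus Caffeine cache configuration supports (the per-cache variant below is shown only as an illustration; the reproducer uses the global form):

# applies to every Caffeine cache in the application
quarkus.cache.caffeine.maximum-size=10

# per-cache alternative (illustrative only, not part of the reproducer)
quarkus.cache.caffeine."delayed".maximum-size=10
quarkus.cache.caffeine."fast".maximum-size=10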

In GreetingResourceTest each endpoint is invoked 20 times, and the test then waits for the size of the corresponding cache to become 10.

import static io.restassured.RestAssured.given;
import static org.awaitility.Awaitility.await;
import static org.hamcrest.CoreMatchers.is;

import java.util.concurrent.TimeUnit;

import jakarta.inject.Inject;

import org.junit.jupiter.api.Test;

import io.quarkus.cache.Cache;
import io.quarkus.cache.CacheName;
import io.quarkus.cache.CaffeineCache;
import io.quarkus.test.junit.QuarkusTest;

@QuarkusTest
class GreetingResourceTest {

  @Inject
  @CacheName("fast")
  Cache cacheFast;

  @Inject
  @CacheName("delayed")
  Cache cacheDelayed;

  @Test
  void testFastEndpoint() {
    for (int i = 0; i < 20; i++) {
      given().when().get("/hello/fast").then().statusCode(200).body(is("Hello"));
    }
    await()
        .atMost(5, TimeUnit.SECONDS)
        .until(() -> ((CaffeineCache) cacheFast).keySet().size() == 10);
  }

  @Test
  void testDelayedEndpoint() {
    for (int i = 0; i < 20; i++) {
      given().when().get("/hello/delayed").then().statusCode(200).body(is("Hello with a delay"));
    }
    await()
        .atMost(5, TimeUnit.SECONDS)
        .until(() -> ((CaffeineCache) cacheDelayed).keySet().size() == 10);
  }
}

However, the size remains at 20 even after waiting for 5 seconds.

Output of uname -a or ver

Microsoft Windows [Version 10.0.22631.3593]

Output of java -version

openjdk version "21.0.1" 2023-10-17

Quarkus version or git rev

3.10.2

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.9.6

Additional information

I have done some debugging and managed to narrow this down to an issue with CaffeineCache and how it interacts with the CompletableFuture returned from Uni.subscribeAsCompletionStage().

If I create a Uni with a delayed item like this

    var future =
        Uni.createFrom()
            .item("item")
            .onItem()
            .delayIt()
            .by(Duration.ofMillis(100))
            .subscribeAsCompletionStage();

    future.whenComplete(
        (result, error) -> {
          System.out.println(future.isDone());
        });

In whenComplete(), future.isDone() returns false. So in Caffeine's BoundedLocalCache.replace() the item gets weight 0, the cache treats the operation as a read rather than a write even though the item is stored, and eviction is never run.
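
The effect can be illustrated directly against Caffeine. This is a minimal standalone sketch (not the Quarkus integration code), assuming Caffeine's documented async-cache behavior where a pending future carries weight 0 and only counts toward the maximum size once it completes:

import java.util.concurrent.CompletableFuture;

import com.github.benmanes.caffeine.cache.AsyncCache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class PendingFutureDemo {

  public static void main(String[] args) {
    AsyncCache<Long, String> cache = Caffeine.newBuilder()
        .maximumSize(10)
        .buildAsync();

    // Futures that never report completion are weighed as 0, so they
    // never push the cache over its maximum size and no eviction runs.
    for (long i = 0; i < 20; i++) {
      cache.put(i, new CompletableFuture<>());
    }
    cache.synchronous().cleanUp();

    // Prints 20 (estimatedSize is approximate, but nothing has been evicted),
    // mirroring what the reproducer observes when isDone() lies at weigh time.
    System.out.println(cache.synchronous().estimatedSize());
  }
}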

@jolsum jolsum added the kind/bug Something isn't working label May 27, 2024
@quarkus-bot quarkus-bot bot added area/cache env/windows Impacts Windows machines labels May 27, 2024

quarkus-bot bot commented May 27, 2024

/cc @gwenneg (cache)

@geoand geoand removed the env/windows Impacts Windows machines label May 28, 2024
@geoand
Contributor

geoand commented Jun 6, 2024

Thanks a lot for digging into this.

This part seems very interesting to me:

    var future =
        Uni.createFrom()
            .item("item")
            .onItem()
            .delayIt()
            .by(Duration.ofMillis(100))
            .subscribeAsCompletionStage();

    future.whenComplete(
        (result, error) -> {
          System.out.println(future.isDone());
        });

I indeed ran it and it prints false. @jponge is that expected? If it is, I'd like to know why :)

@jponge
Member

jponge commented Jun 6, 2024

Funny, I ran:

@Test
void yolo() throws Throwable {

    var future =
            Uni.createFrom()
                    .item("item")
                    .onItem()
                    .delayIt()
                    .by(Duration.ofMillis(100))
                    .subscribeAsCompletionStage();

    future.whenComplete(
            (result, error) -> {
                System.out.println(future.isDone());
            });

    future.get(1, TimeUnit.SECONDS);
}

and got true.

@jponge
Member

jponge commented Jun 6, 2024

Also:

    @Test
    void yolo() throws Throwable {
        var future =
                Uni.createFrom()
                        .item("item")
                        .onItem()
                        .delayIt()
                        .by(Duration.ofMillis(100))
                        .subscribeAsCompletionStage();

        AtomicBoolean done = new AtomicBoolean();
        future.whenComplete(
                (result, error) -> {
                    done.set(future.isDone());
                });

        assertThat(future.get(1, TimeUnit.SECONDS)).isEqualTo("item");
        assertThat(done).isTrue();
    }

@jponge
Member

jponge commented Jun 6, 2024

I'm not familiar with that code base, but there might be a timing issue causing the future's completion state to be looked up from code that doesn't assume async semantics, hence it reads false.

@geoand
Contributor

geoand commented Jun 7, 2024

I tried it various ways and I am always getting false...

@ben-manes

Could it have something to do with your CompletableFutureWrapper? I see that it is set via Infrastructure.setCompletableFutureWrapper only in some environments, which may explain the inconsistent reproduction. I'm unsure how to write this up to check.

@geoand
Contributor

geoand commented Jun 7, 2024

That's a very good candidate, and would explain the discrepancy we are seeing:

I am running the tests in a Quarkus project (which includes context-propagation, which calls the method you mention), while I assume @jponge is running his test in upstream Mutiny.

@ben-manes

Confirmed

  public static void main(String[] args) {
    var future = Uni
        .createFrom()
        .item("item")
        .onItem()
        .delayIt()
        .by(Duration.ofMillis(100))
        .subscribeAsCompletionStage();

    future.whenComplete((result, error) -> {
      System.out.println("Direct: " + future.isDone());
    }).join();

    Infrastructure.setCompletableFutureWrapper(new UnaryOperator<CompletableFuture<?>>() {
      @Override public CompletableFuture<?> apply(CompletableFuture<?> t) {
        var threadContext = SmallRyeThreadContext.getCurrentThreadContextOrDefaultContexts();
        return threadContext.withContextCapture(t);
      }
    });
    var futureW = Uni
        .createFrom()
        .item("item")
        .onItem()
        .delayIt()
        .by(Duration.ofMillis(100))
        .subscribeAsCompletionStage();

    futureW.whenComplete((result, error) -> {
      System.out.println("Wrapped: " + futureW.isDone());
    }).join();
  }

Output:

Direct: true
Wrapped: false

@geoand
Contributor

geoand commented Jun 7, 2024

Great catch!

Right, I ran the same thing and saw the same results.
Now we need to find why context propagation is causing this issue...

@geoand
Contributor

geoand commented Jun 7, 2024

@FroMage will be interested in this

@geoand
Contributor

geoand commented Jun 7, 2024

Not that this adds anything, but I'll just mention that there is certainly a timing issue at play: although the delegate CompletableFuture has been completed, the CompletableFutureWrapper doesn't get completed until later, and therefore the call to isDone() returns false.

@ben-manes

I believe it is because dependent actions are kept on a stack, so they run in LIFO order, though most people assume FIFO. Therefore the constructor's whenComplete that sets the wrapper's result runs last. Since client actions are attached directly to the delegate, the wrapper is still incomplete when they run. A possible solution is to use the future returned by f.whenComplete as your delegate.
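
A minimal sketch of that ordering problem (a hypothetical NaiveWrapper, not the actual SmallRye CompletableFutureWrapper; it also leans on the JDK implementation detail that pending dependents of a CompletableFuture run in reverse registration order, which the specification does not guarantee):

import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical wrapper: the constructor registers a whenComplete on the
// delegate to copy the result into the wrapper, and client callbacks are
// forwarded straight to the delegate.
class NaiveWrapper<T> extends CompletableFuture<T> {

  private final CompletableFuture<T> delegate;

  NaiveWrapper(CompletableFuture<T> delegate) {
    this.delegate = delegate;
    // Registered first, but runs last (LIFO), so the wrapper is the
    // very last thing to be completed.
    delegate.whenComplete((r, e) -> {
      if (e != null) { completeExceptionally(e); } else { complete(r); }
    });
  }

  @Override
  public CompletableFuture<T> whenComplete(BiConsumer<? super T, ? super Throwable> action) {
    // Client actions land directly on the delegate.
    return delegate.whenComplete(action);
  }

  public static void main(String[] args) {
    CompletableFuture<String> delegate = new CompletableFuture<>();
    NaiveWrapper<String> wrapper = new NaiveWrapper<>(delegate);

    // Registered second, so it runs before the constructor's copy action:
    // the wrapper still reports isDone() == false here.
    wrapper.whenComplete((r, e) ->
        System.out.println("wrapper.isDone() inside callback: " + wrapper.isDone()));

    delegate.complete("item");
  }
}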

@geoand
Contributor

geoand commented Jun 7, 2024

I don't know enough about how Context Propagation and Mutiny interact, but given that the completion takes time to propagate, the obvious solution would be for isDone to always call the delegate. I don't know, however, what side effects that could cause.

@geoand
Contributor

geoand commented Jun 7, 2024

Given all the info, I think this one is for @jponge and @FroMage to figure out 😉

@ben-manes

I think my solution is safer to maintain the correct ordering. Can you try testing with this modification?

this.f = f.whenComplete(…);
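
In terms of the hypothetical NaiveWrapper sketched above (again, not the actual SmallRye code), the suggestion amounts to making the future returned by whenComplete the delegate, so client actions can only observe a wrapper that has already been completed:

import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

class FixedWrapper<T> extends CompletableFuture<T> {

  private final CompletableFuture<T> delegate;

  FixedWrapper(CompletableFuture<T> f) {
    // The delegate is the stage *returned by* whenComplete, which only
    // completes after the copy action has completed the wrapper itself.
    this.delegate = f.whenComplete((r, e) -> {
      if (e != null) { completeExceptionally(e); } else { complete(r); }
    });
  }

  @Override
  public CompletableFuture<T> whenComplete(BiConsumer<? super T, ? super Throwable> action) {
    return delegate.whenComplete(action);
  }

  public static void main(String[] args) {
    CompletableFuture<String> f = new CompletableFuture<>();
    FixedWrapper<String> wrapper = new FixedWrapper<>(f);

    // Now prints true: this callback can only run after the wrapper is done.
    wrapper.whenComplete((r, e) ->
        System.out.println("wrapper.isDone() inside callback: " + wrapper.isDone()));

    f.complete("item");
  }
}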

@geoand
Contributor

geoand commented Jun 7, 2024

I think my solution is safer to maintain the correct ordering

Absolutely!

@geoand
Contributor

geoand commented Jun 7, 2024

Can you try testing with this modification?

Let me try and build all the necessary pieces :)

@geoand
Contributor

geoand commented Jun 7, 2024

The tests of Context Propagation fail with that change, but that might not be the fault of the change itself. Anyway, I'll build with those tests skipped.

@ben-manes

ben-manes commented Jun 7, 2024

FWIW, the equals contract on this wrapper is concerning since it may not be symmetric. A future uses identity equality, so I think it's broken both now and with my suggestion. Deleting the equals/hashCode might also be advisable. Just eyeballing, so I might be wrong.

@geoand
Contributor

geoand commented Jun 7, 2024

I tried your suggestion and it does work on the basic test we have been discussing.

@geoand
Contributor

geoand commented Jun 7, 2024

I think @FroMage has a lot of information here he can use to fix this in Context Propagation :)

@jponge
Member

jponge commented Jun 7, 2024

I'm indeed running upstream Mutiny, thanks for finding the issue in context propagation.

@geoand
Contributor

geoand commented Jun 7, 2024

The tests of Context Propagation fail with that change, but that might not be the fault of the change itself. Anyway, I'll build with those tests skipped.

Looking at it a little more, those tests should be easily fixable by doing some additional exception unwrapping.

@FroMage
Member

FroMage commented Jun 7, 2024

I dunno, it sounds like you are on top of it, do you really want me to take over? :)

@geoand
Contributor

geoand commented Jun 7, 2024

I think that would be best since we could have missed unintended side effects in CP

@FroMage
Member

FroMage commented Jun 7, 2024

OK

@FroMage
Member

FroMage commented Jun 7, 2024

OK, please review smallrye/smallrye-context-propagation#445

Let's be honest, I don't remember the difference between wrappers and dependent stages, so I've no idea why fixing this breaks the TCK. This fix feels logical, though, in that it's conditional based on wrapper/dependent. It appears to fix the issue at hand (if my test is correct, and it was indeed failing before I added the fix) and the TCK keeps running.

Please review and let me know WYT and then I can make a release.

@FroMage
Member

FroMage commented Jun 7, 2024

So I made a 2.1.1 release, and I did find a test not involving mutiny that exhibited this behaviour.

But I can't reproduce this test (#40852 (comment)) in integration-tests/smallrye-context-propagation, so I'm a bit annoyed: it works with both 2.1.0 and 2.1.1.

I'll make a PR anyway.

FroMage added a commit to FroMage/quarkus that referenced this issue Jun 7, 2024
@jolsum
Author

jolsum commented Jun 9, 2024

I can confirm that the fix in Context Propagation version 2.1.1 resolves this issue. Thanks, @FroMage, @geoand, @jponge and @ben-manes!

FroMage added a commit to FroMage/quarkus that referenced this issue Jun 11, 2024
@quarkus-bot quarkus-bot bot added this to the 3.12 - main milestone Jun 11, 2024
@gsmet gsmet modified the milestones: 3.12 - main, 3.11.2 Jun 11, 2024
gsmet pushed a commit to gsmet/quarkus that referenced this issue Jun 11, 2024
Fixes quarkusio#40852

(cherry picked from commit 0dc312b)
holly-cummins pushed a commit to holly-cummins/quarkus that referenced this issue Jul 31, 2024
@gsmet gsmet modified the milestones: 3.11.2, 3.8.6 Aug 14, 2024
gsmet pushed a commit to gsmet/quarkus that referenced this issue Aug 14, 2024
Fixes quarkusio#40852

(cherry picked from commit 0dc312b)
@israelstmz

Hey, we are running into what seems like a very similar issue in our production, using 3.14.4. We are using @CacheResult with a method returning a Uni and we see that the cache greatly exceeds the limit: our limit is 10K and it reaches way above 100K. The thing is that out of ~500 application instances it happens only on 2-3... Also, we can't reproduce it using the code above...
@geoand, any idea how we can approach this? It causes a memory leak.

@geoand
Contributor

geoand commented Sep 18, 2024

@israelstmz I would suggest opening a new issue with as much detail as possible.
Without being able to reproduce the problem it will be very hard to find a solution, but it's definitely worth a try regardless.

@israelstmz

opened: https://github.com/quarkusio/quarkus/issues/43382

danielsoro pushed a commit to danielsoro/quarkus that referenced this issue Sep 20, 2024