CAMEL-19295: Basic thread-safe LRU cache #10157
Conversation
🌟 Thank you for your contribution to the Apache Camel project! 🌟 🐫 Maintainers, please note that first-time contributors require manual approval for the GitHub Actions to run. If necessary, Apache Camel Committers may access logs and test results in the job summaries!

🚫 There are (likely) no components to be tested in this PR
@davsclaus I've just got the results of testing this code. It has quite a big impact in concurrent scenarios. I ran some tests comparing Camel 4.0.0-M3 (baseline) with this patch. For resolving the endpoints:

Baseline:

This patch:

Note: the _X suffix is the number of threads (i.e., EndpointResolveTest.testActionStatus_4 is the test with 4 threads). For the endpoint registry operations:

Baseline:

This patch:

And the impact is quite extreme for the registry under concurrent access:

Baseline:

This patch:
Yeah, can you test with camel-caffeine-lrucache as well? Also, I think we can have a non-LRU cache; we often just want to keep 1000 elements in a cache, and they may not need to be exactly LRU based. They can be FIFO or anything like that, just so the size is big enough for normal use-cases. So basically we can make a cache that is just a ConcurrentMap from the JDK, to be used in the various places where an LRU is really not needed.
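To make the idea concrete, here is a minimal sketch (not Camel code, names are hypothetical) of what such a "bounded ConcurrentMap with no eviction order" could look like: a plain JDK ConcurrentHashMap that simply stops accepting new keys once the size limit is reached.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a bounded, non-LRU cache backed by a plain JDK
// ConcurrentHashMap. There is no eviction order at all: once full, new
// keys are simply rejected.
final class BoundedCache<K, V> {
    private final int maxSize;
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    BoundedCache(int maxSize) {
        this.maxSize = maxSize;
    }

    V get(K key) {
        return map.get(key);
    }

    // Best-effort bound: containsKey()/size() and put() are not atomic
    // together, so the map may briefly overshoot maxSize under contention.
    // For a cache of ~1000 entries that is usually acceptable.
    void put(K key, V value) {
        if (map.containsKey(key) || map.size() < maxSize) {
            map.put(key, value);
        }
    }

    int size() {
        return map.size();
    }
}
```

The trade-off is obvious: no hot/cold distinction, but also no per-read bookkeeping, so reads are exactly as cheap as a ConcurrentHashMap lookup.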
Sure thing. Let me run the tests with
Indeed. Maybe we could go with a simpler alternative by default and, if needed, leave it flexible so users can plug in a more extensible one. I was thinking we could have a ring buffer (circular queue). IMHO, we may not necessarily need to expire the least-recently-used records, but we could overwrite them if need be.
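A rough sketch of the ring-buffer idea mentioned above (hypothetical, not from the PR): a fixed-size circular array where new entries overwrite the oldest slot, with no LRU bookkeeping at all.

```java
// Hypothetical ring-buffer (circular queue) cache: inserts overwrite the
// oldest slot instead of maintaining any LRU ordering. Lookups are an O(n)
// scan, which is only reasonable for very small caches.
final class RingBufferCache<K, V> {
    private final Object[] keys;
    private final Object[] values;
    private int next; // index of the next slot to overwrite

    RingBufferCache(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    synchronized void put(K key, V value) {
        keys[next] = key;
        values[next] = value;
        next = (next + 1) % keys.length; // wrap around, overwriting the oldest entry
    }

    @SuppressWarnings("unchecked")
    synchronized V get(K key) {
        for (int i = 0; i < keys.length; i++) {
            if (key.equals(keys[i])) {
                return (V) values[i];
            }
        }
        return null;
    }
}
```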
@davsclaus if it helps I have an idea of how to implement a basic thread-safe LRU cache. Let me know if you are interested. |
@essobedo yeah sure, we may only need LRU in some situations
@essobedo if you come up with the LRU cache relatively soon — no pressure — I can also offer to run the same tests on the perf lab I have access to. I should have access to the machines for a few more weeks, so it should be fine.
Ok, I'll try to provide something today
@davsclaus May I use the same branch? Or should I create another one?

yes sure, you are welcome to use this branch or whatever you think is best
Hi, I have created CAMEL-19311 to discuss the LRU implementation. I'm adding it here just to link this PR to the JIRA issue I opened.
@@ -53,7 +54,7 @@ public <K, V> Map<K, V> createLRUCache(int maximumCacheSize) {
     @Override
     public <K, V> Map<K, V> createLRUCache(int maximumCacheSize, Consumer<V> onEvict) {
         LOG.trace("Creating LRUCache with maximumCacheSize: {}", maximumCacheSize);
-        return new SimpleLRUCache<>(16, maximumCacheSize, onEvict);
+        return Collections.synchronizedMap(new SimpleLRUCache<>(16, maximumCacheSize, onEvict));
I think it can impact code that casts to SimpleLRUCache in order to use some of SimpleLRUCache's methods.
camel/core/camel-support/src/main/java/org/apache/camel/support/DefaultLRUCacheFactory.java
Line 168 in 277887d
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
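To illustrate the concern (a hypothetical example, not actual Camel code — LinkedHashMap stands in for SimpleLRUCache): Collections.synchronizedMap returns an opaque wrapper type, so any existing caller that downcasts the factory's return value to the concrete cache class would now fail.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: LinkedHashMap plays the role of SimpleLRUCache.
Map<String, Integer> plain = new LinkedHashMap<>();
Map<String, Integer> wrapped = Collections.synchronizedMap(new LinkedHashMap<>());

// A caller that previously did `(SimpleLRUCache<...>) factory.createLRUCache(...)`
// to reach implementation-specific methods would now hit a ClassCastException,
// because the wrapper hides the concrete type:
boolean plainIsConcrete = plain instanceof LinkedHashMap;     // true
boolean wrappedIsConcrete = wrapped instanceof LinkedHashMap; // false
```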
@orpiske could you please try this?
Second Chance (Clock) is very easy to implement, gives LRU-like hit rates and high read concurrency, and is a good fit for small custom caches. It is basically a FIFO with a mark bit that is set on read. On a write, a global lock is acquired, the FIFO is scanned resetting the mark bits, and the first unmarked entry is chosen for eviction. This means the worst case is O(n), which is fine for a cache of a few thousand entries. As reads are the common case, writes blocking on this lock to perform a small amount of work is acceptable. When brainstorming approaches for ConcurrentLinkedHashMap (the Guava/Caffeine predecessor), that was my original approach, which triggered my interest while solving some performance problems at work. I had to release that as a pre-1.0 beta to discourage people from using my unstable concurrent LRU alpha code, so you can review that as a reference and a basic analysis.
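The policy described above can be sketched in a few dozen lines (a hypothetical minimal version, not the ConcurrentLinkedHashMap code): reads are lock-free and only set a mark bit; writes take a global lock, walk the FIFO clearing mark bits, and evict the first unmarked entry.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch of a Second Chance (Clock) cache, as described above.
// Reads only set a mark bit; writes hold a global lock and scan the FIFO,
// re-queueing marked entries (their "second chance") until an unmarked
// entry is found and evicted. Worst case O(n) per eviction.
final class SecondChanceCache<K, V> {
    private static final class Node<V> {
        final V value;
        final AtomicBoolean marked = new AtomicBoolean(false);
        Node(V value) { this.value = value; }
    }

    private final int capacity;
    private final Map<K, Node<V>> map = new ConcurrentHashMap<>();
    private final Queue<K> fifo = new ConcurrentLinkedQueue<>();
    private final Object writeLock = new Object();

    SecondChanceCache(int capacity) { this.capacity = capacity; }

    V get(K key) {
        Node<V> node = map.get(key);
        if (node == null) return null;
        node.marked.set(true); // grant the entry a second chance
        return node.value;
    }

    void put(K key, V value) {
        synchronized (writeLock) {
            if (!map.containsKey(key) && map.size() >= capacity) {
                evictOne();
            }
            Node<V> previous = map.put(key, new Node<>(value));
            if (previous == null) fifo.add(key);
        }
    }

    // Called under writeLock. Rotates marked entries to the tail while
    // clearing their bit; evicts the first unmarked entry encountered.
    private void evictOne() {
        K candidate;
        while ((candidate = fifo.poll()) != null) {
            Node<V> node = map.get(candidate);
            if (node == null) continue; // stale queue entry
            if (node.marked.compareAndSet(true, false)) {
                fifo.add(candidate);    // second chance: back to the tail
            } else {
                map.remove(candidate);  // unmarked: evict
                return;
            }
        }
    }

    int size() { return map.size(); }
}
```

Note the loop always terminates: each marked entry is re-queued with its bit cleared, so within one full rotation some entry becomes evictable.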
Thanks @ben-manes for sharing your knowledge with the Camel community. The code for the Caffeine LRU cache is here. It may be that we can improve this. Our situation with Camel is that the cache sizes are small (1000), never huge. And today the warmup may not be as needed (we run a lot of unit tests and start/stop Camel frequently, so fast startup in these tests is preferred).
Oh, that’s great. I simply meant that if you want to make this existing LRU cache concurrent without a dependency, then Second Chance is an effective approach that should be easy to maintain. Your PR changes also look reasonable.
For warmup, a very long time ago the builder had a class-loading issue because it selected the implementation via a string switch. That seemed to make the factory slow to classload due to the constant pool. I don’t think it would have tried to load the implementations, but it was oddly visible. The cache uses codegen variants to minimize the memory footprint, e.g. entries only have TTL fields if expirable. That was fixed by using reflection, which also trimmed the jar bloat. The current version can instantiate 180k/s. This was done in 2.6.1 (2017). An unreleased optimization caches that reflective load so it becomes a direct call on subsequent usages, resulting in 4M/s. Happy to release it if the current time is still an issue and the snapshot jar shows an improvement.
I'll try this one today. I think I can post the results by COB.

Thanks a lot @ben-manes for sharing your knowledge.
@orpiske FYI, I've changed the implementation a bit, for something easier to maintain and closer in behavior to the initial non-thread-safe implementation
Thanks for the heads up. That's OK, the test should start in about 1 hour or so, so it should pick up the updated one.
↩️ There are either too many changes to be tested in this PR or the code needs to be rebased: (271 components likely to be affected)
@essobedo Here's the results: Baseline (M3)
Test
Baseline
Test
Baseline:
Test
Baseline
Test
All in all, I think it's performing nicely! There are a few outliers (i.e., like the
Considering that the baseline is a non-thread-safe implementation, that sounds good. Thx @orpiske

@ben-manes thx for your priceless feedback, anything else to add regarding this PR?

Indeed, thanks @ben-manes for the feedback!
Adding my +1. Nice one @essobedo!
LGTM. Great collaboration, guys, and thanks for all the performance tests.

lgtm!
Ok let's merge it and see how the tests go |
…se will OOME

Description

Target

camel-3.x, whereas Camel 4 uses the main branch

Tracking

Apache Camel coding standards and style
mvn -Pformat,fastinstall install && mvn -Psourcecheck