Explore TinyLFU cache #16802

Closed
ben-manes opened this issue Feb 25, 2016 · 6 comments
Labels: :Core/Infra/Core, discuss

@ben-manes

When removing Guava as a dependency, @jasontedor wrote a 256-way LRU cache using a Guava-like API. The design uses a read/write lock per table segment and a lock around the global linked list that maintains the LRU ordering.

There are a few concerns that might be worth addressing. First is that a r/w lock is expensive, often much more than an exclusive lock, and its usage per read is probably best replaced with a ConcurrentHashMap. Second is that the LRU lock is a throttling point, as all accesses contend on it. Third is that computeIfAbsent is racy by allowing redundant computations with last insert wins. Fourth is that LRU is sub-optimal for search workloads, where frequency is a more insightful metric.
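
For illustration, a minimal sketch of the third point (the helper names here are hypothetical, not the Elasticsearch code): a check-then-act lookup can invoke the loader redundantly, whereas `ConcurrentHashMap.computeIfAbsent` invokes it at most once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class RacyVersusAtomic {
  // Racy check-then-act: two threads can both miss, both invoke the loader,
  // and the later put() silently overwrites the earlier value (last insert wins).
  static <K, V> V racyGet(ConcurrentHashMap<K, V> map, K key, Function<K, V> loader) {
    V value = map.get(key);
    if (value == null) {
      value = loader.apply(key); // may execute redundantly under contention
      map.put(key, value);       // last writer wins
    }
    return value;
  }

  // ConcurrentHashMap.computeIfAbsent invokes the loader at most once per key,
  // so every caller observes the single published value.
  static <K, V> V atomicGet(ConcurrentHashMap<K, V> map, K key, Function<K, V> loader) {
    return map.computeIfAbsent(key, loader);
  }
}
```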

The easiest (and biased) solution is to adopt Caffeine. Alternatively a big win would come from integrating TinyLFU into the existing cache. This would increase the hit rate, thereby reducing I/O and GC allocations, resulting in improved latencies.
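
For reference, a minimal sketch of what adopting Caffeine could look like (the types, size bound, and loader below are placeholders, not the actual Elasticsearch configuration):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class CaffeineExample {
  static Cache<String, byte[]> newQueryCache() {
    // Size-bounded cache; eviction is handled by Caffeine's Window TinyLFU policy.
    return Caffeine.newBuilder()
        .maximumSize(10_000)
        .recordStats() // optional hit/miss statistics
        .build();
  }

  static byte[] lookup(Cache<String, byte[]> cache, String key) {
    // get() computes the value at most once per key when absent
    return cache.get(key, k -> loadFromDisk(k));
  }

  private static byte[] loadFromDisk(String key) {
    return new byte[0]; // stand-in for the expensive miss penalty
  }
}
```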

@jasontedor self-assigned this on Feb 25, 2016
@jasontedor
Member

@ben-manes Thanks for the post.

> First is that a r/w lock is expensive, often much more than an exclusive lock, and its usage per read is probably best replaced with a ConcurrentHashMap. Second is that the LRU lock is a throttling point, as all accesses contend on it.

It's difficult to say how impactful these are without search latency benchmarks here (to be clear, since I don't want this to read like a dodge: I suspect the LRU lock is impactful under heavy search loads with large search thread pools). Unfortunately, we do not yet have benchmarks that would stress the lock contention issues; these sorts of benchmarks are a work in progress.

> Third is that computeIfAbsent is racy by allowing redundant computations with last insert wins.

This was the case for the initial commit that you linked to, but this has been addressed and (unless I'm mistaken) is no longer the case.

> Fourth is that LRU is sub-optimal for search workloads, where frequency is a more insightful metric.

Do you have any literature on this specific point that you can share?

> The easiest (and biased) solution is to adopt Caffeine. Alternatively a big win would come from integrating TinyLFU into the existing cache.

I'll take a look. I have some ideas regarding the LRU lock contention as well, but I really don't want to go stabbing in the dark until we can get valid multi-threaded search latency benchmarks in place. One goal we had with the cache implementation was to keep it simple.

@ben-manes
Author

> It's difficult to say how impactful these are without search latency benchmarks here

Yep, I'd much rather have data to work from than a hypothesis too. I know from micro-benchmarks and past experiences that it can be a problem, but it is too application dependent to draw a conclusion. The hit rate generally has a bigger impact than concurrency, given how expensive the miss penalty often is. I'm more interested in seeing if we can improve that for ES.

> This was the case for the initial commit that you linked to, but this has been addressed

Oh, nice. Sorry that I missed that when skimming over the code.

> Do you have any literature on this specific point that you can share?

Zipf-like distributions are commonly discussed in the information retrieval literature (1, 2, 3, 4, 5). Similarly, the distribution holds as a model for caching (6, 7, 8). This is fairly intuitive because the popularity of items is a good signal, but some aging based on recency is logically necessary to avoid holding stale data for too long. The TinyLFU paper includes charts with web search traces from IBM and UMass. The policy uses recency and historic frequency (of both present and absent entries) to predict whether an item should be retained.
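
To make the admission idea concrete, here is a toy sketch of TinyLFU-style admission, simplified from the approximate counting described in the paper (real implementations use much more compact counters): a frequency sketch records accesses, periodic halving ages out stale popularity, and a candidate is admitted only if it is historically hotter than the eviction victim.

```java
import java.util.Random;

// Toy illustration of TinyLFU admission, not the paper's exact implementation.
class TinyLfuAdmissionSketch {
  private static final int DEPTH = 4;
  private final int[][] counters; // small count-min style frequency table
  private final int[] seeds;
  private final int mask;
  private final int sampleSize;
  private int additions;

  TinyLfuAdmissionSketch(int width, int sampleSize) {
    // width must be a power of two so the bit mask works as an index
    this.counters = new int[DEPTH][width];
    this.mask = width - 1;
    this.sampleSize = sampleSize;
    this.seeds = new int[DEPTH];
    Random random = new Random(0);
    for (int i = 0; i < DEPTH; i++) {
      seeds[i] = random.nextInt();
    }
  }

  void recordAccess(Object key) {
    for (int i = 0; i < DEPTH; i++) {
      counters[i][index(key, i)]++;
    }
    if (++additions >= sampleSize) {
      reset(); // aging: halve every counter so stale popularity decays
    }
  }

  int frequency(Object key) {
    int min = Integer.MAX_VALUE;
    for (int i = 0; i < DEPTH; i++) {
      min = Math.min(min, counters[i][index(key, i)]);
    }
    return min;
  }

  // Admission policy: keep whichever of (candidate, victim) is historically hotter.
  boolean admit(Object candidate, Object victim) {
    return frequency(candidate) > frequency(victim);
  }

  private void reset() {
    for (int[] row : counters) {
      for (int i = 0; i < row.length; i++) {
        row[i] >>>= 1;
      }
    }
    additions /= 2;
  }

  private int index(Object key, int row) {
    int h = key.hashCode() * seeds[row];
    return (h ^ (h >>> 16)) & mask;
  }
}
```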

> I have some ideas regarding the LRU lock contention as well

This article on HighScalability describes the approach that I took. The concurrency model was proven out in my prior libraries (CLHM, Guava). The former is used by Cassandra (among others) and the latter is of course popular but underperforms due to important optimizations not being ported over. This has very predictable latencies by being O(1) and borrows from database / filesystem theory where similar approaches are used to scale writes.
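
As a rough illustration of that approach (a simplified sketch, not Caffeine's internals): reads are recorded into a bounded, lossy buffer and replayed against the eviction policy under a tryLock, so the read path never blocks on the policy lock.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch of buffering reads and replaying them against the policy.
class BufferedPolicy<K> {
  private final ArrayBlockingQueue<K> readBuffer = new ArrayBlockingQueue<>(128);
  private final ReentrantLock evictionLock = new ReentrantLock();

  void recordRead(K key) {
    // offer() drops the event if the buffer is full; losing a few accesses only
    // slightly degrades the policy but keeps the read path non-blocking
    readBuffer.offer(key);
    tryDrain();
  }

  private void tryDrain() {
    if (evictionLock.tryLock()) {
      try {
        K key;
        while ((key = readBuffer.poll()) != null) {
          applyRead(key); // update the recency / frequency ordering
        }
      } finally {
        evictionLock.unlock();
      }
    }
  }

  private void applyRead(K key) {
    // policy bookkeeping would go here (e.g., move-to-tail in an LRU list)
  }
}
```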

ben-manes added a commit to ben-manes/caffeine that referenced this issue Apr 29, 2016
The results show that their cache performs very poorly under concurrent
workloads. The striped LRU matches the theoretical LRU hit rate, but search
workloads are far more biased towards frequency.

For the results see elastic/elasticsearch#16802
@ben-manes
Author

ben-manes commented Apr 29, 2016

@jasontedor
I added ElasticSearch's cache to Caffeine's benchmark suite. The results were not positive.

The micro-benchmarks were run on my laptop. The Elasticsearch cache is slower than a synchronized LinkedHashMap in LRU mode. I ran Caffeine a few times since this isn't a good (isolated) testbed machine and the benchmark consumes all of the available CPU resources.

| Benchmark  | ElasticSearch | LinkedHashMap | Caffeine      |
| ---------- | ------------- | ------------- | ------------- |
| read-only  | 2.5 M/s       | 7.4 M/s       | 130 - 150 M/s |
| read-write | 2.3 M/s       | 7.7 M/s       | 100 - 115 M/s |
| write-only | 1.6 M/s       | 7.7 M/s       | 50 - 65 M/s   |
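
For context, the rough shape of such a read-only throughput measurement as a standalone JMH sketch (the real suite lives in the Caffeine repository; the cache size and thread count below are arbitrary):

```java
import java.util.concurrent.ThreadLocalRandom;
import org.openjdk.jmh.annotations.*;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

@State(Scope.Benchmark)
public class ReadOnlyBenchmark {
  Cache<Integer, Integer> cache;

  @Setup
  public void setup() {
    // pre-populate so the read-only run measures lookups, not misses
    cache = Caffeine.newBuilder().maximumSize(100_000).build();
    for (int i = 0; i < 100_000; i++) {
      cache.put(i, i);
    }
  }

  @Benchmark
  @Threads(8)
  public Integer readOnly() {
    return cache.getIfPresent(ThreadLocalRandom.current().nextInt(100_000));
  }
}
```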

The simulator compares the hit rate of the caching policies. The current cache stays very close to a pure LRU, which validates the design. Unfortunately LRU is poor for search workloads, which are biased towards frequency instead of recency. The efficiency using the Univ. of Massachusetts WebSearch1 (WS1) workload was:

| Benchmark | ElasticSearch | Caffeine | Optimal |
| --------- | ------------- | -------- | ------- |
| WS1 @ 2M  | 6.09 %        | 23.06 %  | 41.28 % |
| WS1 @ 4M  | 21.60 %       | 41.32 %  | 57.80 % |
| WS1 @ 6M  | 45.74 %       | 55.02 %  | 65.85 % |
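
For anyone wanting to reproduce the flavor of these numbers without the trace file, a minimal hit-rate simulation against a bounded LRU map could look like the following (the synthetic skewed key generator is only a crude stand-in for the WS1 trace):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Minimal hit-rate simulation: replay a skewed key stream against a bounded LRU map.
class HitRateSimulator {
  public static void main(String[] args) {
    int capacity = 2_000;
    // access-ordered LinkedHashMap evicting the eldest entry acts as an LRU cache
    Map<Integer, Integer> lru = new LinkedHashMap<>(capacity, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
        return size() > capacity;
      }
    };

    Random random = new Random(0);
    long hits = 0, requests = 200_000;
    for (long i = 0; i < requests; i++) {
      // crude skewed distribution; a real run would replay a trace or a Zipf generator
      int key = (int) Math.floor(Math.pow(random.nextDouble(), 3) * 50_000);
      if (lru.get(key) != null) {
        hits++;
      } else {
        lru.put(key, key);
      }
    }
    System.out.printf("hit rate: %.2f%%%n", 100.0 * hits / requests);
  }
}
```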

It may very well be that the cache is not a critical bottleneck for ElasticSearch. Unfortunately it appears to leave a lot of performance on the table, which I bet a little profiling would dramatically improve. If the size of the cache is large, there is an opportunity to either achieve the same hit rate at a reduced capacity or to increase the hit rate at the current capacity. In either case there should be reduced latencies (less GC / fewer misses).

@ben-manes
Author

Congrats on 5.0! It's an impressive release and very timely for me. I was about to start migrating from Algolia to Elastic, and was considering Druid for analytical functions. The new features in Elastic are a wonderful alternative, so I'm looking forward to using them instead.

@colings86 added the :Core/Infra/Core label on Apr 24, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@ben-manes
Author

As ES8 is moving to Java 17, I'll have to remove the cache from my benchmark suite (since Caffeine is Java 11-based). Unfortunately the issues observed with its low hit rate in search workloads and poor throughput in concurrency benchmarks remain unfixed (and are reported as a problem by users, e.g. #69646). I noticed that @original-brownbear recently did some nice work on the cache to reduce its memory overhead, but these problems are algorithmic. If you would later like to fix these issues, we can revive the integration for an analysis. Obviously switching (as Solr did) would be welcome.
