Explore TinyLFU cache #16802
@ben-manes Thanks for the post.
It's difficult to say how impactful these are without search latency benchmarks here (to be clear since I don't want this to read like a dodge, I suspect the LRU lock is impactful under heavy search loads with large search thread pools). Unfortunately, we do not have benchmarks here that would stress the lock contention issues. These sorts of benchmarks are a work-in-progress.
This was the case for the initial commit that you linked to, but this has been addressed and (unless I'm mistaken) is no longer the case.
Do you have any literature on this specific point that you can share?
I'll take a look. I have some ideas regarding the LRU lock contention as well, but I really don't want to go stabbing in the dark until we can get valid multi-threaded search latency benchmarks in place. One goal we had with the cache implementation was to keep it simple.
Yep, I'd much rather have data to work from than a hypothesis too. I know from micro-benchmarks and past experiences that it can be a problem, but it is too application dependent to draw a conclusion. The hit rate generally has a bigger impact than concurrency, given how expensive the miss penalty often is. I'm more interested in seeing if we can improve that for ES.
Oh, nice. Sorry that I missed that when skimming over the code.
Zipf-like distributions are commonly discussed in information retrieval literature (1, 2, 3, 4, 5). Similarly, the distribution holds as a model for caching (6, 7, 8). This is fairly intuitive because the popularity of items is a strong signal, but some aging based on recency is needed to avoid holding stale data for too long. The TinyLFU paper includes charts with web search traces from IBM and UMass. This policy uses recency and historic frequency (of both present and absent entries) to predict whether an item should be retained.
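To make the recency + frequency idea concrete, here is a minimal, purely illustrative sketch of a frequency histogram with periodic aging, which is the core ingredient TinyLFU layers on top of an eviction policy. The class name, sizing, and counter ceiling below are assumptions for illustration, not the paper's or Caffeine's actual code.

```java
/** Illustrative only: a single-row frequency table with 4-bit-style saturation and aging. */
final class FrequencySketch {
  private final int[] counts;     // one counter per slot (a real sketch hashes to several slots)
  private final int sampleSize;   // how many increments before counters are aged
  private int additions;

  FrequencySketch(int capacity) {
    int size = Integer.highestOneBit(Math.max(capacity, 2) - 1) << 1; // next power of two
    this.counts = new int[size];
    this.sampleSize = 10 * capacity;
  }

  private int index(Object key) {
    int h = key.hashCode() * 0x9E3779B9;        // cheap hash spreading
    return (h ^ (h >>> 16)) & (counts.length - 1);
  }

  void increment(Object key) {
    int i = index(key);
    if (counts[i] < 15) {                       // saturate like a 4-bit counter
      counts[i]++;
    }
    if (++additions == sampleSize) {            // periodic aging: halve every counter
      for (int j = 0; j < counts.length; j++) {
        counts[j] >>>= 1;
      }
      additions /= 2;
    }
  }

  int frequency(Object key) {
    return counts[index(key)];
  }
}
```

The halving step is what lets recency matter: an entry that was hot long ago gradually loses its frequency advantage over newly popular entries.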
This article on HighScalability describes the approach that I took. The concurrency model was proven out in my prior libraries (CLHM, Guava). The former is used by Cassandra (among others) and the latter is of course popular but underperforms because important optimizations were not ported over. The design has very predictable latencies, since all operations are O(1), and it borrows from database / filesystem theory, where similar approaches are used to scale writes.
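For context, here is a minimal sketch of that buffered-replay idea: reads go through a lock-free map and are only recorded into a buffer, which is drained in batches under a tryLock, so the read path never blocks on the policy. Names and structure are assumptions for illustration, not Caffeine's actual internals (which also buffer writes and use bounded ring buffers).

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

final class BufferedReplayCache<K, V> {
  private final Map<K, V> data = new ConcurrentHashMap<>();
  private final Queue<K> readBuffer = new ConcurrentLinkedQueue<>();
  private final ReentrantLock evictionLock = new ReentrantLock();

  V get(K key) {
    V value = data.get(key);          // no shared lock on the read path
    if (value != null) {
      readBuffer.offer(key);          // record the access for later replay
      tryDrainBuffers();
    }
    return value;
  }

  private void tryDrainBuffers() {
    if (evictionLock.tryLock()) {     // one thread replays at a time; others never block
      try {
        K key;
        while ((key = readBuffer.poll()) != null) {
          onAccess(key);              // update LRU / TinyLFU metadata in a batch
        }
      } finally {
        evictionLock.unlock();
      }
    }
  }

  private void onAccess(K key) {
    // policy bookkeeping (reordering, frequency counting) would go here
  }
}
```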
The results show that their cache performs very poorly under concurrent workloads. The striped LRU matches the theoretical LRU hit rate, but search is far more biased towards frequency. For the results see elastic/elasticsearch#16802
@jasontedor The micro-benchmarks were run on my laptop. These results are slower than a synchronized LinkedHashMap in LRU mode. I ran Caffeine a few times since this isn't a good (isolated) testbed machine and it consumes all the available CPU resources.
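For readers who want to reproduce this kind of measurement, the general shape of a concurrent cache-read micro-benchmark under JMH looks roughly like the following; the cache type, key space, and thread count are placeholders, not the harness used for the numbers above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@State(Scope.Benchmark)
public class CacheGetBenchmark {
  private static final int KEY_SPACE = 1_000;               // placeholder working set
  private final ConcurrentHashMap<Integer, Integer> cache = new ConcurrentHashMap<>();

  @Setup
  public void populate() {
    for (int i = 0; i < KEY_SPACE; i++) {
      cache.put(i, i);                                       // warm the cache so gets are hits
    }
  }

  @Benchmark
  @Threads(8)                                                // exercise the read path concurrently
  public Integer get() {
    // a real comparison would swap in the cache under test and a Zipf-like key distribution
    return cache.get(ThreadLocalRandom.current().nextInt(KEY_SPACE));
  }
}
```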
The simulator compares the hit rate of the caching policies. The current cache stays very close to a pure LRU, which validates the design. Unfortunately LRU is poor for search workloads, which are biased towards frequency instead of recency. The efficiency using the Univ. of Massachusetts
It may very well be that the cache is not a critical bottleneck for ElasticSearch. Unfortunately it appears to leave a lot of performance on the table, which I bet a little profiling would help dramatically improve. If the size of the cache is large, there is an opportunity to either achieve the same hit rate at a reduced capacity or increase the hit rate. In either case there should be reduced latencies (less GC / fewer misses).
Congrats on 5.0! It's an impressive release and very timely for me. I was about to start migrating from Algolia to Elastic, and was considering Druid for analytical functions. The new features in Elastic are a wonderful alternative, so I'm looking forward to using them instead.
Pinging @elastic/es-core-infra
As ES8 is moving to Java 17, I'll have to remove the cache from my benchmark suite (since Caffeine is Java 11-based). Unfortunately the issues observed with its low hit rate in search workloads and poor throughput in concurrency benchmarks remain unfixed (and are reported as a problem by users, e.g. #69646). I noticed that @original-brownbear recently did some nice work on the cache to reduce its memory overhead, but these problems are algorithmic. If later you would like to fix these issues then we can revive the integration for an analysis. Obviously switching (as Solr did) would be welcome.
When removing Guava as a dependency, @jasontedor wrote a 256-way LRU cache with a Guava-like API. The design uses a read/write lock per table segment and a lock around the global linked list for LRU ordering.
There are a few concerns that might be worth addressing. First is that a read/write lock is expensive, often much more than an exclusive lock, and its usage per read is probably best replaced with a ConcurrentHashMap. Second is that the LRU lock is a throttling point, as all accesses contend on it. Third is that computeIfAbsent is racy by allowing redundant computations with last insert wins. Fourth is that LRU is sub-optimal for search workloads, where frequency is a more insightful metric.
The easiest (and biased) solution is to adopt Caffeine. Alternatively, a big win would come from integrating TinyLFU into the existing cache, as sketched below. This would increase the hit rate, thereby reducing I/O and GC allocations, resulting in improved latencies.
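As a rough illustration of that suggestion, the following sketch bolts a TinyLFU-style admission filter onto a plain LRU (here a synchronized, access-ordered LinkedHashMap): at capacity, a new entry is only admitted if its historic frequency beats the would-be eviction victim's. FrequencySketch refers to the illustrative helper earlier in this thread; none of these names reflect Elasticsearch's actual cache API.

```java
import java.util.LinkedHashMap;

/** Illustrative only: TinyLFU-style admission in front of a basic LRU. */
final class TinyLfuLruCache<K, V> {
  private final int maximumSize;
  private final FrequencySketch sketch;
  private final LinkedHashMap<K, V> data;

  TinyLfuLruCache(int maximumSize) {
    this.maximumSize = maximumSize;
    this.sketch = new FrequencySketch(maximumSize);
    this.data = new LinkedHashMap<>(16, 0.75f, true); // access order, i.e. LRU
  }

  synchronized V get(K key) {
    sketch.increment(key);                  // record popularity on every access
    return data.get(key);
  }

  synchronized void put(K key, V value) {
    sketch.increment(key);
    if (data.containsKey(key) || data.size() < maximumSize) {
      data.put(key, value);
      return;
    }
    // At capacity: only admit the candidate if it is historically more popular
    // than the entry LRU would evict (the eldest in access order).
    K victim = data.keySet().iterator().next();
    if (sketch.frequency(key) > sketch.frequency(victim)) {
      data.remove(victim);
      data.put(key, value);
    }                                       // otherwise the candidate is rejected
  }
}
```

The point of the admission check is that a one-hit wonder never displaces an entry that the sketch has seen repeatedly, which is where the hit-rate gain on frequency-biased search workloads comes from.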