Replace inmemory index cache to fastcache based implementation #5619

yeya24 · 2023-10-25T05:34:38Z

What this PR does:

This PR adds a new inmemory index cache implementation, based on the https://github.com/VictoriaMetrics/fastcache library.

This replaces the previous inmemory index cache, which performs poorly under high concurrency (see thanos-io/thanos#6762).

Which issue(s) this PR fixes:
Fixes #

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

yeya24 · 2023-10-25T05:37:07Z

pkg/storage/tsdb/inmemory_index_cache.go

+	}
+
+	c.added = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
+		Name: "thanos_store_index_cache_items_added_total",


We still use the thanos_ prefix to keep these metrics compatible with the previous implementation.

friedrichg

I am OK with this. Can you add some benchmark tests? Juicy numbers like the ones at https://github.com/VictoriaMetrics/fastcache#benchmarks are nice.

friedrichg · 2023-10-25T09:20:03Z

go.sum

@@ -455,6 +457,8 @@ github.com/alicebob/miniredis/v2 v2.30.4 h1:8S4/o1/KoUArAGbGwPxcwf0krlzceva2XVOS
 github.com/alicebob/miniredis/v2 v2.30.4/go.mod h1:b25qWj4fCEsBeAAR2mlb0ufImGC6uH3VlUfb/HS5zKg=
 github.com/aliyun/aliyun-oss-go-sdk v2.2.2+incompatible h1:9gWa46nstkJ9miBReJcN8Gq34cBFbzSpQZVVT9N09TM=
 github.com/aliyun/aliyun-oss-go-sdk v2.2.2+incompatible/go.mod h1:T/Aws4fEfogEE9v+HPhhw+CntffsBHJ8nXQCwKr0/g8=
+github.com/allegro/bigcache v1.2.1-0.20190218064605-e24eb225f156 h1:eMwmnE/GDgah4HI848JfFxHt+iPb26b4zyfspmqY0/8=
+github.com/allegro/bigcache v1.2.1-0.20190218064605-e24eb225f156/go.mod h1:Cb/ax3seSYIx7SuZdm2G2xzfwmv3TPSk2ucNfQESPXM=


This is using a very old version of bigcache from 2019. Not saying it's wrong per se though.

It is used in the fastcache library's benchmark https://github.com/VictoriaMetrics/fastcache/blob/master/fastcache_timing_test.go, so it should not be a concern.

yeya24 · 2023-10-25T15:40:30Z

Sure, I can add some benchmarks to compare it with the previous version of the inmem cache.

yeya24 · 2023-11-02T05:47:25Z

The first benchmark I added tests StoreSeries with a small item and with a large item (10MB).
Fastcache seems better in terms of ns/op and allocations.

goos: darwin
goarch: arm64
pkg: github.com/cortexproject/cortex/pkg/storage/tsdb
BenchmarkInMemoryIndexCacheStore/FastCache-10           24891123               799.4 ns/op           132 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCache-10           24816021               749.4 ns/op           132 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCache-10           24405404               771.8 ns/op           132 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCache-10           25967481               724.1 ns/op           131 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCache-10           25629028               725.1 ns/op           131 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCache-10         15269642              1235 ns/op            1377 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCache-10         14930562              1250 ns/op            1377 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCache-10         15100389              1245 ns/op            1377 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCache-10         15138560              1239 ns/op            1377 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCache-10         14924134              1239 ns/op            1377 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCacheLargeItem-10             16778           1068707 ns/op             151 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCacheLargeItem-10             16767           1065762 ns/op             150 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCacheLargeItem-10             16843           1078558 ns/op             150 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCacheLargeItem-10             16693           1074622 ns/op             151 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCacheLargeItem-10             16756           1094270 ns/op             151 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCacheLargeItem-10           50761            343347 ns/op        10486125 B/op         16 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCacheLargeItem-10           53716            323179 ns/op        10486125 B/op         16 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCacheLargeItem-10           57980            314176 ns/op        10486125 B/op         16 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCacheLargeItem-10           58236            310593 ns/op        10486125 B/op         16 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCacheLargeItem-10           59007            313004 ns/op        10486125 B/op         16 allocs/op
PASS
ok      github.com/cortexproject/cortex/pkg/storage/tsdb        449.797s

Updated: large items do not seem to work well with fastcache, because in this test it needs to evict, which causes higher latency. But the test results below show that it does better under higher concurrency.

yeya24 · 2023-11-02T06:21:21Z

Tested storing to the inmem cache with a concurrency of 500, which is a common scenario in the store gateway. Fastcache seems better.

goos: darwin
goarch: arm64
pkg: github.com/cortexproject/cortex/pkg/storage/tsdb
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCache-10                 13901102              1332 ns/op             153 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCache-10                 13758361              1313 ns/op             153 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCache-10                 13570058              1303 ns/op             154 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCache-10                 13834746              1306 ns/op             153 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCache-10                 13665320              1307 ns/op             153 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCache-10                9695043              1891 ns/op            1378 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCache-10                9759580              1889 ns/op            1378 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCache-10                9206676              2183 ns/op            1378 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCache-10                9649276              1966 ns/op            1378 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCache-10                9281004              2025 ns/op            1378 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCacheLargeItem-10                  108466            169077 ns/op             174 B/op            4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCacheLargeItem-10                  110389            162944 ns/op             173 B/op            4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCacheLargeItem-10                  115863            160414 ns/op             171 B/op            4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCacheLargeItem-10                  112195            163320 ns/op             173 B/op            4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCacheLargeItem-10                  114378            162164 ns/op             171 B/op            4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCacheLargeItem-10                 52624            326619 ns/op        10485350 B/op           16 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCacheLargeItem-10                 50662            333963 ns/op        10485318 B/op           16 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCacheLargeItem-10                 55876            330605 ns/op        10485582 B/op           16 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCacheLargeItem-10                 55425            327196 ns/op        10485389 B/op           16 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCacheLargeItem-10                 55989            330950 ns/op        10485210 B/op           16 allocs/op
PASS
ok      github.com/cortexproject/cortex/pkg/storage/tsdb        408.697s
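
For readers who want to try a concurrent store benchmark like the one above, here is a minimal sketch using only the Go standard library. It is not the benchmark from this PR: `mutexCache`, `benchmarkStoreConcurrent`, the 128-byte value, and the 1024-key space are all hypothetical stand-ins chosen for illustration. Note that `b.SetParallelism(p)` makes `RunParallel` use `p*GOMAXPROCS` goroutines, so the value 500 here means 500 per core rather than exactly 500 in total.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
	"sync/atomic"
	"testing"
)

// mutexCache is a hypothetical single-lock cache used only as a benchmark target.
type mutexCache struct {
	mu    sync.Mutex
	items map[string][]byte
}

// Store saves val under key while holding the single global lock.
func (c *mutexCache) Store(key string, val []byte) {
	c.mu.Lock()
	c.items[key] = val
	c.mu.Unlock()
}

// benchmarkStoreConcurrent hammers Store from many goroutines at once.
func benchmarkStoreConcurrent(b *testing.B) {
	c := &mutexCache{items: make(map[string][]byte)}
	val := make([]byte, 128) // assumed "small item" size
	var next int64

	b.ReportAllocs()
	b.SetParallelism(500) // RunParallel then uses 500*GOMAXPROCS goroutines
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			// Cycle through a bounded key space so the map stays small.
			key := strconv.FormatInt(atomic.AddInt64(&next, 1)%1024, 10)
			c.Store(key, val)
		}
	})
}

func main() {
	// testing.Benchmark lets the sketch run outside of "go test".
	res := testing.Benchmark(benchmarkStoreConcurrent)
	fmt.Println(res.String(), res.MemString())
}
```

Swapping the body of `Store` for a different cache implementation and rerunning gives a like-for-like comparison on the same machine; the absolute numbers will differ from the ones posted here.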

yeya24 · 2023-11-02T07:25:40Z

Read benchmark with single-threaded fetch. Fastcache is worse than the Thanos cache here. Fastcache has many more memory allocations, which comes from the sync pool it uses internally when fetching data (an implementation detail).

goos: darwin
goarch: arm64
pkg: github.com/cortexproject/cortex/pkg/storage/tsdb
BenchmarkInMemoryIndexCacheFetch/FastCache-10               3568           4964243 ns/op        11950400 B/op      29942 allocs/op
BenchmarkInMemoryIndexCacheFetch/FastCache-10               3814           4721955 ns/op        11950460 B/op      29942 allocs/op
BenchmarkInMemoryIndexCacheFetch/FastCache-10               3788           4713234 ns/op        11950367 B/op      29942 allocs/op
BenchmarkInMemoryIndexCacheFetch/FastCache-10               3850           4704868 ns/op        11950337 B/op      29942 allocs/op
BenchmarkInMemoryIndexCacheFetch/FastCache-10               3830           4811776 ns/op        11950428 B/op      29942 allocs/op
BenchmarkInMemoryIndexCacheFetch/ThanosCache-10            14553           1238019 ns/op         1228057 B/op         53 allocs/op
BenchmarkInMemoryIndexCacheFetch/ThanosCache-10            14523           1238814 ns/op         1228063 B/op         53 allocs/op
BenchmarkInMemoryIndexCacheFetch/ThanosCache-10            14430           1234672 ns/op         1228031 B/op         53 allocs/op
BenchmarkInMemoryIndexCacheFetch/ThanosCache-10            14493           1233570 ns/op         1228015 B/op         53 allocs/op
BenchmarkInMemoryIndexCacheFetch/ThanosCache-10            14563           1231709 ns/op         1228017 B/op         53 allocs/op
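
To put a number on the single-threaded gap, the runs above can be averaged. The sketch below is only an illustration: `parseNsPerOp` and `meanNsPerOp` are hypothetical helpers, and the input lines are copied from the output above with whitespace collapsed. For real comparisons the `benchstat` tool from golang.org/x/perf is the usual choice, since it also reports variance.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Benchmark lines copied from the single-threaded fetch output above.
var fastCacheFetch = []string{
	"BenchmarkInMemoryIndexCacheFetch/FastCache-10 3568 4964243 ns/op 11950400 B/op 29942 allocs/op",
	"BenchmarkInMemoryIndexCacheFetch/FastCache-10 3814 4721955 ns/op 11950460 B/op 29942 allocs/op",
	"BenchmarkInMemoryIndexCacheFetch/FastCache-10 3788 4713234 ns/op 11950367 B/op 29942 allocs/op",
	"BenchmarkInMemoryIndexCacheFetch/FastCache-10 3850 4704868 ns/op 11950337 B/op 29942 allocs/op",
	"BenchmarkInMemoryIndexCacheFetch/FastCache-10 3830 4811776 ns/op 11950428 B/op 29942 allocs/op",
}

var thanosCacheFetch = []string{
	"BenchmarkInMemoryIndexCacheFetch/ThanosCache-10 14553 1238019 ns/op 1228057 B/op 53 allocs/op",
	"BenchmarkInMemoryIndexCacheFetch/ThanosCache-10 14523 1238814 ns/op 1228063 B/op 53 allocs/op",
	"BenchmarkInMemoryIndexCacheFetch/ThanosCache-10 14430 1234672 ns/op 1228031 B/op 53 allocs/op",
	"BenchmarkInMemoryIndexCacheFetch/ThanosCache-10 14493 1233570 ns/op 1228015 B/op 53 allocs/op",
	"BenchmarkInMemoryIndexCacheFetch/ThanosCache-10 14563 1231709 ns/op 1228017 B/op 53 allocs/op",
}

// parseNsPerOp returns the number that precedes the "ns/op" field, if any.
func parseNsPerOp(line string) (float64, bool) {
	fields := strings.Fields(line)
	for i, f := range fields {
		if f == "ns/op" && i > 0 {
			v, err := strconv.ParseFloat(fields[i-1], 64)
			return v, err == nil
		}
	}
	return 0, false
}

// meanNsPerOp averages ns/op over all lines that parse; it returns 0 if none do.
func meanNsPerOp(lines []string) float64 {
	var sum float64
	var n int
	for _, l := range lines {
		if v, ok := parseNsPerOp(l); ok {
			sum += v
			n++
		}
	}
	if n == 0 {
		return 0
	}
	return sum / float64(n)
}

func main() {
	fast := meanNsPerOp(fastCacheFetch)
	thanos := meanNsPerOp(thanosCacheFetch)
	fmt.Printf("FastCache mean:   %.0f ns/op\n", fast)
	fmt.Printf("ThanosCache mean: %.0f ns/op\n", thanos)
	fmt.Printf("FastCache/ThanosCache: %.2fx\n", fast/thanos)
}
```

On these five runs each, the means work out to roughly 4.78 ms/op for FastCache and 1.24 ms/op for ThanosCache, so FastCache is close to 3.9x slower for a single-threaded fetch.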

Read benchmark with 500 concurrent fetches. This is closer to the real scenario, where multiple requests are sent to the Store Gateway and access the inmem cache at the same time. Even though fastcache is slower than the Thanos cache for a single fetch request, under high concurrency it is about 2X faster due to its striped locks. So overall I think its query performance is not as bad as it looks.

goos: darwin
goarch: arm64
pkg: github.com/cortexproject/cortex/pkg/storage/tsdb
BenchmarkInMemoryIndexCacheFetchConcurrent/FastCache-10                    10000           2462901 ns/op        11640514 B/op      29142 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/FastCache-10                    10000           2559408 ns/op        11718839 B/op      29336 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/FastCache-10                    10000           2487506 ns/op        11675420 B/op      29229 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/FastCache-10                    10000           2502027 ns/op        11658252 B/op      29184 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/FastCache-10                    10000           2550726 ns/op        11853151 B/op      29675 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/ThanosCache-10                  10000           4597873 ns/op         1210321 B/op         52 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/ThanosCache-10                  10000           4851311 ns/op         1239506 B/op         54 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/ThanosCache-10                  10000           4828700 ns/op         1238629 B/op         54 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/ThanosCache-10                  10000           4828068 ns/op         1237347 B/op         54 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/ThanosCache-10                  10000           4812511 ns/op         1237703 B/op         54 allocs/op
PASS
ok      github.com/cortexproject/cortex/pkg/storage/tsdb        370.887s
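
The concurrency win is attributed above to lock striping. As a rough illustration of that idea (not fastcache's actual code), the sketch below splits a cache into independently locked shards so that goroutines touching different keys rarely contend. The names `stripedCache` and `shard`, the shard count of 16, the FNV hash, and the plain map storage are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shardCount is an arbitrary illustrative value.
const shardCount = 16

// shard is one independently locked slice of the key space.
type shard struct {
	mu    sync.RWMutex
	items map[string][]byte
}

// stripedCache spreads keys over shards so unrelated keys use different locks.
type stripedCache struct {
	shards [shardCount]*shard
}

func newStripedCache() *stripedCache {
	c := &stripedCache{}
	for i := range c.shards {
		c.shards[i] = &shard{items: make(map[string][]byte)}
	}
	return c
}

// shardFor hashes the key to pick the shard, and therefore the lock, to use.
func (c *stripedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return c.shards[h.Sum32()%shardCount]
}

// Set stores a copy of val under key, locking only the owning shard.
func (c *stripedCache) Set(key string, val []byte) {
	s := c.shardFor(key)
	s.mu.Lock()
	s.items[key] = append([]byte(nil), val...)
	s.mu.Unlock()
}

// Get looks up key under the owning shard's read lock.
func (c *stripedCache) Get(key string) ([]byte, bool) {
	s := c.shardFor(key)
	s.mu.RLock()
	v, ok := s.items[key]
	s.mu.RUnlock()
	return v, ok
}

func main() {
	c := newStripedCache()
	var wg sync.WaitGroup
	// 500 concurrent writers, mirroring the concurrency level discussed above.
	for i := 0; i < 500; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("series-%d", i)
			c.Set(key, []byte(key))
		}(i)
	}
	wg.Wait()
	v, ok := c.Get("series-42")
	fmt.Println(string(v), ok) // prints "series-42 true"
}
```

With a single global lock, all 500 goroutines would queue on the same mutex; with striping, only goroutines whose keys hash to the same shard wait on each other.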

friedrichg

nice!

Signed-off-by: Ben Ye <benye@amazon.com>

pull-request-size bot added the size/L label Oct 25, 2023

friedrichg approved these changes Oct 25, 2023

yeya24 force-pushed the add-fastcache branch from 4dfd6e2 to 69ee5d9 Compare November 2, 2023 04:35

pull-request-size bot added size/XL and removed size/L labels Nov 2, 2023

yeya24 force-pushed the add-fastcache branch from 5335cbd to 68632ce Compare November 2, 2023 07:28

yeya24 requested a review from friedrichg November 2, 2023 07:32

friedrichg approved these changes Nov 2, 2023

alanprot approved these changes Nov 9, 2023

yeya24 added 4 commits November 10, 2023 13:00

replace inmemory index cache to fastcache based implementation

92d4d18

Signed-off-by: Ben Ye <benye@amazon.com>

changelog

f60f99a

Signed-off-by: Ben Ye <benye@amazon.com>

add benchmarks

24450dc

Signed-off-by: Ben Ye <benye@amazon.com>

fix conflicts

eac871c

Signed-off-by: Ben Ye <benye@amazon.com>

yeya24 force-pushed the add-fastcache branch from 68632ce to eac871c Compare November 10, 2023 21:02

yeya24 merged commit 9dc9eda into cortexproject:master Nov 10, 2023
14 checks passed

yeya24 deleted the add-fastcache branch November 10, 2023 22:01
