
Replace inmemory index cache to fastcache based implementation #5619

Merged: 4 commits, merged on Nov 10, 2023

yeya24 (Contributor) commented Oct 25, 2023

What this PR does:

This PR adds a new in-memory index cache implementation based on the https://github.com/VictoriaMetrics/fastcache library.

The previous in-memory index cache will be replaced by this one, because the previous implementation performs poorly in high-concurrency environments (see thanos-io/thanos#6762).
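One common reason a cache degrades under high concurrency is that every store and fetch goes through a single lock. The Go sketch below models that pattern with a byte-bounded LRU guarded by one mutex. It is a simplified illustration under that assumption, not the actual Thanos code; `lockedLRU`, `Store`, and `Fetch` are hypothetical names.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is one cached key/value pair kept in the LRU list.
type entry struct {
	key string
	val []byte
}

// lockedLRU is a hypothetical byte-bounded LRU cache. Every operation,
// including reads, takes the same mutex, so concurrent callers serialize.
type lockedLRU struct {
	mu       sync.Mutex
	maxBytes int
	curBytes int
	ll       *list.List               // front = most recently used
	items    map[string]*list.Element // key -> list element
}

func newLockedLRU(maxBytes int) *lockedLRU {
	return &lockedLRU{maxBytes: maxBytes, ll: list.New(), items: make(map[string]*list.Element)}
}

// Store inserts or replaces key, evicting least recently used entries
// until the total size fits within maxBytes.
func (c *lockedLRU) Store(key string, val []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(val) > c.maxBytes {
		return // can never fit, so it is dropped
	}
	if el, ok := c.items[key]; ok {
		c.curBytes -= len(el.Value.(*entry).val)
		c.ll.Remove(el)
		delete(c.items, key)
	}
	for c.curBytes+len(val) > c.maxBytes {
		oldest := c.ll.Back()
		e := oldest.Value.(*entry)
		c.ll.Remove(oldest)
		delete(c.items, e.key)
		c.curBytes -= len(e.val)
	}
	c.items[key] = c.ll.PushFront(&entry{key: key, val: val})
	c.curBytes += len(val)
}

// Fetch returns the value for key. Updating recency mutates the list,
// which is why even reads need the exclusive lock in this design.
func (c *lockedLRU) Fetch(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.ll.MoveToFront(el)
	return el.Value.(*entry).val, true
}

func main() {
	c := newLockedLRU(10)
	c.Store("a", []byte("12345"))
	c.Store("b", []byte("12345"))
	c.Fetch("a")                  // "a" becomes most recently used
	c.Store("c", []byte("12345")) // evicts "b", the least recently used
	_, okA := c.Fetch("a")
	_, okB := c.Fetch("b")
	fmt.Println(okA, okB) // true false
}
```

Because `Fetch` reorders the list, readers cannot share the lock, so hundreds of concurrent fetches queue behind one mutex.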

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Review comment on the metrics registration:

c.added = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
Name: "thanos_store_index_cache_items_added_total",

yeya24 (Contributor Author): We still use the thanos_ prefix to keep these metrics compatible with the previous implementation.

friedrichg (Member) left a comment:

I am OK with this. Can you add some benchmark tests? Juicy numbers like https://github.com/VictoriaMetrics/fastcache#benchmarks would be nice.

@@ -455,6 +457,8 @@ github.com/alicebob/miniredis/v2 v2.30.4 h1:8S4/o1/KoUArAGbGwPxcwf0krlzceva2XVOS
github.com/alicebob/miniredis/v2 v2.30.4/go.mod h1:b25qWj4fCEsBeAAR2mlb0ufImGC6uH3VlUfb/HS5zKg=
github.com/aliyun/aliyun-oss-go-sdk v2.2.2+incompatible h1:9gWa46nstkJ9miBReJcN8Gq34cBFbzSpQZVVT9N09TM=
github.com/aliyun/aliyun-oss-go-sdk v2.2.2+incompatible/go.mod h1:T/Aws4fEfogEE9v+HPhhw+CntffsBHJ8nXQCwKr0/g8=
github.com/allegro/bigcache v1.2.1-0.20190218064605-e24eb225f156 h1:eMwmnE/GDgah4HI848JfFxHt+iPb26b4zyfspmqY0/8=
github.com/allegro/bigcache v1.2.1-0.20190218064605-e24eb225f156/go.mod h1:Cb/ax3seSYIx7SuZdm2G2xzfwmv3TPSk2ucNfQESPXM=

Member review comment on go.sum:

This is using a very old version of bigcache from 2019. Not saying it's wrong per se, though.

yeya24 (Contributor Author): It is used in the fastcache library's benchmark (https://github.com/VictoriaMetrics/fastcache/blob/master/fastcache_timing_test.go), so it should not be related to this change.

yeya24 (Contributor Author) commented Oct 25, 2023

Sure, I can add some benchmarks comparing it with the previous version of the in-memory cache.

yeya24 (Contributor Author) commented Nov 2, 2023

The first benchmark I added tests StoreSeries with a small item and with a large item (10MB).
For small items, fastcache looks better in both ns/op and allocations: about 720 to 800 ns/op, 132 B/op and 4 allocs/op, versus about 1235 to 1250 ns/op, 1377 B/op and 15 allocs/op for the Thanos cache.

goos: darwin
goarch: arm64
pkg: github.com/cortexproject/cortex/pkg/storage/tsdb
BenchmarkInMemoryIndexCacheStore/FastCache-10           24891123               799.4 ns/op           132 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCache-10           24816021               749.4 ns/op           132 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCache-10           24405404               771.8 ns/op           132 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCache-10           25967481               724.1 ns/op           131 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCache-10           25629028               725.1 ns/op           131 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCache-10         15269642              1235 ns/op            1377 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCache-10         14930562              1250 ns/op            1377 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCache-10         15100389              1245 ns/op            1377 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCache-10         15138560              1239 ns/op            1377 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCache-10         14924134              1239 ns/op            1377 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCacheLargeItem-10             16778           1068707 ns/op             151 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCacheLargeItem-10             16767           1065762 ns/op             150 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCacheLargeItem-10             16843           1078558 ns/op             150 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCacheLargeItem-10             16693           1074622 ns/op             151 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/FastCacheLargeItem-10             16756           1094270 ns/op             151 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCacheLargeItem-10           50761            343347 ns/op        10486125 B/op         16 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCacheLargeItem-10           53716            323179 ns/op        10486125 B/op         16 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCacheLargeItem-10           57980            314176 ns/op        10486125 B/op         16 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCacheLargeItem-10           58236            310593 ns/op        10486125 B/op         16 allocs/op
BenchmarkInMemoryIndexCacheStore/ThanosCacheLargeItem-10           59007            313004 ns/op        10486125 B/op         16 allocs/op
PASS
ok      github.com/cortexproject/cortex/pkg/storage/tsdb        449.797s

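Update: large items seem to be a weak spot for fastcache, because in this test storing them forces evictions, which raises latency (about 1.07 to 1.09 ms/op versus about 0.31 to 0.34 ms/op for the Thanos cache). Fastcache still allocates far less per operation (about 150 B/op versus about 10 MB/op), and with higher concurrency it performs better, as the results below show.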

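yeya24 (Contributor Author) commented Nov 2, 2023

I tested storing into the in-memory cache with a concurrency of 500, which is a common scenario in the store gateway. Fastcache looks better here: about 1300 ns/op versus about 1900 to 2200 ns/op for small items, and about 160,000 to 170,000 ns/op versus about 327,000 to 334,000 ns/op for large items.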

-------- BEGIN BENCHMARK --------
goos: darwin
goarch: arm64
pkg: github.com/cortexproject/cortex/pkg/storage/tsdb
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCache-10                 13901102              1332 ns/op             153 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCache-10                 13758361              1313 ns/op             153 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCache-10                 13570058              1303 ns/op             154 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCache-10                 13834746              1306 ns/op             153 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCache-10                 13665320              1307 ns/op             153 B/op          4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCache-10                9695043              1891 ns/op            1378 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCache-10                9759580              1889 ns/op            1378 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCache-10                9206676              2183 ns/op            1378 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCache-10                9649276              1966 ns/op            1378 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCache-10                9281004              2025 ns/op            1378 B/op         15 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCacheLargeItem-10                  108466            169077 ns/op             174 B/op            4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCacheLargeItem-10                  110389            162944 ns/op             173 B/op            4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCacheLargeItem-10                  115863            160414 ns/op             171 B/op            4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCacheLargeItem-10                  112195            163320 ns/op             173 B/op            4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/FastCacheLargeItem-10                  114378            162164 ns/op             171 B/op            4 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCacheLargeItem-10                 52624            326619 ns/op        10485350 B/op           16 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCacheLargeItem-10                 50662            333963 ns/op        10485318 B/op           16 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCacheLargeItem-10                 55876            330605 ns/op        10485582 B/op           16 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCacheLargeItem-10                 55425            327196 ns/op        10485389 B/op           16 allocs/op
BenchmarkInMemoryIndexCacheStoreConcurrent/ThanosCacheLargeItem-10                 55989            330950 ns/op        10485210 B/op           16 allocs/op
PASS
ok      github.com/cortexproject/cortex/pkg/storage/tsdb        408.697s

yeya24 (Contributor Author) commented Nov 2, 2023

Read benchmark with a single-threaded fetch: here fastcache is worse than the Thanos cache (about 4.7 to 5.0 ms/op versus about 1.23 ms/op). Fastcache also allocates much more memory (about 11.95 MB/op and 29942 allocs/op versus about 1.23 MB/op and 53 allocs/op) because, due to its implementation details, it uses a sync pool internally when fetching data.

goos: darwin
goarch: arm64
pkg: github.com/cortexproject/cortex/pkg/storage/tsdb
BenchmarkInMemoryIndexCacheFetch/FastCache-10               3568           4964243 ns/op        11950400 B/op      29942 allocs/op
BenchmarkInMemoryIndexCacheFetch/FastCache-10               3814           4721955 ns/op        11950460 B/op      29942 allocs/op
BenchmarkInMemoryIndexCacheFetch/FastCache-10               3788           4713234 ns/op        11950367 B/op      29942 allocs/op
BenchmarkInMemoryIndexCacheFetch/FastCache-10               3850           4704868 ns/op        11950337 B/op      29942 allocs/op
BenchmarkInMemoryIndexCacheFetch/FastCache-10               3830           4811776 ns/op        11950428 B/op      29942 allocs/op
BenchmarkInMemoryIndexCacheFetch/ThanosCache-10            14553           1238019 ns/op         1228057 B/op         53 allocs/op
BenchmarkInMemoryIndexCacheFetch/ThanosCache-10            14523           1238814 ns/op         1228063 B/op         53 allocs/op
BenchmarkInMemoryIndexCacheFetch/ThanosCache-10            14430           1234672 ns/op         1228031 B/op         53 allocs/op
BenchmarkInMemoryIndexCacheFetch/ThanosCache-10            14493           1233570 ns/op         1228015 B/op         53 allocs/op
BenchmarkInMemoryIndexCacheFetch/ThanosCache-10            14563           1231709 ns/op         1228017 B/op         53 allocs/op
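One factor behind per-fetch allocations is whether a cache hands out its stored slice or copies the value out. fastcache's `Get(dst, k []byte) []byte` appends the value to a caller-supplied `dst`, so passing `nil` allocates a fresh slice on every hit, while reusing a buffer can avoid that. The Go sketch below models only this copy-out behavior with a hypothetical `copyingStore`; it does not reproduce fastcache's internals, including the sync pool mentioned above.

```go
package main

import (
	"bytes"
	"fmt"
)

// copyingStore is a hypothetical cache that keeps values in its own
// buffers and copies them out on read, similar in spirit to fastcache's
// Get(dst, k), which appends the value to dst.
type copyingStore struct {
	data map[string][]byte
}

// Set copies v so that callers cannot mutate the cached bytes afterwards.
func (s *copyingStore) Set(k string, v []byte) {
	s.data[k] = append([]byte(nil), v...)
}

// Get appends the value for k to dst and returns the extended slice.
// With dst == nil each hit allocates; a reused dst with enough capacity
// receives the bytes in place. A missing key returns dst unchanged.
func (s *copyingStore) Get(dst []byte, k string) []byte {
	return append(dst, s.data[k]...)
}

func main() {
	s := &copyingStore{data: map[string][]byte{}}
	s.Set("series-1", []byte("chunk-bytes"))

	fresh := s.Get(nil, "series-1") // allocates a new slice

	buf := make([]byte, 0, 64)
	reused := s.Get(buf[:0], "series-1") // writes into buf's backing array

	fmt.Println(string(fresh), bytes.Equal(fresh, reused), &reused[0] == &buf[:1][0])
	// chunk-bytes true true
}
```

A cache that returns its internal slice directly avoids the copy but must then guarantee that callers never mutate the shared bytes.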

Read benchmark with 500 concurrent fetches. This is closer to the real scenario, where multiple requests are sent to the store gateway and access the in-memory cache at the same time. Even though fastcache is slower than the Thanos cache for a single fetch, under high concurrency it is about 2x faster (about 2.5 ms/op versus about 4.6 to 4.9 ms/op) thanks to its striped locks. So overall I think its query performance is not as bad as it looks.

goos: darwin
goarch: arm64
pkg: github.com/cortexproject/cortex/pkg/storage/tsdb
BenchmarkInMemoryIndexCacheFetchConcurrent/FastCache-10                    10000           2462901 ns/op        11640514 B/op      29142 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/FastCache-10                    10000           2559408 ns/op        11718839 B/op      29336 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/FastCache-10                    10000           2487506 ns/op        11675420 B/op      29229 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/FastCache-10                    10000           2502027 ns/op        11658252 B/op      29184 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/FastCache-10                    10000           2550726 ns/op        11853151 B/op      29675 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/ThanosCache-10                  10000           4597873 ns/op         1210321 B/op         52 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/ThanosCache-10                  10000           4851311 ns/op         1239506 B/op         54 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/ThanosCache-10                  10000           4828700 ns/op         1238629 B/op         54 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/ThanosCache-10                  10000           4828068 ns/op         1237347 B/op         54 allocs/op
BenchmarkInMemoryIndexCacheFetchConcurrent/ThanosCache-10                  10000           4812511 ns/op         1237703 B/op         54 allocs/op
PASS
ok      github.com/cortexproject/cortex/pkg/storage/tsdb        370.887s

friedrichg (Member) left a comment:

nice!

Signed-off-by: Ben Ye <benye@amazon.com>
yeya24 merged commit 9dc9eda into cortexproject:master on Nov 10, 2023 (14 checks passed)
yeya24 deleted the add-fastcache branch on November 10, 2023 22:01

3 participants