This repository has been archived by the owner on Apr 2, 2024. It is now read-only.

Cache label IDs #1414

Merged
merged 1 commit into from
Jun 20, 2022

niksajakovljevic
Contributor

Add an inverted cache ((metric + label pair) -> (id, pos)) to avoid DB calls for
fetching label IDs when the series ID is not cached.
This cache is only used during metric ingestion.
Benchmarks show around 5-10% gains in ingest performance and
about 25% fewer DB calls for fetching label IDs (note that these numbers
depend heavily on the shape of the dataset).
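A minimal sketch of the cache shape described above, assuming hypothetical names (`invertedKey`, `labelInfo`, `invertedLabelsCache`, `splitCached`) that are not taken from the PR. The metric name is part of the key because a label's position is per metric; during ingestion, hits are used directly and only the misses would need a DB call. The sketch is unbounded, whereas a production cache would cap its size.

```go
package main

import (
	"fmt"
	"sync"
)

// invertedKey identifies a label pair within one metric (hypothetical type).
// The metric is part of the key because the stored position is per metric.
type invertedKey struct {
	metric, name, value string
}

// labelInfo is the cached value: the label's DB id and its position.
type labelInfo struct {
	id  int64
	pos int32
}

// invertedLabelsCache is an unbounded, hypothetical sketch guarded by an RWMutex.
type invertedLabelsCache struct {
	mu sync.RWMutex
	m  map[invertedKey]labelInfo
}

func newInvertedLabelsCache() *invertedLabelsCache {
	return &invertedLabelsCache{m: make(map[invertedKey]labelInfo)}
}

func (c *invertedLabelsCache) Get(k invertedKey) (labelInfo, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[k]
	return v, ok
}

func (c *invertedLabelsCache) Put(k invertedKey, v labelInfo) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = v
}

// splitCached returns the cached entries and the keys that still need a DB call.
func (c *invertedLabelsCache) splitCached(keys []invertedKey) (map[invertedKey]labelInfo, []invertedKey) {
	hits := make(map[invertedKey]labelInfo, len(keys))
	var misses []invertedKey
	for _, k := range keys {
		if v, ok := c.Get(k); ok {
			hits[k] = v
		} else {
			misses = append(misses, k)
		}
	}
	return hits, misses
}

func main() {
	c := newInvertedLabelsCache()
	c.Put(invertedKey{"cpu_usage", "__name__", "cpu_usage"}, labelInfo{id: 1, pos: 1})
	c.Put(invertedKey{"cpu_usage", "job", "node"}, labelInfo{id: 7, pos: 2})

	series := []invertedKey{
		{"cpu_usage", "__name__", "cpu_usage"},
		{"cpu_usage", "job", "node"},
		{"cpu_usage", "instance", "host-1"},
	}
	hits, misses := c.splitCached(series)
	fmt.Println(len(hits), len(misses)) // 2 1
	fmt.Println(misses[0].name)         // instance
}
```

Only the `instance` label would be sent to the database here; the two cached labels are resolved without a round trip.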

niksajakovljevic added the Performance label (Improvements that are specifically related to performance) Jun 6, 2022
niksajakovljevic self-assigned this Jun 6, 2022
niksajakovljevic requested review from paulfantom and a team as code owners June 6, 2022 10:19
niksajakovljevic force-pushed the niksa/cache-label-ids branch 2 times, most recently from ce7ae00 to 23147d4 June 6, 2022 14:04
@niksajakovljevic
Contributor Author

Closes #1392

@cevian
Contributor

cevian commented Jun 6, 2022

@niksajakovljevic have you considered using (label pair) => (id, map[metric_name] => pos) instead? Wouldn't that allow reusing the existing cache and thus improve the overall cache hit ratio?
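To make the suggested layout concrete, here is a minimal sketch, assuming hypothetical names (`labelPair`, `sharedLabelEntry`, `sharedLabelsCache`). Each label pair is stored once with its id, and positions hang off it per metric, so a lookup can find the id even when the position for a given metric is not yet known.

```go
package main

import "fmt"

// labelPair is a label name/value pair, independent of any metric.
type labelPair struct{ name, value string }

// sharedLabelEntry holds the label's id once, plus its position per metric.
type sharedLabelEntry struct {
	id          int64
	posByMetric map[string]int32
}

// sharedLabelsCache maps each label pair to a single shared entry (hypothetical).
type sharedLabelsCache map[labelPair]*sharedLabelEntry

// put records the id (assumed stable per label pair) and the metric's position.
func (c sharedLabelsCache) put(metric string, lp labelPair, id int64, pos int32) {
	e, ok := c[lp]
	if !ok {
		e = &sharedLabelEntry{id: id, posByMetric: make(map[string]int32)}
		c[lp] = e
	}
	e.posByMetric[metric] = pos
}

// lookup reports the id and the per-metric position separately: the id can be
// known even when the position for this metric is not.
func (c sharedLabelsCache) lookup(metric string, lp labelPair) (id int64, pos int32, idOK, posOK bool) {
	e, ok := c[lp]
	if !ok {
		return 0, 0, false, false
	}
	pos, posOK = e.posByMetric[metric]
	return e.id, pos, true, posOK
}

func main() {
	c := sharedLabelsCache{}
	c.put("cpu_usage", labelPair{"job", "node"}, 7, 2)

	id, pos, idOK, posOK := c.lookup("cpu_usage", labelPair{"job", "node"})
	fmt.Println(id, pos, idOK, posOK) // 7 2 true true

	id, _, idOK, posOK = c.lookup("mem_usage", labelPair{"job", "node"})
	fmt.Println(id, idOK, posOK) // 7 true false
}
```

Sharing one entry per label pair is where the hit-ratio gain would come from; the trade-off is a nested map per entry and a "partial hit" case the ingest path would have to handle.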

Member

@Harkishen-Singh Harkishen-Singh left a comment

Why are we not using clockcache?
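For context on the question, a CLOCK (second-chance) cache bounds memory by evicting entries that have not been referenced since the clock hand last passed them. The sketch below is a generic illustration of that policy with hypothetical names (`clockCache`, `newClockCache`, `Insert`, `Get`); it assumes, from the name only, that the clockcache mentioned here uses CLOCK eviction, and it does not reproduce that package's API. It is not safe for concurrent use.

```go
package main

import "fmt"

// clockCache is a minimal, hypothetical CLOCK (second-chance) cache.
// Not safe for concurrent use; a real cache would add locking.
type clockCache[K comparable, V any] struct {
	keys  []K
	vals  []V
	used  []bool // reference bits
	index map[K]int
	hand  int
}

func newClockCache[K comparable, V any](capacity int) *clockCache[K, V] {
	if capacity < 1 {
		capacity = 1
	}
	return &clockCache[K, V]{
		keys:  make([]K, 0, capacity),
		vals:  make([]V, 0, capacity),
		used:  make([]bool, 0, capacity),
		index: make(map[K]int, capacity),
	}
}

// Get returns the value and sets the entry's reference bit (its second chance).
func (c *clockCache[K, V]) Get(k K) (V, bool) {
	i, ok := c.index[k]
	if !ok {
		var zero V
		return zero, false
	}
	c.used[i] = true
	return c.vals[i], true
}

// Insert adds or updates an entry, evicting one with a clear reference bit when full.
func (c *clockCache[K, V]) Insert(k K, v V) {
	if i, ok := c.index[k]; ok {
		c.vals[i] = v
		c.used[i] = true
		return
	}
	if len(c.keys) < cap(c.keys) {
		c.index[k] = len(c.keys)
		c.keys = append(c.keys, k)
		c.vals = append(c.vals, v)
		c.used = append(c.used, false)
		return
	}
	// Full: advance the hand, clearing reference bits, until an unreferenced slot is found.
	for c.used[c.hand] {
		c.used[c.hand] = false
		c.hand = (c.hand + 1) % len(c.keys)
	}
	delete(c.index, c.keys[c.hand])
	c.keys[c.hand], c.vals[c.hand], c.used[c.hand] = k, v, false
	c.index[k] = c.hand
	c.hand = (c.hand + 1) % len(c.keys)
}

func main() {
	c := newClockCache[string, int](2)
	c.Insert("a", 1)
	c.Insert("b", 2)
	c.Get("a")       // sets a's reference bit
	c.Insert("c", 3) // hand clears a's bit, then evicts b
	_, okB := c.Get("b")
	va, okA := c.Get("a")
	fmt.Println(okB, va, okA) // false 1 true
}
```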

pkg/pgclient/config.go Outdated
pkg/pgmodel/cache/inverted_labels_cache.go
pkg/pgmodel/cache/inverted_labels_cache.go Outdated
pkg/pgmodel/cache/inverted_labels_cache.go Outdated
pkg/pgmodel/cache/inverted_labels_cache.go Outdated
pkg/pgmodel/ingestor/series_writer.go
pkg/pgmodel/ingestor/series_writer.go Outdated
@niksajakovljevic
Contributor Author

> @niksajakovljevic have you considered using (label pair)=> (id, map[metric_name]=>pos) instead? That would allow reusing the existing cache and thus improve overall cache hit ratio?

The labels reader cache is actually the inverse mapping (id -> label pair), so we can't reuse it.

```go
for _, cachedLabel := range info.cachedLabels {
	if val, ok := labelMap[cachedLabel]; ok {
		if int(val.Pos) > info.maxPos {
			info.maxPos = int(val.Pos)
		}
	}
}
```
Contributor

Can we do this when fetching from cache?

Contributor Author

Yes, we have to do it: some labels are cached, and maxPos must be correct, meaning it must hold the maximum over both the fetched and the cached labels.

Contributor

No, I meant: can you start calculating the max position while fetching the cached entries, so you don't have to iterate through the cached labels again?
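The reviewer's suggestion amounts to folding the maxPos update into the cache lookup itself, so cached labels are visited once and the post-fetch pass only touches labels that came back from the database. A minimal sketch, assuming hypothetical names (`seriesLabelInfo`, `collectCached`, `applyFetched`) and a cache map already scoped to a single metric; it is not the code that was merged.

```go
package main

import "fmt"

type labelKey struct{ name, value string }

type labelInfo struct {
	id  int64
	pos int32
}

// seriesLabelInfo is a hypothetical stand-in for the per-series state
// (ids resolved so far, labels still missing, running max position).
type seriesLabelInfo struct {
	ids    map[labelKey]int64
	misses []labelKey
	maxPos int
}

// collectCached makes one pass over the series labels: cache hits record their
// id and update maxPos immediately; misses are queued for a DB fetch.
// The cache map is assumed to be scoped to a single metric.
func collectCached(labels []labelKey, cache map[labelKey]labelInfo) *seriesLabelInfo {
	info := &seriesLabelInfo{ids: make(map[labelKey]int64, len(labels))}
	for _, l := range labels {
		if v, ok := cache[l]; ok {
			info.ids[l] = v.id
			if int(v.pos) > info.maxPos {
				info.maxPos = int(v.pos)
			}
			continue
		}
		info.misses = append(info.misses, l)
	}
	return info
}

// applyFetched merges DB results for the misses only, so maxPos ends up as the
// max over cached and fetched labels without revisiting the cached ones.
func (info *seriesLabelInfo) applyFetched(fetched map[labelKey]labelInfo) {
	for _, l := range info.misses {
		if v, ok := fetched[l]; ok {
			info.ids[l] = v.id
			if int(v.pos) > info.maxPos {
				info.maxPos = int(v.pos)
			}
		}
	}
}

func main() {
	cache := map[labelKey]labelInfo{
		{"__name__", "cpu_usage"}: {id: 1, pos: 1},
		{"job", "node"}:           {id: 7, pos: 2},
	}
	labels := []labelKey{{"__name__", "cpu_usage"}, {"job", "node"}, {"instance", "host-1"}}

	info := collectCached(labels, cache)
	fmt.Println(info.maxPos, len(info.misses)) // 2 1

	// Simulated DB response for the missing label (hypothetical values).
	info.applyFetched(map[labelKey]labelInfo{{"instance", "host-1"}: {id: 9, pos: 3}})
	fmt.Println(info.maxPos, len(info.ids)) // 3 3
}
```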

pkg/pgmodel/ingestor/series_writer.go Outdated
pkg/pgmodel/cache/flags.go Outdated
pkg/pgmodel/cache/flags.go Outdated
pkg/pgmodel/cache/inverted_labels_cache.go Outdated
Add an inverted cache ((metric + label pair) -> (id, pos)) to avoid DB calls for
fetching label IDs when the series ID is not cached.
This cache is only used when ingesting data.
Benchmarks show around 5-10% gains in ingest performance and
about 25% fewer DB calls for fetching label IDs (note that these numbers
depend heavily on the shape of the dataset).
niksajakovljevic merged commit 8ba45d2 into master Jun 20, 2022
niksajakovljevic deleted the niksa/cache-label-ids branch June 20, 2022 11:47
peppercoffee added the IIP-1 label (Improve Ingestion Performance, part 1) Jul 19, 2022
Labels
IIP-1: Improve Ingestion Performance (part 1)
Performance: Improvements that are specifically related to performance
5 participants