This repository has been archived by the owner on Apr 2, 2024. It is now read-only.

Low cache hit ratio for metric_name cache #1356

Closed
paulfantom opened this issue May 12, 2022 · 5 comments · Fixed by #1498
Labels: Bug (Something isn't working), Performance (Improvements that are specifically related to performance)

Comments

@paulfantom
Contributor

It appears that the metric_name cache is not reaching a high cache hit ratio. To my knowledge, cache hit ratios for Postgres should usually be in the high 90s, but we are seeing a hit ratio of ~84%.

Screenshot from 2022-05-12 12-03-36

@paulfantom paulfantom added Bug, Performance labels May 12, 2022
@JamesGuthrie
Contributor

JamesGuthrie commented May 12, 2022

Do you have statistics on cache evictions? Or cache fullness?

@paulfantom
Contributor Author

paulfantom commented May 12, 2022

This is from our demo environment, to which the team has access :)

There are no cache evictions from the metrics_name cache, and occupancy is at 15%.
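
One rough way to read these numbers: if nothing is evicted and the cache is far from full, every miss has to be a first-time lookup of a key, so a larger cache would not raise the ratio. The sketch below simulates that with a hypothetical `SimpleCache` class (a toy LRU, not Promscale's implementation), assuming entries leave the cache only by eviction. The key counts and capacity are made up for illustration.

```python
from collections import OrderedDict


class SimpleCache:
    """Toy LRU cache with counters; a hypothetical stand-in, not Promscale's cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.queries = 0
        self.evictions = 0

    def get_or_insert(self, key):
        self.queries += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used key
            self.evictions += 1
        self.entries[key] = True

    def hit_ratio(self):
        return self.hits / self.queries if self.queries else 0.0

    def occupancy(self):
        return len(self.entries) / self.capacity


# Illustrative numbers only: 150 distinct keys, 1000 lookups, capacity 1000.
cache = SimpleCache(capacity=1000)
for i in range(1000):
    cache.get_or_insert(f"metric_{i % 150}")

print(cache.evictions)    # 0
print(cache.occupancy())  # 0.15
print(cache.hit_ratio())  # 0.85
```

In this toy run the 150 misses are all first-time lookups, so doubling `capacity` leaves the hit ratio unchanged. With few lookups overall, those first-time misses make up a large share of the total.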

@paulfantom
Contributor Author

paulfantom commented Jul 5, 2022

Bumping this, as it is becoming a problem in very basic deployments. After just 6h of running a default tobs installation (around 80k active series), we get a PromscaleCacheTooSmall alert firing because of the problem described in this issue. Since it is not acceptable to ship a default installation that immediately fires alerts, it would be good to raise the priority of this investigation. However, if there is no fix or no time to investigate, we'll need to consider dropping this alert or excluding the metrics_name cache from the alert expression.

cc'ing @VineethReddy02 for visibility

@JamesGuthrie
Contributor

As discussed on slack:

The PromscaleCacheTooSmall alert uses cache hit ratio as a proxy for "cache too small", i.e. hit rate < 0.9 = cache too small. It doesn't actually look at the cache occupancy. For caches with very low activity (of which the metric_name cache is one), the calculated hit ratio does not seem to be correct/useful.

Both the PromscaleCacheTooSmall and PromscaleCacheHighNumberOfEvictions alerts are measuring the same outcome: the caches have been configured to be too small. If the cache is too small, we expect there to be a high number of evictions.

One option is to augment the PromscaleCacheTooSmall alert to actually take cache occupancy into account. If we did so, it's likely that both PromscaleCacheTooSmall and PromscaleCacheHighNumberOfEvictions would fire in similar situations.
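
To make the difference concrete, here is a minimal sketch of the two decision rules in plain Python rather than as an alert expression. The `CacheStats` fields, the function names, and the 0.9 occupancy threshold are assumptions for illustration and are not taken from the Promscale alert definitions. Only the 0.9 hit ratio threshold comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Snapshot of counters for one cache; field names are hypothetical."""
    hits: int
    queries: int
    elements: int
    capacity: int


def hit_ratio_only_alert(stats, min_hit_ratio=0.9):
    """Proxy described above: a low hit ratio is read as 'cache too small'."""
    if stats.queries == 0:
        return False
    return stats.hits / stats.queries < min_hit_ratio


def occupancy_aware_alert(stats, min_hit_ratio=0.9, min_occupancy=0.9):
    """Variant that fires only when the cache is also nearly full.

    The 0.9 occupancy threshold is an assumed value for illustration.
    """
    if stats.capacity == 0:
        return False
    nearly_full = stats.elements / stats.capacity >= min_occupancy
    return nearly_full and hit_ratio_only_alert(stats, min_hit_ratio)


# Numbers loosely based on this thread: ~84% hit ratio, 15% occupancy.
metric_name = CacheStats(hits=840, queries=1000, elements=1500, capacity=10000)
print(hit_ratio_only_alert(metric_name))   # True
print(occupancy_aware_alert(metric_name))  # False
```

With these inputs the hit-ratio-only rule fires, while the occupancy-aware rule stays quiet because the cache is only 15% full.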

@arajkumar
Member

I can work on this.

@arajkumar arajkumar self-assigned this Jul 13, 2022
arajkumar added a commit to arajkumar/promscale that referenced this issue Jul 18, 2022
This commit removes the PromscaleCacheTooSmall alert, which is flaky and fires for the same cause as `PromscaleCacheHighNumberOfEvictions`.

Fixes timescale#1356.

Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
arajkumar added a commit that referenced this issue Jul 20, 2022
This commit removes the PromscaleCacheTooSmall alert, which is flaky and fires for the same cause as `PromscaleCacheHighNumberOfEvictions`.

Fixes #1356.

Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
@paulfantom paulfantom reopened this Oct 25, 2022
@paulfantom paulfantom closed this as not planned Jan 30, 2023
3 participants