-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics for cache entry memory size #5770
Conversation
These are: * currently unpopulated. * needs unit tests. * needs to be propagated when a new in memory cache is spawned they are correct.
This comment has been minimized.
This comment has been minimized.
CI performance tests
|
} | ||
|
||
fn serialize_none(self) -> Result<Self::Ok, Self::Error> { | ||
Ok(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to double check: None
would still use the size of Some
in memory, right?
We estimate it as 0 because of serve limitations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess in theory we can do something better. But None values won't have strings etc in them so I don't think we should worry about this and maybe we can follow up later to improve things.
apollo-router/src/cache/storage.rs
Outdated
kind = %self.caller, | ||
storage = &tracing::field::display(CacheStorageName::Memory), | ||
); | ||
if let Some((_, v)) = in_memory.push(key.clone(), value.clone()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the in memory cache lock is held until the end of the function, it should only be held when inserting the old entry.
Maybe fixed with something like:
let old_entry = {
self.inner.lock().await.push(key.clone(), value.clone())
};
if let Some((_, v)) = old_entry {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in: e37c87e
|
||
As the size is only an estimation, users should check for correlation with pod memory usage to determine if cache needs to be updated. | ||
|
||
Usage scenario: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shorgi Maybe this info should also be in the docs under cache troubleshooting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great content for docs. Since it's related to pod memory pressure, moving it to the k8s page.
Going to wait for a docs review before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the usage scenario from the changeset into docs
|
||
As the size is only an estimation, users should check for correlation with pod memory usage to determine if cache needs to be updated. | ||
|
||
Usage scenario: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great content for docs. Since it's related to pod memory pressure, moving it to the k8s page.
Co-authored-by: Edward Huang <edward.huang@apollographql.com>
Co-authored-by: Edward Huang <edward.huang@apollographql.com>
Co-authored-by: Edward Huang <edward.huang@apollographql.com>
Co-authored-by: Edward Huang <edward.huang@apollographql.com>
Co-authored-by: Edward Huang <edward.huang@apollographql.com>
Query planner cache entries may use an significant amount of memory in the Router.
To help users understand and monitor this the Router now exposes a new metric
apollo.router.cache.storage.estimated_size
.This metric give an estimated size in bytes for the cache entry and has the following attributes:
kind
:query planner
.storage
:memory
.As the size is only an estimation, users should check for correlation with pod memory usage to determine if cache needs to be updated.
Usage scenario:
apollo.router.cache.storage.estimated_size
.apollo_router_cache_size
.apollo_router_cache_hits
-apollo_router_cache_misses
.apollo.router.cache.storage.estimated_size
to see if it grows over time and correlates with pod memory usage.Remediation:
Technical info
The estimate is only implemented for query plans and uses serde to create the estimate without actually writing to a string. It is very difficult to provide accurate memory usage, so it's only useful to see a trend.
Providing an estimated size for a cache entry is optional, and only implemented for query planner at this time.
Where no estimated size is set the gauge will not be emitted, e.g. APQ or entity caching.
Testing
For manual testing we will deploy internally and observe that the metric tops out when the query cache is full.
ROUTER-669
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩