Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Fix synchronous fetch count metric #1016

Open
turan18 opened this issue Dec 29, 2023 · 0 comments
Open

[Bug] Fix synchronous fetch count metric #1016

turan18 opened this issue Dec 29, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@turan18
Copy link
Contributor

turan18 commented Dec 29, 2023

Description

We increment a metric(SynchronousReadRegistryFetchCount) every time we need to fetch bytes of content synchronously. This isn't really indicative of registry fetches as we could have returned the content from our local cache. What we really need to know is the number of synchronous fetch operations that led to cache misses where we actually had to go over the network to get the requested bytes.

In a perfect scenario, this number should always be 0, meaning we should always have the content fetched and cached before the container needs it. In practice, this is almost always never the case, so looking at the metric should give us an idea of how well our background fetcher is able to fetch the needed content and an indication on how well optimizations like prioritized pre-fetching (eg: Load Order Documents #749) could benefit certain workloads.

Note: This isn't a perfect heuristic for determining the benefit of LOD as there could be several reasons that are unrelated to the ordering of span fetches as to why the background fetcher couldn't fetch and cache the requested content before hand. (eg: Maybe there were many sequential spans that the container needed and the background fetcher couldn't get to them in time or some transient failure occurred when trying to fetch/cache the content).

Steps to reproduce the bug

No response

Describe the results you expected

Ideally, we should increment this metric when we have to synchronously fetch the content over the network. We can move the metric increment to the end of getSpanContent. The metric itself is at the layer level and so it also requires the layer digest dimension, so we'll have to figure out a way to pass the layer digest into the span manager or just omit it entirely, although I think it may be helpful to be able to aggregate at a layer level.

Host information

  1. OS: N/A
  2. Snapshotter Version: v0.4.1
  3. Containerd Version: N/A

Any additional context or information about the bug

Related to: #749 and #225

@turan18 turan18 added the bug Something isn't working label Dec 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant