Proposal: Allow Setting Cache Miss Policy in Cache Options #2397

Closed
stevekuznetsov opened this issue Jul 5, 2023 · 9 comments · Fixed by #2406

Comments

@stevekuznetsov
Contributor

stevekuznetsov commented Jul 5, 2023

Today, when a user gets a client.Client from a Manager (with mgr.GetClient()), they get a cache-backed client that has some surprising behavior - when a cache miss occurs, an informer is spun up to feed data into the cache. This means that one client.Get() call for one resource will, by default, start a cluster-scoped watch on all objects of that type and keep them in memory forever. While this is an understandable way for the cache to work, I've seen it bite folks who did not expect this behavior. While I don't think that this should be the default, at a minimum I'd love to see the cache miss policy be configurable at creation time so that folks can opt out of this behavior. Today it is possible to opt out of caching particular resources, but that's done with a hard-coded deny-list, which means if you ever forget to update that list, you get bitten.
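For context, the deny-list opt-out mentioned above looks roughly like the sketch below (field names assume controller-runtime v0.15, where the per-type cache bypass lives under client.Options; older releases spell the same idea as manager.Options.ClientDisableCacheFor):

package main

import (
    corev1 "k8s.io/api/core/v1"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
    // Objects listed in DisableFor are read with live calls instead of the cache,
    // so no informer is ever started for them. Every other type read through
    // mgr.GetClient() is cached: the first Get or List for that type starts a
    // cluster-scoped informer and keeps its objects in memory.
    mgr, err := manager.New(ctrl.GetConfigOrDie(), manager.Options{
        Client: client.Options{
            Cache: &client.CacheOptions{
                DisableFor: []client.Object{&corev1.Secret{}}, // never cache Secrets
            },
        },
    })
    if err != nil {
        panic(err)
    }
    _ = mgr.GetClient()
}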

I'd like to propose adding a new field in the cache.Options:

// pkg/cache/cache.go

// Options are the optional arguments for creating a new InformersMap object.
type Options struct {
    // ...
    
    // MissPolicy determines how the cache should behave when a Get finds no
    // entry in the cache or a List returns no items. See the CacheMissPolicy
    // constants for documentation of each policy.
    MissPolicy CacheMissPolicy
}

type CacheMissPolicy string

const (
    // Forward configures the cache to forward a cache miss to the client, unchanged,
    // meaning that the client gets an errors.NotFound for a Get and an empty response
    // for a List.
    Forward CacheMissPolicy = "forward"

    // Backfill configures the cache to spin up an informer for a resource when the first
    // request for that resource comes in. This means that a Get or a List may take longer
    // than normal to succeed on the first invocation, as the cache waits until it is fully
    // backfilled.
    Backfill CacheMissPolicy = "backfill"

    // LiveLookup configures the cache to issue a live client call to the server for requests
    // that do not correspond to resources that have been explicitly configured to be cached.
    LiveLookup CacheMissPolicy = "liveLookup"
)

We can keep "backfill" as the default so that existing behavior does not change.
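For illustration, opting out of backfilling would then look roughly like this (hypothetical usage; the MissPolicy field and Forward constant only exist if the proposal above lands as written):

package main

import (
    "k8s.io/client-go/rest"
    "sigs.k8s.io/controller-runtime/pkg/cache"
)

// newForwardingCache is a hypothetical sketch of wiring the proposed option:
// cache misses surface to the caller as NotFound / empty List instead of
// starting a new informer.
func newForwardingCache(cfg *rest.Config) (cache.Cache, error) {
    return cache.New(cfg, cache.Options{
        MissPolicy: cache.Forward, // proposed field and constant, not yet real
    })
}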

/cc @alvaroaleman @vincepri

@stevekuznetsov
Contributor Author

@alvaroaleman points out that LiveLookup would have weird semantics with configured label selectors. Likely better to remove that option.

Also, for Forward ... the user needs to be able to distinguish between "not found" and "no resource registered"... so, updated proposal:

type CacheMissPolicy string

const (
    // Fail configures the cache to return a sentinel error when a user requests a resource
    // the cache is not configured to hold. This error is distinct from an errors.NotFound
    // and should be considered a programming error.
    Fail CacheMissPolicy = "fail"

    // Backfill configures the cache to spin up an informer for a resource when the first
    // request for that resource comes in. This means that a Get or a List may take longer
    // than normal to succeed on the first invocation, as the cache waits until it is fully
    // backfilled.
    Backfill CacheMissPolicy = "backfill"
)

Perhaps "fail" could also just runtime panic, since it's a programming error.

@stevekuznetsov
Contributor Author

Perhaps "fail" could also just runtime panic, since it's a programming error.

Just realizing that the better proposal would be that the client.Client wrapping the cache could use that error to do a live lookup.

@vincepri
Member

vincepri commented Jul 6, 2023

Perhaps "fail" could also just runtime panic, since it's a programming error.

Just realizing that the better proposal would be that the client.Client wrapping the cache could use that error to do a live lookup.

Which error?

@vincepri
Member

vincepri commented Jul 6, 2023

Perhaps "fail" could also just runtime panic, since it's a programming error.

Maybe we could have a Strict mode, and have Backfill be the default?

@stevekuznetsov
Contributor Author

@vincepri

Which error?

I'm proposing:

    // Fail configures the cache to return a sentinel error when a user requests a resource
    // the cache is not configured to hold. This error is distinct from an errors.NotFound
    // and should be considered a programming error.
    Fail CacheMissPolicy = "fail"

This sounds like what you mean by Strict mode - yes?

@stevekuznetsov
Contributor Author

So the flows for a client.Client with an underlying cache in CacheMissPolicy = Fail would be:

  1. query cache, get a hit -> return that value
  2. query cache, get a 404 -> return that 404
  3. query cache, get the ResourceNotCached error -> do a live call, return that
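A minimal sketch of that wrapping client's Get, assuming a hypothetical not-cached sentinel like the one sketched above and an uncached reader for the fallback:

package fallback

import (
    "context"
    "errors"

    "sigs.k8s.io/controller-runtime/pkg/client"
)

// errResourceNotCached stands in for the sentinel the cache would return under
// the "fail" policy; the real error would live in pkg/cache.
var errResourceNotCached = errors.New("resource is not cached")

// fallbackReader sketches flows 1-3: cache hit -> cached value, cache 404 -> 404,
// not-cached sentinel -> live call against the API server.
type fallbackReader struct {
    cache client.Reader // the cache, configured with the "fail" policy
    live  client.Reader // an uncached client used only for the fallback
}

func (r *fallbackReader) Get(ctx context.Context, key client.ObjectKey, obj client.Object) error {
    err := r.cache.Get(ctx, key, obj)
    if errors.Is(err, errResourceNotCached) {
        // Flow 3: the cache is not configured for this type; do a live lookup.
        return r.live.Get(ctx, key, obj)
    }
    // Flows 1 and 2: return the cached object or the cache's NotFound unchanged.
    return err
}

The live reader would just be a plain uncached client built against the same rest.Config; whether the fallback is automatic or opt-in is a separate decision.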

@stevekuznetsov
Contributor Author

@vincepri @alvaroaleman do you think this is fleshed-out enough for me to attempt an implementation? Any other feedback?

@vincepri
Member

Yeah I'm generally +1

@alvaroaleman
Member

Yeah, sgtm

Just realizing that the better proposal would be that the client.Client wrapping the cache could use that error to do a live lookup.

That is a good idea, but in the interest of not having too much magic, maybe make it opt-in?
