Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance the filtered cache mechanism #1715

Closed
chrispatmore opened this issue Nov 12, 2021 · 16 comments
Closed

Enhance the filtered cache mechanism #1715

chrispatmore opened this issue Nov 12, 2021 · 16 comments

Comments

@chrispatmore
Copy link

I have been playing around with the filtered caching mechanism added in controller-runtime v0.9.0, and have found that it is not quite sufficient for all of my use cases.

It works great if I am trying to filter down the resources with one rule, i.e. only show me configmaps where the label foo has value bar

selectors := cache.SelectorsByObject{
    &corev1.Pod{}:                   {
        Label: labels.SelectorFromSet(labels.Set{
            "app.kubernetes.io/managed-by": "my-operator",
        })
    }
}

However as far as I can tell it is not possible to create selectors with any form of logical OR.

I cannot say for example, find any configmaps where label foo has value bar OR label x has value y OR metadata.name has value my-configmap

This means that I cannot access any configmap that does not meet my one allowed rule, and therefore my only option if I need to do that is have no cache rule for configmaps, which defeats the point of having it.

It would be ideal if some form of OR functionality could be added, for example by changing the mapping of a selector from
map[client.Object]internal.Selector to map[client.Object][]internal.Selector
where each item in the array is applied as an OR

@FillZpp
Copy link
Contributor

FillZpp commented Nov 15, 2021

@chrispatmore I'm afraid there is no way to change map[client.Object]internal.Selector to map[client.Object][]internal.Selector. When the cache starts to listwatch, it converts the object selector to string and the list selector in options here and it could not configure with a list of selector.

@chrispatmore
Copy link
Author

That was merely a suggested way of achieving OR based conditions in the cache, if it will not work then another solution that provides the same capability is just as good.

The fundamental issue is the need for a way of including filtering with ORs

@alvaroaleman
Copy link
Member

alvaroaleman commented Nov 15, 2021

hy @chrispatmore as @FillZpp mentioned it is not possible to use a list of selectors as the Kube-Apiserver does not support that. You can however use Set-Based requirements to achieve an OR, this already works today.

req, err := labels.NewRequirement("app", selection.In, []string{"app1", "app2"})`
if err != nil { return err }
selector := labels.NewSelector().Add(req)

@alvaroaleman
Copy link
Member

Closing for now, please feel free to re-open if the above does not answer your question.

@chrispatmore
Copy link
Author

I don't seem to have permissions to reopen this issue.

This does not answer my question because as far as I can tell, the use of Requirement above does not actually get recognised by the cache

It only supports Equals

If I did anything but

labels.Set{
            "app.kubernetes.io/managed-by": "my-operator",
        })

the cache did not work.
I had a look around and it appeared that the cache was only recognising equals

Also even if the above did work
it would still not meet my needs as the above requirement is saying that one label can have values a or b however that is not effective if the label itself is not common across the two items.

i.e. if I want "app": "a" OR "name": "a"

@chrispatmore
Copy link
Author

I will try and find again the code that seemed to explain why requirements weren't working, but as I said it still wouldnt solve my problem

@alvaroaleman
Copy link
Member

If the requirements are not working that is a bug and I'd be happy to merge a bugfix for that. Please open a new issue if you are sure this doesn't work.

Supporting something other than what can be expressed through requirements (e.G. OR across different label keys) is fundamentally not possible as the kube-apiserver does not support it.

@chrispatmore
Copy link
Author

I will double check my experience with Requirements, and take another look through the code

I have been using this cache
https://github.com/IBM/controller-filtered-cache/blob/main/docs/create-filtered-cache.md
Whilst I was waiting for controller-runtime to implement it's own filtered cache which it now has, and ideally I would like to use.

This cache behaves fundamentally differently in that you can still retrieve an item even if it does not get cached (I prefer the controller-runtime behaviour that blocks the get here as it is much clearer what is cached and what isn't). And it allows ORs at the level I am interested in
It does not support the level of granularity expressed by Requirements

I do not know the specifics of how it works, but if you are saying that the cache in controller-runtime will not be able to support my requirement I guess I will have to stick with the above cache

I will still have a look over Requirements and check that what I saw was accurate

@alvaroaleman
Copy link
Member

I do not know the specifics of how it works, but if you are saying that the cache in controller-runtime will not be able to support my requirement I guess I will have to stick with the above cache

The way they support that is by creating multiple watches and then iterating over them. IMHO that is pretty gross and the kind of problem you should not attempt to solve but instead attempt to not have by labeling your similar resources with the same key.

@chrispatmore
Copy link
Author

chrispatmore commented Nov 15, 2021

I agree, but when you consume a number of operators as dependents this ideal falls down, for example, we have logic that requires we access the values in the secrets created by the cert-manager operator, the secrets created by that operator do not have any common labels added, which means we either have to turn off filtering for secrets one of the most populous resources, or we cannot access the secret.

I accept that the ideal solution is to have the cert-manager operator propagate labels we need / add the more standard labels, but ensuring all your dependencies support that is sometimes not achievable

I am happy if you want to keep this filter cache mechanism fundamentally purer in terms of implementation, and in that case am happy to have this issue remain closed. And as I say I will have to stick with what I have until we can get the ability to control all of our dependents labels

@chrispatmore
Copy link
Author

Requirements seems to be working fine now, no idea what I was doing wrong before

@chrispatmore
Copy link
Author

@alvaroaleman Hi I have been thinking about this a bit more and had a follow up question I was hoping I could get your thoughts on, without raising a new issue.

I agree with you on not adding the higher level OR logic on the cache, however I still think there are cases where an operator will need to access (not manage or own or watch) resources that it doesn't really need to cache. I was proposing the OR as a means to allow me to access resources that did not meet the rules for my cache, by adding extra rules such that I could access them, (owing to the fact that the client cannot get a resource not in the cache). But it occurred to me that an equally viable solution for my issue would be to allow the client to get the resources without going to the cache or caching the resources at all

Do you think it would be possible to add an override / configuration option to the cache or client that would allow the client to get the resource directly from the cluster if it is not cached? Or is there a mechanism already available for this?

@chrispatmore
Copy link
Author

Or @FillZpp if you have any thoughts?

@alvaroaleman
Copy link
Member

Do you think it would be possible to add an override / configuration option to the cache or client that would allow the client to get the resource directly from the cluster if it is not cached?

In general this is a valid usecase, for example in situations where you want to use CreateOrUpdate and have a cache with a labelselector (as external actor removing label means CreateOrUpdate breaks, as it tries to Create which fails as it already exists). Today, there isn't support for something like this. Off of the top of my head I am not completely sure what the best approach for this would look like, though. The basic problem is that we can not distinguish "not cached" from "doesn't exist" based on the cache, as from the POV of the cache those two are the same. So the naive approach of falling back would mean that we do an api request every time a 404 is returned from the cache, which seems expensive.

@chrispatmore
Copy link
Author

It's only expensive if you are trying to get something that doesn't exist though right, so as long as you aren't spamming requests to retrieve an item that doesn't exist the cost will be the same, as it's either there and in the cache (no different) or it's there and you get it from the server directly which is useful

@chrispatmore
Copy link
Author

Any actions to take on this? Should I raise a new issue around what we discussed above for adding the ability to access resources that are not cached?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants