Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove "all" namespace from managed namespaces #5187

Merged

Conversation

naemono
Copy link
Contributor

@naemono naemono commented Dec 20, 2021

Don't append the 'all' namespaces when storage class validation is enabled, as it causes 'unknown namespace' errors in controller-runtime, and cluster-scoped resources are handled properly now in ctrl-runtime. This will supersede #5058, as it's a much simpler implementation to solve the same issue as the below PR better handles these types of situations in controller-runtime:

kubernetes-sigs/controller-runtime#1418

Tested with validation of storage class enabled, and successfully resized volumes from 100GB -> 200GB.

Known Issues

Validating webhooks for things such as ES PV resizing are created "unscoped"

- admissionReviewVersions:
  - v1beta1
  clientConfig:
    service:
      name: elastic-operator-webhook
      namespace: elastic-system
      path: /validate-elasticsearch-k8s-elastic-co-v1-elasticsearch
      port: 443
  failurePolicy: Ignore
  matchPolicy: Exact
  name: elastic-es-validation-v1.k8s.elastic.co
  namespaceSelector: {}
  objectSelector: {}
  rules:
  - apiGroups:
    - elasticsearch.k8s.elastic.co
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - elasticsearches
    scope: '*'    <--- Here

https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/

The scope field specifies if only cluster-scoped resources ("Cluster") or namespace-scoped resources ("Namespaced") will match this rule. "∗" means that there are no scope restrictions.

This means that the webhook[s] get resources across the whole cluster, in namespaces that they do not manage, which generates the dreaded

"unable to get: elastic/testing-es-masters because of unknown namespace for the cache"

There seems to be 2 approaches to handles this situation:

  1. Pass along managedNamespaces slice to webhook, and if the crud operation is outside of it's managed namespaces, allow it, which brings up the question: "How are the same webhook, with different names, all cluster scoped, and with the same rules (when having the operator installed twice in a cluster, and validating ES objects) handled? Do all have to validate, or just one?" (validated all webhooks with appropriate rules receive the request, and any can allow/deny)
  2. Work to get these webhooks Namespaced (which by looking at the operator framework docs, It wasn't clear how exactly to do this)

…abled, as

  it causes 'unknown namespace' errors in controller-runtime, and cluster-scoped
  resources are handled properly now in ctrl-runtime.
  see kubernetes-sigs/controller-runtime#1418
@botelastic botelastic bot added the triage label Dec 20, 2021
@naemono naemono added the >enhancement Enhancement of existing functionality label Dec 21, 2021
@botelastic botelastic bot removed the triage label Dec 21, 2021
@barkbay
Copy link
Contributor

barkbay commented Jan 4, 2022

This means that the webhook[s] get resources across the whole cluster, in namespaces that they do not manage, which generates the dreaded

I don't think it makes a lot of sense to deploy several "restricted" operators and expect one of them to validate resources at the cluster level. Using a single service to validate/mutate resources is a limitation of K8S itself.

Pass along managedNamespaces slice to webhook, and if the crud operation is outside of it's managed namespaces

👍 we could do that as a best effort. I'm not familiar with the scope field, I'm not sure to understand how it works.

@naemono
Copy link
Contributor Author

naemono commented Jan 4, 2022

👍 we could do that as a best effort. I'm not familiar with the scope field, I'm not sure to understand how it works.

The scope field of the webhook configuration allows a webhook to be namespace-scoped, instead of cluster scoped, which would be the right setting if we're managing a set of namespaces, but seems to indicate that the webhook would only work for the namespace that it's within.... I'll test this a bit more and update with my findings...

@pebrc
Copy link
Collaborator

pebrc commented Jan 12, 2022

As I understand the scope field it only helps restricting a rule in a webhook to match only on namespaced API resources or cluster scoped API resources, it does not help with what we are trying to do here which is untangle overlapping webhook instances.

The only mechanism the webhook API has to restrict responsibility to a namespace or multiple namespaces is the namespaceSelector property but that AFAIK relies on the namespaces being labeled with something we can select on which is not the case by default.

So maybe it would make sense to implement the little managedNamespaces hack @naemono suggested. Another option could be to have a different client for the webhook that is not namespace restricted but that might be a problem from a memory footprint perspective as I would assume we would then cache everything twice. We could also think about having a webhook only mode where one instance of the operator would just run the webhook and other instances would be namespace restricted and do the orchestration work.

@naemono
Copy link
Contributor Author

naemono commented Jan 13, 2022

Ok, I clearly made a mistake when rebasing this branch... going to try to fix this.... ignore this for now.

Updates webhook tests to ensure behavior.
@naemono naemono force-pushed the 4168-ignore-unmanaged-namespace-events-v2 branch from 2d216fa to c5c3819 Compare January 14, 2022 17:08
@naemono
Copy link
Contributor Author

naemono commented Jan 14, 2022

run full pr build

@naemono
Copy link
Contributor Author

naemono commented Jan 14, 2022

@pebrc, @barkbay I've updated this PR to include a change where the webhooks also know what namespaces are being managed, and ignore requests from namespaces that they don't manage. Let me know if this works. Thanks.

move log line down for consistency
@pebrc pebrc added the v2.1.0 label Jan 27, 2022
Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to cover the other webhooks as well see my comments. Also something is wrong with GH web UI today I have a hard time leaving review comments that are not markup garbage, I hope I got them all fixed up now.

pkg/controller/elasticsearch/validation/webhook.go Outdated Show resolved Hide resolved
pkg/controller/elasticsearch/validation/webhook.go Outdated Show resolved Hide resolved
pkg/controller/elasticsearch/validation/webhook.go Outdated Show resolved Hide resolved
pkg/controller/elasticsearch/validation/webhook.go Outdated Show resolved Hide resolved
cmd/manager/main.go Show resolved Hide resolved
naemono and others added 4 commits January 27, 2022 08:13
Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com>
Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com>
  and to allow managedNamespaces in webhooks to be handled properly
  across all webhooks.
Adjust all managed objects to use new webhook package on setup.
@naemono
Copy link
Contributor Author

naemono commented Jan 27, 2022

run full pr build

…to query namespace.

Add logging when skipping resource validation.
Add additional information to 'reason' for allowing request.
@naemono
Copy link
Contributor Author

naemono commented Feb 8, 2022

I noticed the ci tests failing yesterday, and thought I resolved them all, but something is now failing in the e2e test. I'm investigating.

Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's almost there but the upgrade validation is not working correctly.

pkg/apis/elasticsearch/v1beta1/webhook.go Outdated Show resolved Hide resolved
pkg/apis/agent/v1alpha1/webhook.go Outdated Show resolved Hide resolved
pkg/apis/apm/v1/webhook.go Outdated Show resolved Hide resolved
pkg/apis/apm/v1beta1/webhook.go Outdated Show resolved Hide resolved
pkg/apis/apm/v1beta1/webhook.go Outdated Show resolved Hide resolved
pkg/apis/kibana/v1/webhook.go Outdated Show resolved Hide resolved
pkg/apis/kibana/v1beta1/webhook.go Outdated Show resolved Hide resolved
pkg/apis/kibana/v1beta1/webhook.go Outdated Show resolved Hide resolved
pkg/apis/maps/v1alpha1/webhook.go Outdated Show resolved Hide resolved

if req.Operation == admissionv1.Update {
oldObj := v.validator.DeepCopyObject()
err = v.decoder.DecodeRaw(req.Object, oldObj)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err = v.decoder.DecodeRaw(req.Object, oldObj)
err = v.decoder.DecodeRaw(req.OldObject, oldObj)

This is currently a NOOP validation comparing the new object with itself. This tells me that we are missing a unit test or if that's too complicated an e2e test to validate the webhook is actually validating what it is supposed to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed this in the latest commit. I will add a unit, or e2e test to ensure properly functionality next. I misread the documentation for decoder.Decode, which state If you want decode the OldObject in the AdmissionRequest, use DecodeRaw....

I'll update when this test has been added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test added.

Decode the old object in upgrade, not the original object.
Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

want admission.Response
}{
{
"create properly validates valid agent, and returns allowed.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be in favour of using keys in these structs instead of positional initialisation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, and done. I'll merge this after ensuring all of the checks pass again.

@naemono naemono merged commit ecac71c into elastic:main Feb 14, 2022
@naemono naemono deleted the 4168-ignore-unmanaged-namespace-events-v2 branch February 14, 2022 14:31
fantapsody pushed a commit to fantapsody/cloud-on-k8s that referenced this pull request Feb 7, 2023
* Don't append the 'all' namespaces when storage class validation is enabled, as
  it causes 'unknown namespace' errors in controller-runtime, and cluster-scoped
  resources are handled properly now in ctrl-runtime.
  see kubernetes-sigs/controller-runtime#1418

* Webhook ignores requests from namespaces that it doesn't manage
Updates webhook tests to ensure behavior.

* remove spaces in if err block
move log line down for consistency

* Move namespace validation within the Handle func to reduce duplication.

* ES spelling in pkg/controller/elasticsearch/validation/webhook.go

Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com>

* ES spelling in pkg/controller/elasticsearch/validation/webhook.go

Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com>

* Use set operations to enhance readability.

* Create new package to duplicate less code for setting up webhooks,
  and to allow managedNamespaces in webhooks to be handled properly
  across all webhooks.
Adjust all managed objects to use new webhook package on setup.

* Adjust common webhook to copy object properly, and use metav1.Object to query namespace.
Add logging when skipping resource validation.
Add additional information to 'reason' for allowing request.

* add missing header

* Check type assertion.

* Ensure that the set of managed namespaces isn't all before checking whether the given namespaces is managed.

* Simplify common webhook validation
Adjust how update is handled

* Remove debugging from webhook

* Proper casing in comments.
Decode the old object in upgrade, not the original object.

* Adding unit tests for webhook validation.

* Use keys in the webhook test structs

Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality v2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants