Some admission control support for validating local issues #2742

Closed
bgagnon opened this issue Jul 28, 2020 · 10 comments
Labels
area/deployment · lifecycle/stale

Comments

@bgagnon
Contributor

bgagnon commented Jul 28, 2020

For user errors that can be validated without resolving other nodes in the DAG, admission control would provide a better user experience than the current asynchronous system of flagging the proxy object as "invalid". (A rough webhook sketch follows the lists below.)

A quick list of validation points not covered by the JSON schema but still local to a single object:

  • malformed or illogical routes/services stanzas
  • …

Some validation points require late binding and couldn't be covered here:

  • TLS secret existence or validity of contents
  • upstream service existence
  • inclusions and delegations
  • etc.
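For illustration, a local-only validating webhook against the stock admission/v1 types could look roughly like this. The trimmed httpProxy struct and the single rule it applies are assumptions for the sketch, not Contour's actual validation set:

```go
// Rough sketch of a local-only validating webhook for HTTPProxy, using the
// stock AdmissionReview types. The trimmed httpProxy struct and the single
// rule below are illustrative assumptions, not Contour's real validation set.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// httpProxy decodes just enough of a projectcontour.io/v1 HTTPProxy
// to run local checks; no other cluster state is consulted.
type httpProxy struct {
	Spec struct {
		Routes []struct {
			Services []struct {
				Name string `json:"name"`
			} `json:"services"`
		} `json:"routes"`
	} `json:"spec"`
}

// validateLocal holds only rules that are local to the one object.
func validateLocal(p *httpProxy) error {
	for i, route := range p.Spec.Routes {
		if len(route.Services) == 0 {
			return fmt.Errorf("spec.routes[%d]: at least one service is required", i)
		}
	}
	return nil
}

func serveValidate(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
		http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
		return
	}

	// Allow by default; deny only when the object decodes and a local rule fails.
	resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
	var proxy httpProxy
	if err := json.Unmarshal(review.Request.Object.Raw, &proxy); err == nil {
		if verr := validateLocal(&proxy); verr != nil {
			resp.Allowed = false
			resp.Result = &metav1.Status{Message: verr.Error()}
		}
	}

	// Echo the request's TypeMeta back with our response attached.
	review.Response = resp
	review.Request = nil
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(&review); err != nil {
		log.Printf("writing admission response: %v", err)
	}
}

func main() {
	http.HandleFunc("/validate", serveValidate)
	// The API server requires TLS; the cert paths here are placeholders.
	log.Fatal(http.ListenAndServeTLS(":8443", "tls.crt", "tls.key", nil))
}
```

The same handler could grow one function per local rule; everything that needs other objects would stay with the DAG evaluator.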

cc @stevesloka

@bgagnon
Contributor Author

bgagnon commented Jul 28, 2020

Also, I think a DAG-aware admission controller is possible if Contour itself is serving the admission webhook, but I'm not sure it's a good idea to force DAG nodes to be created in topological order to avoid admission errors. Normal Kubernetes objects don't behave that way (e.g. a Service and a Pod can be created in any order and eventually find each other).

@jpeach
Contributor

jpeach commented Jul 28, 2020

How far can you get with https://github.com/open-policy-agent/gatekeeper?

I expect Gatekeeper can do an OK job of these.

> malformed or illogical routes/services stanzas

This probably needs CRD v1 (#2678). Note that CRD v1 can be done today for people running Kubernetes 1.16 or later (I had it in my old kustomize PR).
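To make the Gatekeeper route concrete, a ConstraintTemplate along these lines could catch an empty services list. The template name and the fields it inspects are assumptions for the example rather than a worked-out policy:

```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: httpproxyrouteneedsservice   # hypothetical name
spec:
  crd:
    spec:
      names:
        kind: HTTPProxyRouteNeedsService
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package httpproxyrouteneedsservice

        # Flag any HTTPProxy route that lists no services at all.
        violation[{"msg": msg}] {
          route := input.review.object.spec.routes[_]
          not route_has_service(route)
          msg := "every HTTPProxy route must reference at least one service"
        }

        route_has_service(route) {
          count(route.services) > 0
        }
```

A Constraint of kind HTTPProxyRouteNeedsService matching projectcontour.io/v1 HTTPProxy resources would then turn the template on.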

jpeach added the area/deployment label Jul 29, 2020
@youngnick
Member

I think that admission control of some sort is inevitable. It's a much better user experience to be made aware of invalid objects as soon as possible.

The details we need to figure out are:

What admission control do we use?

As @jpeach says, many use cases could be covered using Gatekeeper, and that would also give advanced users the flexibility of customising the admission control to their liking.

A DAG-aware admission controller would need to run as part of Contour, I agree, similar to what kubebuilder controllers now do. I think this is much less necessary, given that we (well, I) will implement Conditions support in status, which should at least provide a way to surface the same information.

When do we do this?

Again, the advantage of using a general admission controller like Gatekeeper (and providing sample policies) is that we will be able to deliver some value sooner.

Building something into Contour will take longer.

This is a great discussion for us to have at a community meeting! @bgagnon, if you can make a community meeting, give us a shout and we will add an item to the agenda.

@jpeach
Contributor

jpeach commented Aug 4, 2020

> A DAG-aware admission controller would need to run as part of Contour, I agree, similar to what kubebuilder controllers now do. I think this is much less necessary, given that we (well, I) will implement Conditions support in status, which should at least provide a way to surface the same information.

IIUC, you can't really do admission control against more than one resource, so there's no real benefit to being DAG-aware.

@stevesloka
Member

> so there's no real benefit to being DAG-aware

I think there is a benefit. Once you have those admission bits in front, you're able to stop someone from breaking a chain of proxies, since you can find issues up front and handle them better.

This would be a very big change to Contour that needs to be thought out carefully.

@jpeach
Contributor

jpeach commented Aug 4, 2020

> > so there's no real benefit to being DAG-aware
>
> I think there is a benefit. Once you have those admission bits in front, you're able to stop someone from breaking a chain of proxies, since you can find issues up front and handle them better.

But that would only work if you could be sure you had all the proxies you needed at validation time. My understanding was that you could not depend on that. If you could, then validation would also enforce some sort of ordering constraints, which could be awkward.

Maybe a more general approach to this problem is to figure out how Contour can keep existing configurations working if they get modified in a way that breaks them. That would improve operational safety in the general case.

> This would be a very big change to Contour that needs to be thought out carefully.

Yup 👍

@bgagnon
Contributor Author

bgagnon commented Aug 5, 2020

@youngnick I brought this up in the last community meeting, actually.

There was a discussion about the upcoming timeout range feature, and it reminded me of the LimitRange, ResourceQuota, and PodSecurityPolicy objects, which are enforced at admission time.

Arguably, Kubernetes itself is not perfect with those, due to the asynchronous admission of Deployment -> ReplicaSet -> Pod.

On our side, we've been considering adding admission control hooks for Contour via our controller built on controller-runtime, which handles our custom DNS and TLS provisioning. We have environment-specific constraints on top of what Contour allows.

This portion could be done with OPA and Gatekeeper, but it might be a little silly considering we can just add a few lines of Go to our existing controller.
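Roughly, those few lines of Go could look like this against a recent controller-runtime (the CustomValidator interface shown is from newer releases; older ones used webhook.Validator, and the rule itself is only an example):

```go
// Illustrative HTTPProxy validator hung off an existing controller-runtime
// manager. The single rule below is an example, not Contour's actual logic.
package webhooks

import (
	"context"
	"fmt"

	contourv1 "github.com/projectcontour/contour/apis/projectcontour/v1"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

type proxyValidator struct{}

var _ admission.CustomValidator = &proxyValidator{}

func (v *proxyValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
	return nil, v.validate(obj)
}

func (v *proxyValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
	return nil, v.validate(newObj)
}

func (v *proxyValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
	return nil, nil // nothing local to check on delete
}

// validate applies environment-specific rules on top of what Contour allows.
func (v *proxyValidator) validate(obj runtime.Object) error {
	proxy, ok := obj.(*contourv1.HTTPProxy)
	if !ok {
		return fmt.Errorf("expected an HTTPProxy, got %T", obj)
	}
	for i, route := range proxy.Spec.Routes {
		if len(route.Services) == 0 {
			return fmt.Errorf("spec.routes[%d]: at least one service is required", i)
		}
	}
	return nil
}

// SetupWebhook registers the validator with the existing manager.
func SetupWebhook(mgr ctrl.Manager) error {
	return ctrl.NewWebhookManagedBy(mgr).
		For(&contourv1.HTTPProxy{}).
		WithValidator(&proxyValidator{}).
		Complete()
}
```

The nice part is that the validator rides on the manager the controller already runs, so there's no separate webhook deployment to operate.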

Anyhow, it feels to me like a portion (perhaps very small) of the validation needed in Contour could be done at admission, leaving the more complex bits to the eventually consistent DAG evaluator.

I should mention this reflection happened in the context of an old 1.14 cluster (now 1.15!) that does not even validate CRD payloads.

With OpenAPI / JSON schema validation, it's perhaps less necessary, but there are still validations influenced by Contour configuration and CLI flags.

If that's a growing trend in the project, it's worth considering, IMO.

@youngnick
Member

Yes, I think that moving as much simple validation as we can to as early in the object apply process as possible is the best UX we can get.

That is:

  • using OpenAPI definitions to ensure valid objects (requires #2678, Generate Contour CRDs with V1 CustomResourceDefinition), then
  • having admission control validate that the object is internally consistent and compliant with any required policy (this issue). This could be Gatekeeper or another option; we discussed Gatekeeper because it seems like it could hit a lot of the "limit settings to a range of values" cases very easily. The other thing I like is that it makes policy enforcement pluggable; users can dial it up or down as they like (or add it to an existing admission controller if they want). A sketch of the wiring follows this list.
  • the eventually consistent DAG run then follows along and ensures everything is good once all the pieces are together.
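For that middle layer, a custom webhook would be registered with the standard ValidatingWebhookConfiguration (Gatekeeper ships its own registration). Every name, namespace, path, and the CA bundle below are placeholders:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: httpproxy-validation                # placeholder
webhooks:
  - name: httpproxy.validation.example.com  # placeholder
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    rules:
      - apiGroups: ["projectcontour.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["httpproxies"]
    clientConfig:
      service:
        name: httpproxy-webhook             # placeholder Service in front of the webhook
        namespace: projectcontour
        path: /validate
      caBundle: Cg==                        # placeholder; the real CA bundle goes here
```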


The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

github-actions bot added the lifecycle/stale label Jan 11, 2024
github-actions bot closed this as not planned Feb 15, 2024