
KEP: [sig-cluster-lifecycle] addons via operators #746

Merged Mar 26, 2019 (3 commits)

Conversation

@justinsb (Member):

No description provided.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. label Jan 28, 2019
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/pm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 28, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. label Jan 28, 2019
@justinsb (author):

cc @luxas @roberthbailey @timothysc

Got it live before the sig-cluster-lifecycle meeting... just!

@kubernetes/sig-cluster-lifecycle-pr-reviews

@neolit123 (Member) left a comment:

this is great, thanks @justinsb

added some comments for discussion.
nothing blocking.

/assign @timothysc @fabriziopandini @luxas


## Implementation History

Addon Operator session given by jrjohnson & justinsb at Kubecon NA - Dec 2018
Member:

link is here for folks that haven't seen it already:
https://www.youtube.com/watch?v=LPejvfBR5_w


Should we add a hyperlink to this string? What do you think?

@justinsb (author):

Will do - great call, thank you!

we would hope these would migrate to the various addon components, so we could
also just store these under e.g. cluster-api.

Unclear whether this should be a subproject?
Member:

I guess it depends on whether the code will be owned in a new repo, and by which SIG. If all the work can happen in k/k and kubebuilder, then a subproject is not needed?

But I think this effort needs a WG and a meeting schedule.

Contributor:

Is it a WG or subproject? Maybe up for discussion tomorrow?

Member:

Was this discussed yet?

@justinsb (author):

I don't believe we have discussed it - as these operators are code, ideally that code will be owned by the SIGs that own the thing being installed - the X-addon-operator should be developed alongside X. It's not clear if we end up with anything at the cluster-lifecycle scope long-term (particularly if we punt as much as possible to kubebuilder). As the intention is not to be stuck holding the code bag, maybe a working-group makes more sense?

But on the flip-side we likely want a repo to start developing the operators, until we can hand off ownership.

Can a working-group get a repo temporarily?

Member:

You can probably nix this statement now.

@justinsb (author):

Thanks - nixed!

least-privilege. But it is not clear how to side-step this, nor that any of
the alternatives would be better or more secure.
* Investigate use of patching mechanisms (as seen in `kubectl patch` and
`kustomize`) to support advanced tailoring of addons. The goal here is to
Member:

patching might make upgrades tricky.

Member:

Could you please clarify why?

@justinsb (author):

If the underlying structure changes dramatically the patches won't apply. Added a comment about this, and that we'll need some form of error state, and that a good addon won't change structure just for fun!


Is this referring to patching the addons that the operator is managing (the operand), like patching a Deployment?

In this context, patching opens the door to changes that could make the safety of upgrades unpredictable for the operator. Can they patch env vars, attached volumes, or command line arguments for the binary being run in the container? Will you limit what they can / can't patch?

The CRD is the safe API contract. Once the admin strays outside of that and starts patching the generated resources, the operator can only offer a best-effort attempt to upgrade. What if a command-line flag on the binary needs to be deprecated or changed? This isn't out of the question and has happened before with kube binaries. If the operator is in control and this feature is defined at the API level, then rollout of this flag change can be done smoothly by the operator. If an admin has instead patched the Deployment to use flags that the operator isn't managing, then when the new version tries to roll out and the patch is reapplied with an invalid flag, the new pods are going to crashloop. This wouldn't have been caught at the patching step, since the patch itself was not invalid; it wouldn't be caught until the middle of the upgrade, when things start to fail.
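
To make the patching concern concrete, here is a minimal sketch (illustrative only, not part of the KEP) of a kustomize-style overlay tailoring an addon's generated Deployment. The file name, container name, and flags are assumptions; the point is that once `args` is patched, a later operator-driven flag change can collide with the reapplied patch:

```yaml
# kustomization.yaml - hypothetical overlay over manifests an addon operator generates
resources:
- coredns-base.yaml   # assumed base manifest; not a real file from the KEP
patches:
- target:
    kind: Deployment
    name: coredns
  patch: |-
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: coredns
      namespace: kube-system
    spec:
      template:
        spec:
          containers:
          - name: coredns
            # Overriding args bypasses the CRD contract; if a future operator
            # version renames a flag, this patch still reapplies the old one.
            args: ["-conf", "/etc/coredns/Corefile", "-dns.port", "1053"]
```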

authors:
- "@justinsb"
owning-sig: sig-cluster-lifecycle
reviewers:
Contributor:

We need some reviewers / approvers before we commit this. Maybe something to discuss tomorrow?

Contributor:

If this has not been discussed in the SIG already it should be. The SIG should look at who the approvers are (the chairs?)

The related sigs field should also be filled in. I would suggest SIG Architecture as one of them.

Member:

[This was raised to SIG Cluster Lifecycle in today's meeting]

Member:

Can someone post a summary of where the SIG is at on this? There's no harm in merging and iterating.

Member:

We can chat about it during our next call.
/cc @justinsb

@justinsb (author):

Good point @mattfarina - I'd like to bring it to sig-architecture when we have something concrete to show them.

Member:

SIG Arch is working to decentralize decision-making:
https://docs.google.com/document/d/1BlmHq5uPyBUDlppYqAAzslVbAO8hilgjqZUTaNXUhKM/edit#bookmark=id.7lj9uyhho0mu

I suggest discussing in SIG Cluster Lifecycle for now, and invite interested parties to attend.


@mattfarina (Contributor) left a comment:

The writeup here appears to be more like a scope of work rather than a targeted enhancement. For example, the goals state:

Explore the use of operators for managing addons

Could this be phrased more around the enhancement direction rather than the scope of the group?


coordination each piece of tooling must reinvent the wheel; we expect more
mistakes (even measured per cluster) in that scenario.

## Graduation Criteria
Contributor:

These graduation criteria are a bit squishy. If you're going to have them, could you look at the in-progress template updates for some more guidance?

Member:

We talked at the SIG meeting to help refine this.

@justinsb (author):

Thanks! I realized that graduation criteria are the alpha->beta->GA criteria. So I added some alpha criteria that are adoption-based, and renamed the fuzzy graduation criteria to be "success criteria".

@@ -0,0 +1,181 @@
KEP: Addons via Operators


All content has to follow the YAML front matter. See https://jekyllrb.com/docs/front-matter/

@timothysc (Member) left a comment:

So I need to ruminate on this, because it seems a bit heavy-weight.

In air-gapped, regulated environments, they'd want tight and explicit control.

/cc @ncdc - thoughts?

reviewers:
- TBD
approvers:
- TBD
Member:

Assign me as both reviewer and approver.
also
/assign @luxas @roberthbailey @timothysc


## Summary

We propose to use operators for managing cluster addons. Each addon will have
Member:

One thing I struggle with is: do we need a whole CRD for every addon, or is kustomize + a default bundle of YAML sufficient? I'm guessing this would depend on the addon.

@justinsb (author):

CRDs are supposed to be cheap, so I'd say we should have one for every addon - we do expect there will be per-addon-settings, and so a CRD-per-addon lets us actually use all this machinery we've invested so much in.

The hope is that we can make developing the controllers as easy as kustomize + default bundle of YAML though!

I'll add a comment to this effect.

Member:

Does this mean we'll have e.g. /apis/addons.k8s.io/v1/kubeproxies and /apis/addons.k8s.io/v1/corednses and so on?

@justinsb (author):

Yes (modulo naming and pluralization rules)

Member:

Does it seem a bit weird to do that for singleton addons?

@justinsb (author):

Not really to my mind - can you clarify why it's weird?

I do think we should establish patterns for singletons (e.g. "name the instance default"). But I love the progress made on CRDs and think we should make use of them!

Member:

We have followed a convention similar to what @justinsb described. The operator basically only respects a single instance of a named, cluster-scoped CR for certain types of operators (not all).

Member:

A naming pattern like default works for me.


Maybe I'm missing something here, but doesn't having a different CRD type for each addon make it more difficult to create tools for generic operations, like listing all addons with a given status, or watching for the installation of addons? What approaches should be used in those cases?
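
As a rough illustration of the singleton convention discussed above (the kind and label below are assumptions, not an API defined by the KEP), each addon could get one instance named `default`, and a shared label could support generic operations like listing all addons across kinds:

```yaml
apiVersion: addons.sigs.k8s.io/v1alpha1
kind: CoreDNS                     # hypothetical per-addon kind
metadata:
  name: default                   # singleton convention: one instance, named "default"
  namespace: kube-system
  labels:
    addons.k8s.io/name: coredns   # assumed common label enabling generic tooling
spec:
  version: 1.3.1                  # example value
```

Generic tooling could then aggregate across kinds via a label selector or via common `spec`/`status` fields, rather than needing a single kind for all addons.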


* Extend kubebuilder & controller-runtime to make it easy to build operators for
addons
* Build addons for the primary addons currently in the cluster/ directory
Member:

Could we get more concrete here and enumerate the exact ones?

@justinsb (author):

I put as a proposed list CoreDNS & kubeproxy (needed for conformance), dashboard (demos well), metrics-server (non-trivial), and LocalDNS-Agent (useful). I'm not sure whether you would rather we were more or less ambitious here @timothysc - happy to go either way.


We expect the following functionality to be common to all operators for addons:

* A CRD per addon
Member:

I'd really like to see example(s) for common core ones, e.g. proxy.

@errordeveloper (Member), Mar 7, 2019:

Yes, some concrete examples would help a lot. I think there are a few classes of add-ons, and each class has certain characteristics that are shared within the class.

Here is what I've been thinking about the classification:

One type of addon is the components of Kubernetes that are strongly coupled to Kubernetes releases, such as kube-dns/coredns and kube-proxy. Besides version coupling, you normally wouldn't want to run multiples of these at the same time.

A networking component is usually less strongly coupled to the Kubernetes version, but it is more critical to cluster functionality. There is also no standard way of running multiple different networks.

Secondly, there are kinds of add-ons that provide functionality that is rather critical to a user who runs workloads on the given cluster; ingress controllers and storage drivers would fall into this category. Some of these things can be seen as operators.

There is also the class of add-ons that extend functionality even further, and are often a combination of components which may or may not also be defined as standalone add-ons from one of the above categories. Istio and Knative would fall into this category.

Additionally, there is a class of workloads that depends on versions of Kubernetes, but doesn't provide additional functionality to the cluster directly; observability products, dashboards, deployment tools, and things like security scanners fall into this category.

@justinsb (author):

Added an example.

@errordeveloper you raise some good categorizations - some of these are going to be "apps". For example the nginx-ingress-controller isn't really tied to the k8s cluster lifecycle. We would expect those to be deployed using tools like helm or kustomize or kubectl.

I do think we should figure out what to do about singleton objects. I've been calling them "default" in "kube-system", but there are many approaches and even I only passionately hate about half of them ;-)

I don't really know how to handle "pick one of a set" type operators (i.e. CNI). Operators do let us have better cleanup procedures, but realistically it's highly disruptive to switch. kops started labelling the addons so that we could prune the other ones, but most CNI providers leave some networking plumbing behind (e.g. iptables). We can't easily do a rolling-update in most cases because the network partitions between old-CNI and new-CNI. There are usually supporting infrastructure things e.g. opening firewall rules for IPIP for calico. My preference would be that we can set up the operators to do all they can, but realistically if we get a nice migration path for any CNI providers it'll be because the tooling put in special code to move between a particular transition. We at least have somewhere to put that code now!

Member:

For example the nginx-ingress-controller isn't really tied to the k8s cluster lifecycle.

AWS ALB ingress controller would only ever work in AWS and needs IAM permissions, so it's tied to a few properties. When it comes to lifecycle and Cluster API, provisioning of such IAM permissions becomes a lifecycle concern.

GKE has built-in ingress controller, and that's currently something GKE user selects as an add-on and it is managed by the add-on manager. This one is actually tied into lifecycle as it gets included during cluster creation, and cannot be removed easily as far as I know.

Also, Heptio Contour ships a CRD, and that has versions dependencies.

I agree that neither of these is tied into cluster lifecycle, but they have dependencies on cluster APIs, and provide vital functionality for many users.

We would expect those to be deployed using tools like helm or kustomize or kubectl.

Even if we are to say that some things like ingress controllers are seen as "apps with deep Kubernetes integration", in some cases with dependencies on cloud resources associated with the cluster, how do we expect users to make these distinctions and find the right way of managing such things?

Member:

network partitions between old-CNI and new-CNI

Not the case with Weave Net, and I don't know when it is practically the case. I would expect any well-designed CNI addon to behave well.
If it is the case, I'd say it's addon-specific, and certainly justifies the need for an addon-specific operator.

There are usually supporting infrastructure things

Seems similar to IAM dependencies problem.

@justinsb (author):

You're right that IAM provisioning likely brings it into cluster-scope; I don't think an ingress controller or something like kube2iam is necessarily a deal-breaker there.

I would like to avoid "deploying a CRD => cluster-lifecycle". I know there are versioning concerns, but I would hope we could come up with some reasonable patterns such that in practice helm or something could manage an app that includes CRDs. i.e. although technically there are problems with incompatible CRDs, by being careful about versions we can avoid them.

I think for users, we would expect their installation tooling (kubeadm, cluster-api etc) to include or reference a small set of cluster-addons, and for users to follow that lead. And I expect kubernetes component developers will naturally identify whether they are really a cluster-addon or an app, and there will be natural pushback from the installation tooling against inclusion if they choose cluster-addon when they are really an app.

If the CNI providers are in fact capable of being switched, that's much easier! We can probably start with simple docs for how to turn on Weave and turn off Calico, and I'm sure Weave will write those docs, just as I'm sure Calico will write the doc in the other direction!

Contributor:

Who reviews and approves the CRD APIs for these operators? I.e. if kube owns "official DNS installation API", do the core Kubernetes API approvers have veto power over what features DNS can support? How do we determine what features are allowed to be added to the "official DNS installation API"?

We have a lot of experience with the shortcomings of domain-specific APIs like ingress, service load balancer, and storage, where multiple different implementations can have wildly different feature sets. What happens when the N+1 feature addition to the DNS API gets NACKed by the maintainer of the "official DNS installation API"? Does someone create an "unofficial DNS installation API"? Do we support random extensions? Do we take a very hard line and encourage people to fork and write their own operators?

I like the idea of a simple DNS config API. I don't like the idea of the DNS config API becoming just another ingress API problem. I don't see how these APIs won't fall into that trap.

@bgrant0607

Contributor:

An alternate approach would be:

Someone goes and writes a damn good coredns operator that accepts lots of features. It's not part of the core project, but we make it really easy to install that operator. No API review in core is needed. Scope of Kube doesn't increase.

Since operators are controllers the best operators will be a simple deployment, rbac rules, and maybe a secret. Do we need to go beyond that?

Member:

Do we need to distinguish the types of addons?

Several of the examples, such as kube-proxy and metrics server (https://github.com/kubernetes-incubator/metrics-server), are components developed as part of the Kubernetes project. Is there a reason why it wouldn't make sense for the owners of those components to also own the operator CRDs for those components?

I'd assume SIG Network would own the DNS-related operators.

On the simple DNS API: Offhand, I'd split the common parameters into the Ingress-like implementation-independent API (or just additional fields on some object[s]), and leave implementation-specific details in the operator CRDs. I haven't seriously looked at what such a split would look like, but candidate fields include those used to generate the default /etc/resolv.conf for containers and those documented as part of kube-dns and CoreDNS configuration:
https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#configure-stub-domain-and-upstream-dns-servers
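
For reference, the customization the linked page describes is expressed today as implementation-specific ConfigMap data; for example, in the kube-dns form (values illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  # Route *.acme.local lookups to a custom resolver:
  stubDomains: |
    {"acme.local": ["1.2.3.4"]}
  # Forward everything else to these upstreams:
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]
```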

Member:

Do we think we have an "official DNS installation API" today?

* Operators are declaratively driven, and can source manifests via https
(including mirrors), or from data stored in the cluster itself
(e.g. configmaps or cluster-bundle CRD, useful for airgapped)
* Operators are able to expose different update behaviours: automatic immediate
Member:

Ahh automatic updates...

@justinsb (author):

I can't say I would recommend it for prod clusters, but totally automated updates of e.g. CoreDNS would be a cool demo!


@k8s-ci-robot (Contributor):

@timothysc: GitHub didn't allow me to request PR reviews from the following users: justinsb.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

We can chat about it during our next call.
/cc @justinsb

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@timothysc (Member) left a comment:

/approve
/hold
Minor comments, then fine to merge as provisional.
/cc @luxas

We propose to use operators for managing cluster addons. Each addon will have
its own CRD, and users will be able to perform limited tailoring of the addon
(install/don’t install, choose version, primary feature selection) by modifying
the CR. The operator encodes any special logic (e.g. dependencies) needed to
Member:

the CRD.

@justinsb (author):

Changing to "the instance of the CRD"

* Create patterns, libraries & tooling so that addons are of high quality,
consistent in their API surface (common fields on CRDs, use of Application
CRD, consistent labeling of created resources), yet are easy to build.
* Build addons for the basic set of components, acting as a quality reference
Member:

How do bundles fit under these goals?

@justinsb (author):

So ideally each addon operator references a set of manifests somewhere, rather than baking them into the image, so we don't always have to update the operator (just when the underlying change is non-trivial).

Those manifests could be in any format really. They could be simple yaml files. I do think we should recommend a format.

If we actually pick bundles as our format, this gives us a bit of metadata when pulling from https, but because bundles are CRDs, this also gives us the ability to fetch those manifests from the cluster rather than relying on some https server, i.e. it's "half" of airgap support.
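
A loose sketch of the two sourcing modes described here; all field names below are assumptions for illustration, not part of the KEP:

```yaml
apiVersion: addons.sigs.k8s.io/v1alpha1
kind: Dashboard                # hypothetical addon kind
metadata:
  name: default
  namespace: kube-system
spec:
  version: 2.0.0               # example value
  # Mode 1: operator fetches manifests over https (mirrors possible):
  manifestsRef: https://example.com/addons/dashboard/
  # Mode 2 (airgap-friendly): resolve manifests from a bundle CR
  # already stored in the cluster instead of an external server:
  # bundleRef:
  #   name: dashboard-bundle
```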

Contributor:

Doesn't that mean that the manifests themselves are the artifacts and the image should be embedded in them?

Member:

Please not files on a disk that have to be rsync'ed.

@justinsb (author), Mar 18, 2019:

Not sure I follow @smarterclayton - manifests do specify an image name, but I'm missing where you're going with that...

@bgrant0607 I don't think we would recommend files on a disk for any production configuration :-)

  name: default
  namespace: kube-system
spec:
  clusterCidr: 100.64.0.0/10
Member:

And I'm guessing use of bundling allows you to customize this.

@justinsb (author):

Yes; other approaches include the operator discovering it by looking for a cluster-api Cluster object, or by having cluster-api create this KubeProxy instance with clusterCidr set. The latter feels more straightforward, and I do like that it is explicit vs magic defaulting, but this might actually vary for different use-cases.
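
For readability, the KubeProxy fragments quoted throughout this review assemble into roughly the following object (reconstructed here from the excerpts; the authoritative example lives in the KEP itself):

```yaml
apiVersion: addons.sigs.k8s.io/v1alpha1
kind: KubeProxy
metadata:
  name: default              # singleton convention discussed above
  namespace: kube-system
spec:
  clusterCidr: 100.64.0.0/10 # could be set explicitly by e.g. cluster-api
  version: 1.14.4            # pinned version; channels are the alternative
status:
  healthy: true
```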

Member:

The KEP is proposing a subproject; it would be good to understand what the dependency orderings are for the project.

Is this orthogonal to the Cluster API?
Is it additive to any conformant Kubernetes cluster?
Can we get a clear statement on intended layering across the various subprojects?

Member:

The KEP implies it's open for adoption by a plurality of tooling, but the comments make that less clear. Ideally the operator can discover its required context information independent of installation tooling choice?

@justinsb (author):

@derekwaynecarr the above is really just me speculatively enumerating some of the approaches an operator could take to get a value (clusterCidr). We do a good job of covering all the tooling in sig-cluster-lifecycle (mostly due to the eternal vigilance of @timothysc), and so if you're able to attend the meeting then I think we can address your questions there - I'm not entirely sure what you mean by some of them. Even better, it would be great to have your participation in building the right thing!

@derekwaynecarr (Member), Mar 21, 2019:

@justinsb thanks for clarifying. I asked the question of whether this was decomposed because the cluster-api subproject meeting yesterday was also discussing composition of its APIs moving forward. I think it's right to handle this as a separate subproject and look forward to collaborating.

This particular manifest is pinned to `version: 1.14.4`. We could also
subscribe to a stream of updates with a field like `channel: stable`.

### Risks and Mitigations
Member:

I think testing and verification for a release are also super important.

@justinsb (author):

Good point - added a paragraph specifically for that. I think if we get to parity with the "embedded manifest" approach that we know and love, that'll help initially. And I think we unlock approaches that are a little more flexible than embedding manifests, but I don't think we need to dictate anything there.


@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 12, 2019
owning-sig: sig-cluster-lifecycle
reviewers:
- @luxas
- @roberthbailey
Member:

Is @roberthbailey still going to be a reviewer for this considering his role change?

@derekwaynecarr (Member):

cc @jwforres interested in your perspective here as well.

applications are generally recognized. We aim to bring the benefits of
operators to addons also.

### Goals
Contributor:

We've been working on operator lifecycle manager for a couple of years, and it shares a lot of these same goals:

  • Define extensions to Kubernetes as operators managing CRDs (or extension apiservers)
  • Provide a consistent interface for describing operators and their interactions with the cluster and each other
  • Constrain the scope of where an operator is allowed to operate (e.g. cluster-scoped vs. namespace-scoped, something in between)
  • Prevent end-user privilege escalation via operators
  • Express dependencies between operators and their provided APIs
  • Provide a packaging and installation model that supports offline, air-gapped installs
  • Provide optional, automated updates across operators

There might be some places where OLM isn't perfectly aligned with what is needed for managing addons - but it seems pretty close, and we'd love to jumpstart this work by contributing what we've already done and iterating on it to meet the community's needs.

As a first step, we could try to prototype installing today's addons with OLM and see if there are any gaps?

@justinsb (author):

Of course - we'd welcome your collaboration!

We expect the following functionality to be common to all operators for addons:

* A CRD per addon
* Common fields in spec that define the version and/or channel
Contributor:

What is a channel? Maintained and supported by the community?

@justinsb (author):

A channel is what we're calling a stream of versions. Better names welcome, but I think channel is a good name that wise people wearing red hats use...

I think it would be up to a project e.g. CoreDNS whether to support a channel for their operator and how they would define it. But again, suggestions & recommendations are welcome.
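
A small sketch of the two subscription styles, using the field names from the KEP's own example (whether both fields may be set at once is an open question here):

```yaml
spec:
  # Pin to an exact release:
  version: 1.14.4
  # ...or instead subscribe to a stream of updates:
  # channel: stable
```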

* Build addons for the basic set of components, acting as a quality reference
implementation suitable for production use. We aim also to demonstrate the
utility and explore any challenges, and to verify that the tooling does make
addon-development easy.
@smarterclayton (Contributor), Mar 18, 2019:

I'd add another goal here:

API review, feature addition, and scope of the project should not drastically increase as a result of this KEP.

Or said another way:

We want the minimal solution which doesn't grow the scope of Kube BUT allows Kube-like things to be trivial to add.

The possible APIs for Kube-like components is unbounded and cannot be scoped, so the more API surface we have here the greater the burden on the project.

Contributor:

@liggitt cause we discussed this recently

@justinsb (author):

These are CRDs though - they aren't all going to be k8s APIs. We would prefer homogeneity, so we would welcome any input from you and @liggitt on what a "recommended" API should look like. We have a prototype where we defined a few common fields, which feels right - better to have version always at spec.version (I think we can agree). But whether it should be spec.version or spec.version.lock or something else entirely we could definitely use help on!

Member:

These are good points.

I think they also apply elsewhere, such as the component config effort, exported metrics, Event contents, component log messages, and kubectl commands. We're accruing operational surface area over time.

I think the only way that can be addressed is by enabling the technical oversight of these areas to be distributed, such as by mentoring/shadowing, documenting policies and conventions, writing linters and compatibility tests, etc.

@bgrant0607 (Member):

Some high-level background. (I haven't read the KEP yet.)

Add-on management is a long-standing issue.

It has been exacerbated by 2 developments:

  1. More cluster-lifecycle tools under the project. kube-up, kops, minikube, and kubeadm need and have add-on managers. (Well, we want to delete kube-up.) Whatever we use for end-to-end tests needs a mechanism also.
  2. An increasing amount of functionality expected by default not being built-in to the main components (apiserver, controller manager, scheduler, kubelet, kube-proxy), both due to adding new functionality via extensions and extracting existing functionality using new extension mechanisms (e.g., cloud providers, volume sources).

If we want Kubernetes releases to continue to be usable OOB, we at least need a reference implementation to run DNS, for instance.

I think the Operator approach is interesting because we do want Operators to run reliably and to be reconfigurable.

We do need to be careful about how/where the APIs are used. I'd expect managed services to not necessarily expose add-on management, for instance. And some essential functionality, notably DNS, has multiple implementations in use, so we may want to think about implementation-independent APIs for configuring commonly customized user-visible behaviors.

cc @thockin, since he's been interested in the distro vs. kernel question


tooling
* Useful: users are generally satisfied with the functionality of addon
operators and are not trying to work around them, or making lots of proposals /
PRs to extend them
Contributor:

I think this is a great success criterion, and the one I'm skeptical about in the current scope.

Would we do an ingress controller operator? Which ingress controllers would it support? What options on those ingress controllers?

@justinsb (author):

If these all have operators, I was imagining there would be an nginx-ingress operator, and a distinct gclb-ingress operator, aws-alb operator etc. Ideally component projects end up owning their operators.

I was not imagining we would try here to define a generic ingress operator that can install arbitrary ingress controllers - it doesn't feel very compatible with the idea of federating responsibility & of self-determination. It also feels like that would be a sig-network project.


```yaml
apiVersion: addons.sigs.k8s.io/v1alpha1
kind: KubeProxy
```
Contributor:

If I'm a network plugin, am I required to respect this if installed (even though calico / others might provide kube-proxy function natively)? How would a third party network plugin indicate they are ignoring this operator?

@justinsb (author):

I don't believe the operator should be required to, no. These are declarations of user intent, and so if the user intent is unsatisfiable I believe the common pattern is to expose it under status if we know it won't work, or - often - simply not work correctly. kube-router might choose to check for kube-proxy and surface a status, for a better UX, but I don't think we would require that.

  namespace: kube-system
spec:
  clusterCidr: 100.64.0.0/10
  version: 1.14.4
Contributor:

Who decides what versions can be used here? Can this drift from Kube versions, or is it required to be aligned?

Contributor:

Is something like a Calico network plugin able to dictate that they only work with certain versions?

@justinsb (author):

Typically the addon itself will define its own numbering scheme. For the particular example of kube-proxy, today it is locked to k8s versions, but that isn't always going to be the case. CoreDNS & calico define their own versioning scheme - kube-proxy is more of a special case here.

I don't think we want to dictate that a calico operator define that it only work with some e.g. k8s versions. I suspect we'll come up with some recommendations for what an operator should do if it is running outside of the tested versions (e.g. surface a warning in status?) or running with a known-incompatible version (e.g. refuse to update & surface an error condition in status?).

What do you think we should do @smarterclayton ? What did OpenShift end up doing here?

  clusterCidr: 100.64.0.0/10
  version: 1.14.4
status:
  healthy: true
Contributor:

Some things that status would have to represent that aren't covered here:

  1. dimensions of health - conditions, message, structured information for a consumer like Deployment
  2. whether the current version has been rolled out completely and if not, how long before it is

@justinsb (author):

Agreed - we plan on developing these together as a community project. It would be great to have your help / input from people that worked on operators in OpenShift!
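
A sketch of what such a status might look like, following Deployment-style conditions; all fields beyond `healthy` are assumptions prompted by the comment above, not an API defined by the KEP:

```yaml
status:
  healthy: true
  observedVersion: 1.14.4        # assumed field: the version actually rolled out
  conditions:
  - type: Progressing            # dimension of health, Deployment-style
    status: "True"
    reason: RollingUpdate
    message: "2 of 3 kube-proxy pods updated to 1.14.4"
  - type: Available
    status: "True"
```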

* Extend kubebuilder & controller-runtime to make it easy to build operators for
addons
* Build addons for the primary addons currently in the cluster/ directory, at
least including those required to bring up a conformant cluster. Proposed
Member:

I think we need principles regarding which add-ons belong as part of the Kubernetes project. In the past, the project accepted many contributions, which then later became maintenance burdens. I'd suggest we only support add-ons for components developed as part of the project, and that the owners of those components should also own the reference add-ons.

@justinsb (author):

Agree with the suggestion. Ideally the "cluster-addons" subproject ends up not "owning" any operators, but rather each addon operator is maintained by the team that writes the addon (on the theory that they know best how to operate their addon).

If we keep the operators sufficiently declarative, we can also make the use of the operator a tooling decision.

affect most/all kubernetes clusters (even if we don’t mandate adoption, if we
succeed we hope for widespread adoption). We can continue with our strategies
that we use for core components such as kube-apiserver: primarily we must keep
the notion of stable vs less-stable releases, to stagger the risk of a bad
Member:

I don't think we have the notion of stable vs less stable releases today. Individual components, like Core DNS, have become the default once they are considered GA.

@justinsb (author):

Right - I'm saying that we would expect that addon X might continue to support 1.1 for a while after introducing 1.2 to avoid the monoculture risk. They might choose to leverage channels for this, for example choosing to define a "stable" channel that stays on 1.1 for a while, while the "beta" channel moves to 1.2 until they feel it has been derisked.

Today this work is all on the kubernetes test & release teams, which feels very centralized.

these patterns.

We hope that components will choose to maintain their own operators, encoding
their knowledge of how best to operate their addon.


Is the scope to also have the managed k8s providers like GKE/EKS move to this addon model? It's currently a pain to not be able to easily modify the addons provided by default on the various providers. But at the same time, will a CRD provide enough configurability, or do we just need a way to expose all these addons to be modifiable by consumers? Everyone has their own unique requirements on what part of each addon they want to modify.

@justinsb (author):

It's not a requirement for anyone to move to the model, but we would like to create something that tooling chooses to move to. Figuring out ways to support a balance of modification and control to enable that is in scope.

necessarily be directly applicable in the OSS world and we are open to change as
we discover new requirements.

* Extend kubebuilder & controller-runtime to make it easy to build operators for


This is one of the project goals for the operator SDK, bootstrapping new operator projects so that they will be set up to follow best practices. The SDK is based on the controller-runtime project.


### Risks and Mitigations

This will involve running a large number of new controllers. This will require
more resources; we can mitigate this by combining them into a single binary


I expect operator owners will want to lifecycle and release new versions of their operators independently from each other. Especially for operators that aren't tied to a particular kubernetes release.

Combining into a single operator binary for all addons also makes your RBAC concerns worse, now there is one operator with access to do everything all of the operands can do.

@justinsb (author):

A fair point. I'm also unhappy about the way RBAC permissions "trickle down", where an operator needs to have all the permissions of the addon - I hope we can investigate ways to address that. Is there anything we can learn from OpenShift's work here?

Initial development of the tooling can probably take place as part of
kubebuilder

We should likely create a repo for holding the operators themselves. Eventually


I agree with @detiber. A single repo for all operators doesn't allow the owners of the addons to build their operators where and how they want. If we know of addon operators that don't need to cycle exactly with a particular kubernetes release, a single repo doesn't seem like the right approach to enabling that scenario.

Define the API contract an operator must fulfill (status/condition reporting, how the operator says if it has rolled out the expected version, etc). Have tooling to make it easy for an operator to bootstrap a project to conform to that API contract.

@justinsb (author):

@jwforres for some reason I can't reply to some of your comments inline.

For allowing patching, yes it's clear this is the "break glass" option and carries an element of "use at your own risk". The reason for it is that we don't want CRDs that redefine every field in the manifest, lest we end up defining a meta-language in the CRD. The hope is that supporting something like patching allows the CRD to be smaller and still cover the vast majority of users, with the remaining users able to use patching and not be blocked. But also it's much more realistic to support a relatively small surface area API. A large surface area API over an addon with breaking changes will also have breaking changes.

For flags specifically, you're absolutely right that they are brittle - and there's complementary work to establish componentconfig as the recommended approach instead (the component work being done by @luxas @mtaufen @sttts et al)

For the one repo vs multiple repos question, if there are projects that want to adopt the patterns right away, that's great. I was suspecting that we would have to build a few in a single repo, before persuading the projects to move the code into their repo/repos. But I could be wrong, and I welcome the optimism :-)

By creating a CRD per addon, we are able to make use of the kubernetes API
machinery for those per-addon options.

We will create tooling to make it easy to build addon operators that follow the
Member:

Should we just upstream this work into controller-runtime so that both kubebuilder and OperatorKit can leverage it? Or do you have more specific work in mind, i.e. something like AddOn(builder/kit)?

Addons are components that are managed alongside the lifecycle of the cluster.
They are often tied to or dependent on the configuration of other cluster
components. Management of these components has proved complicated. Our
existing solution in the form of the bash addon-manager has many known
Member:

So I like the idea of having an add-on manager replace the bash scripts. I think it generally improves our cluster turn-up story for users of OSS kubernetes who do not have the benefit of leveraging a cloud provider service for their cluster management.

addons
* Build addons for the primary addons currently in the cluster/ directory, at
least including those required to bring up a conformant cluster. Proposed
list: CoreDNS, kube-proxy, dashboard, metrics-server, localdns-agent.
Member:

I actually don't see why we couldn't incorporate those (with some SWE effort) as part of the switch-over from bash to CRDs.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 26, 2019
@luxas (Member) left a comment:

/lgtm

As discussed and approved in today's SIG Cluster Lifecycle meeting, we're merging this KEP as provisional now, and will follow up in the new cluster-addons subproject of SIG Cluster Lifecycle that was formed to discuss and own the direction of this KEP.

The repo has been set up: https://github.com/kubernetes-sigs/addon-operators
The upcoming meetings have been announced: https://groups.google.com/d/msg/kubernetes-sig-cluster-lifecycle/GA2q8BYXkTo/lg9NWWyAAwAJ
If you want to discuss the implementation details that are still outstanding from this KEP as-is, join us for the meetings starting: Friday, 5th April 10:00 PDT
Meeting notes: https://docs.google.com/document/d/10_tl_SXcFGb-2109QpcFVrdrfnVEuQ05MBrXtasB0vk/edit#
Slack channel: #cluster-addons

Thanks @dholbach and @justinsb for doing the logistics to get this effort started!
See you on next Friday!

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb, luxas, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@luxas (Member) commented Mar 26, 2019:

/hold cancel
