Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc/proposals: adding helm operator proposal #658

Merged
merged 5 commits into from
Nov 5, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
147 changes: 147 additions & 0 deletions doc/proposals/helm-operator.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
## Helm Operator Proposal

### Background

As was mentioned in the [Ansible Operator Proposal](./ansible-operator.md), not everyone is a golang developer, so the SDK needs to support other types of operators to gain adoption across a wider community of users.

[Helm](https://helm.sh/) is one of the most widely-used tools for Kubernetes application management, and it bills itself as the "package manager for Kubernetes." Operators serve a nearly identical function, but they improve on Helm's concepts by incorporating an always-on reconciliation loop rather than relying on an imperative user-driven command line tool. By integrating Helm's templating engine and release management into an operator, the SDK will further increase the number of potential users by adding the ability to deploy Charts (e.g. from Helm's [large catalog of existing Charts](https://github.com/helm/charts)) as operators with very little extra effort.

### Goals

The goal of the Helm Operator will be to create a fully functional framework for Helm Chart developers to create operators. It will also expose a library for golang users to use Helm in their operator if they so choose. These two goals in conjunction will allow users to select the best technology for their project or skillset.

### New Operator Type

This proposal creates a new type of operator called `helm`. The new type is used to tell the tooling to act on that type of operator.

### Package Structure

Packages will be added to the operator-sdk. These packages are designed to be usable by the end user if they choose to and should have a well documented public API. The proposed packages are:

* /operator-sdk/pkg/helm/client
* Will contain a helper function to create a Helm client from `controller-runtime` manager.

* /operator-sdk/pkg/helm/controller
* Will contain an exported `HelmOperatorReconciler` that implements the `controller-runtime` `reconcile.Reconciler` interface.
* Will contain an exported `Add` function that creates a controller using the `HelmOperatorReconciler` and adds watches based on a set of watch options passed to the `Add` function.

* /operator-sdk/pkg/helm/engine
* Will contain a Helm Engine implementation that adds owner references to generated Kubernetes resource assets, which is necessary for garbage collection of Helm chart resources.

* /operator-sdk/pkg/helm/internal
* Will contain types and utilities used by other Helm packages in the SDK.

* /operator-sdk/pkg/helm/release
* Will contain the `Manager` types and interfaces. A `Manager` is responsible for:
* Implementing Helm's Tiller functions that are necessary to install, update, and uninstall releases.
* Reconciling an existing release's resources.
* A default `Manager` implementation is provided in this package but is not exported.
* Package functions:
* `NewManager` - function that returns a new Manager for a provided helm chart.
* `NewManagersFromEnv` - function that returns a map of GVK to Manager types based on environment variables.
* `NewManagersFromFile` - function that returns a map of GVK to Manager types based on a provided config file.

### Commands

We are adding and updating existing commands to accommodate the Helm operator. Changes to the `cmd` package as well as changes to the generator are needed.

#### New

New functionality will be updates to allow Helm operator developers to create a new boilerplate operator structure with everything necessary to get started developing and deploying a Helm operator with the SDK.

```
operator-sdk new <project-name> --type=helm --kind=<kind> --api-version=<group/version>
```

Flags:
* `--type=helm` is required to create Helm operator project.
* **Required:** --kind - the kind for the CRD.
* **Required:** --api-version - the group/version for the CRD.

This will be new scaffolding for the above command under the hood. We will:
* Create a `./<project-name>` directory.
* Create a `./<project-name>/helm-charts` directory.
* Generate a simple default chart at `./<project-name>/helm-charts/<kind>`.
* Create a new watches file at `./<project-name>/watches.yaml`. The chart and GVK will be defaulted based on input to the `new` command.
* Create a `./<project-name>/deploy` with the Kubernetes resource files necessary to run the operator.
* Create a `./build/Dockerfile` that uses the watches file and the helm chart. It will use the Helm operator as its base image.

The resulting structure will be:

```
<project-name>
| watches.yaml
|
|-- build
| | Dockerfile
|
|-- helm-charts
| |-- <kind>
| | Chart.yaml
| | ...
|
|-- deploy
| | operator.yaml
| | role_binding.yaml
| | role.yaml
| | service_account.yaml
| |
| |-- crds
| | <gvk>_crd.yaml
| | <gvk>_cr.yaml
```

The SDK CLI will use the presence of the `helm-charts` directory to detect a `helm` type project.

#### Add

Add functionality will be updated to allow Helm operator developers to add new CRDs/CRs and to update the watches.yaml file for additional Helm charts. The command helps when a user wants to watch more than one CRD for their operator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this is only for creating CRDs? I think you may just want to re-use the ansible operator stuff for just ‘add crd’.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. I didn't see the add crd subcommand the first time around.

My initial thought was that app api would be operator type aware.

  • For go, it would do what it does now.
  • For ansible, it would create the crds, update the watches file, and scaffold out a new roles/<Kind> Ansible role.
  • For helm it would create the crds, update the watches file, and scaffold out a new helm-charts/<Kind> Helm chart.

If that was the pattern, that could also potentially align how operator-sdk new works (i.e. --api-version and --kind would no longer be valid for a new operator of any type)

I was surprised when I ran operator-sdk add api in an ansible operator project and it spit out Go files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone wants to convert to a golang operator from an ansible operator, don't want to lock them in.

I also don't like calling this adding an API, IMO an API is what this does adds the go structures for the API. I would prefer to keep the CRD command as I think it is much more descriptive. When discussing the ansible operator, we never talk about adding API's or watching API's we discuss CRDs and CRs.

I also do not want new for ansible operator does not allow --api-version and --kind. This is really important to the ansible operator workflow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the command add api is only valid in the context of a Go operator.
We can use the command add crd here.

I was surprised when I ran operator-sdk add api in an ansible operator project and it spit out Go files.

We need to make sure to check not run the commands specific to the Go type only run in a Go operator project:

  • add api
  • add controller
  • generate k8s

I'll create an issue to fix this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shawn-hurley Sounds good. Just wanted to make sure that was a conscious choice and not leftover from the pre-0.1.0 CLI.

So I'll update the proposal to specify that add crd will be used for the helm operator and that other add commands will be invalid.

What about the question of when to scaffold a new boilerplate role or chart? Only on new or also on add crd? Any opinions there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we scaffold a new role for add crd although it kind of makes sense that you would want to associate a new role/chart with a new type.
But then do we update watches.yaml file as well to point to the new chart/role?
And if we do couple it with the add crd command then we should ensure that it handles Go/Ansible/Helm project types differently.

The only downside is that this couples a bunch of different things to the add crd command.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hasbro17 Is there a use case for add crd that wouldn't involve other changes? Along those same lines, for a Go project, when would one use add crd and not add api?

If there is a use case for running add crd standalone, I can see how coupling it to these other things may not be desirable.

But to me, it seems like the user is always going to follow add crd up with other manual changes that will likely depend on the type of the operator. For ansible and helm, I think it's pretty likely users will be manually adding a new role/chart and updating watches.yaml.

If users are doing that manually, it increases the likelihood that they forget to change something somewhere or make a typo that causes their new CRD to fail in some way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelanford add crd was initially added for the Ansible project.
I don't think there was a use case for it in a Go project, since you almost always need the pkg/apis/... files to register the CRD type with the scheme so that the operator can actually watch or CRUD that type.

Maybe if you're dealing with unstructured types(like the ansible and helm operator do) then you don't need the api definition pkg/apis/.... But I don't think that's a common use case.

So yeah I think running add crd standalone isn't necessary for the Go project when we have add api.
So disregarding the Go project we could couple it with adding a role/chart and updating watches.yaml.
@shawn-hurley WDYT? Leave it up to the users. Couple it with add crd. Or a new subcommand.


```
operator-sdk add crd --api-version=<group>/<version> --kind=<kind> --update-watches=<true|false>
```

Flags:
* **Required:** --kind - the kind for the CRD.
* **Required:** --api-version - the group/version for the CRD.
* **Optional:** --update-watches - whether or not to update watches.yaml file (default: false).

**NOTE:** `operator-sdk add` subcommands `api` and `controller` will not be supported, since they are only valid for Go operators. Running these subcommands in a Helm operator project will result in an error.

#### Up

Up functionality will be updated to allow Helm operator developers to run their operator locally, using the `operator-sdk` binary's built-in helm operator implementation.

```
operator-sdk up local
```

This should use the known structure and the helm operator code to run the operator from this location. The existing code will need to be updated with a new operator type check for `helm` (in addition to existing `go` and `ansible` types). The command works by running the operator-sdk binary, which includes the Helm operator code, as the operator process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this needs to be solved later, and that is fine, but how are we differentiating between ansible and helm?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For helm at least, the immediately obvious option would be to check for the presence of the helm-charts directory in the project root.


#### Build

Build functionality will be updated to support building a docker image from the Helm operator directory structure.

```
operator-sdk build <image-name>
```

#### Test

The SDK `test` command currently only supports Go projects, so there will be no support for the `operator-sdk test` subcommand in the initial integration of the Helm operator.

### Base Image

The SDK team will maintain a build job for the `helm-operator` base image with the following tagging methodology:
* Builds on the master branch that pass nightly CI tests will be tagged with `:master`
* Builds for tags that pass CI will be tagged with `:<tag>`. If the tag is also the greatest semantic version for the repository, the image will also be tagged with `:latest`.

The go binary included in the base image will be built with `GOOS=linux` and `GOARCH=amd64`.

The base image repository will be `quay.io/water-hole/helm-operator`.

### Observations and open questions

* There will be a large amount of overlap in the `operator-sdk` commands for the Ansible and Helm operators. We should take care to extract the resusable features of the Ansible operator commands into a shared library, usable by both Helm and Ansible commands.

* There is a moderate amount of complexity already related to how operator types are handled between the `go` and `ansible` types. With the addition of a third type, there may need to be a larger design proposal for operator types. For example, do we need to define an `Operator` interface that each of the operator types can implement for flag verification, scaffolding, project detection, etc.?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think there is complexity, I would prefer to keep the complexity as close to the surface. Something like a factory pattern which each up command(helm, go, ansible) have their own implementation and their own flags possibly?

I also would like to at least have an idea on what to do here in this proposal IMO just so we don’t live in a super complex world while we figure this out? Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shawn-hurley Yeah that's pretty much what I was thinking. I'm not sure what the cleanest way to do that would be though.

Maybe we could do the operator type detection / --type flag parsing before passing things off to cobra and then have each operator type implement the cobra subcommand functions directly or pass the operator implementation to the subcommand functions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could do the operator type detection / --type flag parsing before passing things off to cobra and then have each operator type implement the cobra subcommand functions directly or pass the operator implementation to the subcommand functions

This seems like we'll have implicit subcommands:

  • operator-sdk build --type=go ==> operator-sdk go build
  • operator-sdk build --type=ansible ==> operator-sdk ansible build

This goes back to the question of what sub commands should be allowed in what types of projects, and how explicit should we be about specifying the type. #670

Some commands are common across different types(build, up local) whereas other sub commands only make sense for certain project types e.g Go(add api, generate k8s) Ansible/Helm(add crd). It gets a little confusing if we want to allow those exclusive commands across all types to allow migrating to a hybrid Go/Ansible Go/Helm operator.

One option is we stay with the existing convention of specifying the --type flag(or inferring from the project) for each subcommand so we can keep the commands consistent for all 3 types, and handle the logic for each type within the same sub command:

  • operator-sdk up local(infer the type) or operator-sdk up local --type=<type>

This keeps all subcommands consistent, except we still have to specify the --type flag or infer that from the project. The downside is the ambiguity in what commands are allowed where.

Or we take a top down approach as @joelanford suggested (#670 (comment)) and put the operator type as the first root sub command:

  • operator-sdk go build
  • operator-sdk ansible build
  • operator-sdk helm build

The advantage of the type root subcommand is that there's no ambiguity on what commands run in what types of projects with differing arguments and flags.
The drawback is that it's a little verbose(redundant?) for commands that are (mostly)common in their behavior
e.g operator-sdk build.
Also if I end up with a hybrid operator do I run operator-sdk go build or operator-sdk ansible build?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great points. Maybe its a combination. I think we can categorize the CLI into three sets of commands.

  1. Global commands where operator type does not need to be inferred:

    • completion, help, and new
  2. Commands that need to support possible hybrid operators and where the operator type(s) will need to be inferred:

    • up, build, test
  3. Commands that apply to specific types:

    • Go: add api, add controller, generate
    • Ansible: add crd
    • Helm: ?

For 2., as @estroz mentioned (#670 (comment)), we can detect all of the types an operator has and be opinionated about which combinations we'll support, with helpful errors/warnings about non-supported combinations.

For 3., another option may be to keep add and generate as top-level commands, but then have type subcommands of those. So:

  • operator-sdk add go api
  • operator-sdk generate go k8s
  • operator-sdk add ansible crd (or maybe even better, operator-sdk add ansible role)
  • operator-sdk add helm crd (or add helm chart)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 3., another option may be to keep add and generate as top-level commands, but then have type subcommands of those. So:

I was thinking of something along those lines as well. Would make things clear when you do operator-sdk add go --help as it would display all the subcommands.