
k8s-tiller module #9

Closed
wants to merge 9 commits

Conversation

yorinasub17
Contributor

@yorinasub17 yorinasub17 commented Jan 30, 2019

Depends on gruntwork-io/terratest#231 and gruntwork-io/kubergrunt#7 (merged).

This implements the k8s-tiller module and an accompanying example that let you deploy Tiller following security best practices.

Notes:

  • This uses kubergrunt under the hood.
  • This uses a pattern for module dependencies. This was necessary to force the destroy order: otherwise Terraform started destroying the roles and role bindings for the Tiller service account too early, which prevented Tiller from undeploying itself.

# SET MODULE DEPENDENCY RESOURCE
# This works around a terraform limitation where we can not specify module dependencies natively.
# See https://github.com/hashicorp/terraform/issues/1178 for more discussion.
# ---------------------------------------------------------------------------------------------------------------------
Contributor Author

@yorinasub17 yorinasub17 Jan 31, 2019

@rileykarson Ok, I found this pattern for module dependencies that was posted in December and it appears to be working well. I think it can now be used to chain the namespace module to the role binding.
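
For reference, a minimal sketch of the pattern (the wiring and resource names below are illustrative, condensed from the linked issue discussion, not necessarily what ends up in the module):

variable "dependencies" {
  description = "Outputs from other modules that this module should wait for."
  type        = "list"
  default     = []
}

resource "null_resource" "dependency_getter" {
  # Interpolating the list forces Terraform to wait on everything passed in.
  provisioner "local-exec" {
    command = "echo ${length(var.dependencies)}"
  }
}

# Resources in the module hang off the getter, so they are created after
# (and destroyed before) whatever the caller passed in as dependencies.
resource "kubernetes_service_account" "tiller" {
  depends_on = ["null_resource.dependency_getter"]
  # ...
}

resource "null_resource" "dependency_setter" {
  depends_on = ["kubernetes_service_account.tiller"]
}

output "depended_on" {
  value = "${null_resource.dependency_setter.id}"
}

A caller can then chain modules explicitly, e.g. by passing dependencies = ["${module.namespace.depended_on}"] to the tiller module call.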

Member

Can you explain what this is meant to be used for? That is, what are you depending on?

Contributor Author

The main issue with GKE is that the default permissions are not enough to create additional RBAC roles. You first have to promote the user to cluster admin, and then you can proceed to create these roles. So in Terraform, we need to make sure these roles are applied after the cluster admin role binding is created.
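
To illustrate the ordering (the resource and variable names here are hypothetical, not the module's actual code): bind the deploying user to cluster-admin first, then make the RBAC role resources depend on that binding.

resource "kubernetes_cluster_role_binding" "deployer_is_cluster_admin" {
  metadata {
    name = "deployer-cluster-admin"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    api_group = "rbac.authorization.k8s.io"
    kind      = "User"
    name      = "${var.deployer_email}" # hypothetical variable
  }
}

resource "kubernetes_role" "rbac_role_access_all" {
  # Without this ordering GKE rejects the role, because the creating identity
  # does not yet hold a superset of the permissions being granted.
  depends_on = ["kubernetes_cluster_role_binding.deployer_is_cluster_admin"]
  # ...
}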

Contributor

Specifically, your identity needs to have a superset of the permissions of a role to create that role. See https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control#defining_permissions_in_a_role

@yorinasub17 yorinasub17 changed the title WIP NOT READY k8s-tiller module Jan 31, 2019
@yorinasub17
Contributor Author

yorinasub17 commented Jan 31, 2019

@brikis98 I'd like a review from you on this one, specifically around the module dependencies system implemented here. The current implementation works (I had issues with resources being deleted too early which are now resolved), but I am not sure if there are other gotchas that we don't know about in implementing the module dependencies.

@yorinasub17
Contributor Author

@brikis98 Forgot to mention your review is not urgent, given that it currently works. Just wanted to hear your thoughts, eventually.

- [minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/)
- [Kubernetes on Docker for Mac](https://docs.docker.com/docker-for-mac/kubernetes/)
- EKS
- GKE
Member

Link to our modules?

1. Install [Terraform](https://www.terraform.io/), minimum version: `0.9.7`.
1. Install [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
1. Install the [helm client](https://docs.helm.sh/using_helm/#install-helm).
1. Install [kubergrunt](https://github.com/gruntwork-io/kubergrunt).
Member

Wow, that's a lot of stuff to install. I think I've asked before, but does it make sense to embed kubectl or helm client in kubergrunt (or call the APIs they are using directly)? Or do their versions/APIs change so often that we'd never be able to keep up?

Contributor Author

I haven't looked into this yet for helm, but for kubectl, we can't embed it because a lot of the logic is baked into the command (package main) as opposed to a library.

That said, based on what I know, I think it is safe to assume that everyone doing anything related to Kubernetes will have kubectl installed. You lose A LOT without it, so that one is more or less a moot point.

I think the helm client might be avoidable, but I will have to look into that more. I would probably file that away as a "nice to have" though. I filed gruntwork-io/kubergrunt#13 so we don't forget this thought.

examples/k8s-tiller/main.tf (outdated review thread, resolved)

resource "null_resource" "dependency_getter" {
provisioner "local-exec" {
command = "echo ${length(var.dependencies)}"
Member

Does echo exist on Windows?

Contributor Author

On PowerShell, yes, I have verified this. It does not exist in CMD though, so it's not the most portable, but probably good enough based on prior conversations.

# SET MODULE DEPENDENCY RESOURCE
# This works around a terraform limitation where we can not specify module dependencies natively.
# See https://github.com/hashicorp/terraform/issues/1178 for more discussion.
# ---------------------------------------------------------------------------------------------------------------------
Member

Can you explain what this is meant to be used for? That is, what are you depending on?

@@ -12,3 +12,8 @@ output "rbac_access_read_only_role" {
  description = "The name of the RBAC role that grants read only permissions on the namespace."
  value       = "${kubernetes_role.rbac_role_access_read_only.metadata.0.name}"
}

output "depended_on" {
  description = "This output can be used to depend on the resources in this module."
Member

Example use case?


resource "null_resource" "tiller" {
triggers {
tiller_namespace = "${var.tiller_namespace}"
Member

Do you ever need to update tiller? I ask as this trigger will only fire on creation.

Contributor Author

Good point. You will most likely want to upgrade Tiller versions at some point. However, you can use the helm client to upgrade:

helm init --upgrade --tiller-namespace TILLER_NAMESPACE

I will add this to the README.

provisioner "local-exec" {
command = "kubergrunt helm undeploy ${local.kubectl_config_options} --home ${local.helm_home_with_default} --tiller-namespace ${var.tiller_namespace} ${local.undeploy_args}"
when = "destroy"
}
Member

Much of this code seems to be calling out to kubergrunt. Sanity check: should this really be Terraform code or just a new command that does a bunch of steps for you built into kubergrunt?

Member

In fact, is there anything in this module that actually uses a real Terraform resource?

Contributor Author

Thanks for the sanity check. You are right that there is nothing in this module that uses a real Terraform resource, so yes I am forcing Terraform where the operator could just call out to kubergrunt. I am not 100% sure if this is a good idea yet (I don't think I have enough Terraform scars to make a judgment call), but I have my reasons for at least attempting it.

There is one concrete use case in my head for doing this, and a few potential use cases / benefits:

Concrete:

  • For EKS, there are what I call "essential" helm charts. These are helm charts that almost all clusters will want to install. I don't know yet if there is an equivalent for GKE, but presumably you will want at least an ingress controller. In this use case, it is nice to be able to chain the modules in one terraform apply: provision EKS => configure kubectl => deploy Tiller => install "essential" helm charts (kube2iam / kiam and alb-controller).

Potential:

  • There is something nice about being able to manage access in a declarative manner. Once I implement the revoke command (deprioritized for now in favor of being able to deploy services to k8s), changing the granted RBAC entity list should do the right thing: revoke entities that are no longer in the list and grant access to new ones. Note that right now this is append only because I am missing the revoke command, but eventually I intend to implement revoke.
  • Similarly, there is something nice about declaratively managing Tiller deployments as code. If we just reduce to a kubergrunt command, how does one find out what Tillers were deployed? Which namespaces does Tiller manage? We could use labels and such to make it easier to discover, but I don't think that beats opening your terraform config and finding the k8s-tiller module calls.

Ideally all of this would be a new Terraform provider, or baked into the helm provider. However, there are a few reasons why I avoided that in this initial implementation:

  • We are already going in the direction of recommending people install kubergrunt for other reasons, so it would be one less thing to install if we could just use kubergrunt.
  • I am not too familiar with provider development (@rileykarson probably has more insight here), but with the limited knowledge I have, I think the advantages of a provider over this approach are being able to implement updates (e.g. change the name or append labels) and better drift tracking via state management. Both are nice to haves, but I am not sure they are worth the overhead of a new provider since what's here gets us 80% of what we need.
  • Eventually I was going to tackle adding some of this functionality into the helm provider (Add tiller resource hashicorp/terraform-provider-helm#134), which would be the best case scenario for implementing the chaining. But I wanted to make sure we had a short term escape hatch in case something went horribly wrong there (the community rejects the idea, yak shaving on the provider, etc.), which is this.
  • Not everyone will want to use Terraform for managing helm; kubergrunt satisfies those people's use case. Some people will want to use Terraform; this satisfies theirs. So there is benefit to having the kubergrunt commands and to wrapping them in Terraform.

Member

Hm, I'm worried that having this as a Terraform module will create the expectation that create (apply), update (apply again—will work for append but probably not replace?), and delete (destroy) will all work correctly, but right now, as far as I can tell, only create works. I agree it's nice to capture, as code, what you did... but if you can't update or clean it up, I worry it will lead to confusion and errors. It's also complicated to use, as you have to install lots of dependencies besides Terraform itself, and we always have the potential for cross-OS portability issues.

Writing our own provider or doing a PR to the helm provider is a good idea, but I agree, likely to be too expensive to be worth doing now.

Would a middle ground be to turn this set of steps into a single kubergrunt command (plus docs), and add another kubergrunt command that shows you exactly what Tillers were deployed and the namespaces they manage (i.e., something like a tiller-status command)?

Contributor Author

> I worry it will lead to confusion and errors

That is probably a fair point. Thinking through a few update scenarios, I realized that you are probably right that this may lead to a lot of confusion.

> It's also complicated to use, as you have to install lots of dependencies besides Terraform itself

Yes, I agree with that. But what if we:

  • Embed helm into kubergrunt to remove the need for the helm client.
  • Verify whether kubectl is actually necessary and, if it isn't, update kubergrunt so it is not needed (I think I only have it as a dependency right now because of the kubeconfig setup and management).
  • Implement a fetch_exec module that fetches kubergrunt and runs it. This is what we talked about; it replaces the kubergrunt dependency with a Python dependency, so slightly better but arguably a wash.

These are probably good things to do regardless of the decision made.

> and we always have the potential for cross-OS portability issues.

I don't quite understand the potential cross-OS portability issue that is specific to this approach. What is in here that leads to that concern?

> only create works.

FWIW, destroy actually works.

> Would a middle ground be to turn this set of steps into a single kubergrunt command (plus docs), and add another kubergrunt command that shows you exactly what Tillers were deployed and the namespaces they manage (i.e., something like a tiller-status command)?

I'm not sure the joined command is necessary (maybe a few more options on deploy to auto-configure the deployer's local helm client?), but tiller-status is definitely good to have regardless.

Alright, you've convinced me. I think with some time we could actually make this work, but it is arguable whether it is worth the effort now, especially considering the ticking time bomb that is Helm v3, where all of this will be wasted effort in the Tiller-less world.

=========================================

So to summarize:

  • Add --local-rbac-user, --local-rbac-group, and --local-rbac-service-account options to kubergrunt helm deploy that can be used to configure the local helm client for that RBAC entity.
  • Implement a tiller-status command in kubergrunt that shows you all the installed Tillers and which RBAC entities have been granted access.
  • We will abandon the effort to modularize Tiller here. The consequence is that this repo needs to be renamed (possibly terraform-kubernetes-namespace or terraform-kubernetes-namespace-factory?). @brikis98 Have we renamed a released repo before? Does GitHub redirect old names?

@robmorgan and @rileykarson does that make sense given the discussion in this thread? What are your thoughts?

Contributor

Also, the argument against storing in state makes enough sense, thanks! I guess our [the Google provider team] view has generally been that Terraform requires access to privileged data and consequently should be able to manage it. There are definitely a lot of places that certs/creds are required to use certain features.

Member

Terraform state and secrets is not a great situation. Since secrets are stored in plain text in state files, we try to minimize what secrets we put in them when possible. Unfortunately, this often isn't possible with Terraform, and we sort of cringe, and hope that whatever protections we put in place for the state storage (e.g., S3 IAM permissions) are enough.

I don't think I'd make this the reason to design our helm integration entirely differently. That said, we need to be practical. How long would it take to get hashicorp/terraform-provider-helm#134 done? That is, to write the code, test it, get it reviewed, merged, and released? And this will likely take multiple rounds of code/test/review/merge/release, as we're unlikely to get a perfect, bug-free implementation on the first try. If this is a few days of work, then that seems like the way to go. If this is multiple weeks of work, with a non-trivial chance of being blocked for months more waiting on reviews or releases, then we probably don't want to stall the entire GCP project waiting on that now.

Contributor

Right: at first glance, resolving hashicorp/terraform-provider-helm#134 would probably take 2-5 eng days to complete, then a week or two for review passes, plus waiting for a release to be cut (although that provider might cut one at will? not sure).

Given the TLS/secret situation and the latency that making provider changes would have, I think that the proposed kubergrunt functionality makes sense 👍

Contributor Author

@yorinasub17 yorinasub17 Feb 1, 2019

Thanks for those sanity checks @brikis98 and @rileykarson! Always good to hear second opinions after being solo on the thoughts for a while :)

Ok to summarize:

  • Ideally we would use Terraform resources for everything, but due to constraints around secrets management and missing provider resources, we will work outside of that.
  • The kubergrunt functionality is a "good enough" solution for now. It has the added bonus of being potentially useful to those who don't use Terraform.
  • The dependency logic for service accounts and namespaces is necessary for GKE, so that should be implemented.

We still have an open question about this module. Here are the Pros and Cons I gather:

Pros:

  • You can feed the outputs of k8s-namespace and k8s-service-account into kubergrunt more easily than copy-pasting from outputs.
  • Implements destroy.
  • Provides documentation, as code, of all the Tiller deployments you have made.

Cons:

  • Is not idiomatic Terraform, and thus is unintuitive:
    • Depends on external tools.
    • Does not implement plan and updates correctly.
    • Could it be disaster-inducing? E.g., can you accidentally undeploy Tiller and all the releases with a typo?

=====

For now, I am leaning towards making this an example with a null_resource calling out to kubergrunt (roughly the shape sketched below) instead of a k8s-tiller submodule, so it is clearer that we don't implement all the Terraform niceties, and documenting our intention to update when the helm provider is updated. If it turns out that it would be nice to have an actual module making the calls to kubergrunt after implementing the examples in EKS and GKE, we can revisit.
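
To make that concrete, the example would boil down to roughly the following (a sketch only; the kubergrunt deploy flags are placeholders for whatever the command actually accepts):

resource "null_resource" "tiller" {
  triggers {
    tiller_namespace = "${var.tiller_namespace}"
  }

  # Deploy Tiller into the given namespace on create.
  provisioner "local-exec" {
    command = "kubergrunt helm deploy --tiller-namespace ${var.tiller_namespace} --service-account ${var.tiller_service_account}"
  }

  # Undeploy Tiller on destroy, mirroring the existing undeploy provisioner.
  provisioner "local-exec" {
    command = "kubergrunt helm undeploy --tiller-namespace ${var.tiller_namespace}"
    when    = "destroy"
  }
}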

So given that, here is what I am going to do:

  • Close this PR and the related feature branch
  • Open a new PR with the dependency logic in k8s-namespace and k8s-service-account so that GKE works.
  • Open a new PR with an example that calls out to kubergrunt and put documentation there about the approach, alternatives we considered, and why it is that way right now.
  • File github issues:

Let me know if there are any objections. I am going to proceed with this plan, but I believe it will be easy to reverse any of these actions, so we can keep discussing even after I have started.

Contributor

sgtm!

  count = "${trimspace(local.grant_args) == "" ? 0 : 1}"

  triggers {
    grant_args = "${local.grant_args}"
Member

If you update these args, will kubergrunt helm grant remove the old ones and grant the new ones? Or is it append-only?

Contributor Author

It is append-only for now, but I plan on implementing revoke, at which point I will change this to a loop and grant / revoke each entity separately (see the rough sketch below).
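
For the record, a rough sketch of what that per-entity loop could look like once revoke exists (the variable names and kubergrunt flags here are assumptions, not the current implementation):

variable "granted_rbac_users" {
  type    = "list"
  default = []
}

# One resource per entity, so each grant gets a matching destroy-time revoke.
resource "null_resource" "grant_access" {
  count = "${length(var.granted_rbac_users)}"

  triggers {
    rbac_user = "${element(var.granted_rbac_users, count.index)}"
  }

  provisioner "local-exec" {
    command = "kubergrunt helm grant --tiller-namespace ${var.tiller_namespace} --rbac-user ${element(var.granted_rbac_users, count.index)}"
  }

  provisioner "local-exec" {
    command = "kubergrunt helm revoke --tiller-namespace ${var.tiller_namespace} --rbac-user ${element(var.granted_rbac_users, count.index)}"
    when    = "destroy"
  }
}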

Member

Worth adding as a comment or to the docs

# INTERPOLATE AND CONSTRUCT COMMAND ARGUMENTS
# ---------------------------------------------------------------------------------------------------------------------

// TODO: When we have package-terraform-utilities available
Member

I think we could open source this. It's not exactly secret sauce...

@yorinasub17 yorinasub17 mentioned this pull request Feb 2, 2019
@yorinasub17
Contributor Author

As discussed, this has been replaced with

@yorinasub17 yorinasub17 closed this Feb 2, 2019