Conversation
# SET MODULE DEPENDENCY RESOURCE
# This works around a terraform limitation where we can not specify module dependencies natively.
# See https://github.com/hashicorp/terraform/issues/1178 for more discussion.
# ---------------------------------------------------------------------------------------------------------------------
@rileykarson OK, I found this pattern on module dependencies that was posted in December, and it appears to be working well. I think it can now be used to chain the namespace module to the role binding.
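For reference, here is a minimal sketch of what that chaining could look like, assuming a cluster admin role binding managed elsewhere in the config; the module path, inputs, and the binding resource name are illustrative assumptions, not the exact interfaces in this repo:

```hcl
# Sketch only: module source and the upstream binding name are assumptions.
module "namespace" {
  source = "./modules/k8s-namespace"

  name = "my-app"

  # Wait for the cluster admin role binding before creating the namespace RBAC roles.
  dependencies = ["${kubernetes_cluster_role_binding.cluster_admin.metadata.0.name}"]
}
```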
Can you explain what this is meant to be used for? That is, what are you depending on?
The main issue with GKE is that the default permissions are not enough to create additional RBAC roles. You have to first promote the user to cluster admin, and then you can proceed to create these roles. So in Terraform, we need to make sure we apply these roles after the cluster admin role binding is created.
Specifically, your identity needs to have a superset of the permissions of a role to create that role. See https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control#defining_permissions_in_a_role
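For illustration, a hedged sketch of the kind of binding that promotes the deploying identity to cluster admin before the module's roles are created; the resource name and the `deployer_email` variable are assumptions for this example:

```hcl
# Sketch only: grants cluster-admin to the identity running Terraform so it can
# create RBAC roles whose permissions it does not already hold.
resource "kubernetes_cluster_role_binding" "cluster_admin" {
  metadata {
    name = "terraform-deployer-cluster-admin"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    api_group = "rbac.authorization.k8s.io"
    kind      = "User"
    name      = "${var.deployer_email}" # hypothetical variable: the identity applying Terraform
  }
}
```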
@brikis98 I'd like a review from you on this one, specifically around the module dependencies system implemented here. The current implementation works (I had issues with resources being deleted too early, which are now resolved), but I am not sure if there are other gotchas we don't know about in implementing module dependencies this way.
@brikis98 Forgot to mention: your review is not urgent, given that it currently works. Just wanted to hear your thoughts eventually.
- [minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/)
- [Kubernetes on Docker for Mac](https://docs.docker.com/docker-for-mac/kubernetes/)
- EKS
- GKE
Link to our modules?
1. Install [Terraform](https://www.terraform.io/), minimum version: `0.9.7`.
1. Install [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
1. Install the [helm client](https://docs.helm.sh/using_helm/#install-helm).
1. Install [kubergrunt](https://github.com/gruntwork-io/kubergrunt).
Wow, that's a lot of stuff to install. I think I've asked before, but does it make sense to embed `kubectl` or the `helm client` in `kubergrunt` (or call the APIs they are using directly)? Or do their versions/APIs change so often that we'd never be able to keep up?
I haven't looked into this yet for `helm`, but for `kubectl`, we can't embed it because a lot of the logic is baked into the command (`package main`) as opposed to a library.
That said, based on what I know, I think it is safe to assume that everyone will have `kubectl` installed if they are doing anything related to Kubernetes. You lose A LOT when you don't have it, so that one is more or less a moot point.
I think the `helm client` might be avoidable, but I will have to look into that more. I would probably file that away as a "nice to have" though. I filed gruntwork-io/kubergrunt#13 so we don't forget this thought.
resource "null_resource" "dependency_getter" { | ||
provisioner "local-exec" { | ||
command = "echo ${length(var.dependencies)}" |
Does `echo` exist on Windows?
On PowerShell, yes; I have verified this. It does not exist on CMD, though, so it is not the most portable, but probably good enough based on prior conversations.
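For readers unfamiliar with the pattern, here is a rough sketch of how the rest of it is typically wired up; only `dependency_getter` appears in this diff, and the surrounding variable and dependent resource below are illustrative:

```hcl
# Sketch of the full pattern: a "dependencies" list variable is echoed by a
# null_resource, and every real resource in the module depends_on that resource,
# so nothing is created until the listed dependencies exist.
variable "dependencies" {
  description = "Outputs from other modules that this module should wait for."
  type        = "list"
  default     = []
}

resource "null_resource" "dependency_getter" {
  provisioner "local-exec" {
    command = "echo ${length(var.dependencies)}"
  }
}

resource "null_resource" "example_dependent_resource" {
  # Illustrative stand-in for the module's real resources.
  depends_on = ["null_resource.dependency_getter"]
}
```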
# SET MODULE DEPENDENCY RESOURCE
# This works around a terraform limitation where we can not specify module dependencies natively.
# See https://github.com/hashicorp/terraform/issues/1178 for more discussion.
# ---------------------------------------------------------------------------------------------------------------------
Can you explain what this is meant to be used for? That is, what are you depending on?
@@ -12,3 +12,8 @@ output "rbac_access_read_only_role" {
  description = "The name of the RBAC role that grants read only permissions on the namespace."
  value = "${kubernetes_role.rbac_role_access_read_only.metadata.0.name}"
}

output "depended_on" {
  description = "This output can be used to depend on the resources in this module."
Example use case?
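One possible use case, sketched here with module sources and inputs that are assumptions rather than the exact interfaces in this repo: feed the `depended_on` output of one module into the `dependencies` input of another, so the second is only applied after the first.

```hcl
# Sketch only: module sources and inputs are illustrative assumptions.
module "namespace" {
  source = "./modules/k8s-namespace"

  name = "my-app"
}

module "service_account" {
  source = "./modules/k8s-service-account"

  name      = "my-app-deployer"
  namespace = "my-app"

  # Ensures the namespace and its RBAC roles exist before this module's resources.
  dependencies = ["${module.namespace.depended_on}"]
}
```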
resource "null_resource" "tiller" { | ||
triggers { | ||
tiller_namespace = "${var.tiller_namespace}" |
Do you ever need to update Tiller? I ask because this trigger will only fire on creation.
Good point. You will most likely want to upgrade Tiller versions at some point. However, you can use the helm client to upgrade:
`helm init --upgrade --tiller-namespace TILLER_NAMESPACE`
I will add this to the README.
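As a purely hypothetical variation on this module (not something it implements today), the desired Tiller version could also be added to the triggers so the resource re-fires when that version changes; the `tiller_version` variable here is an assumption:

```hcl
# Hypothetical sketch: "tiller_version" is an assumed variable, not part of this module.
resource "null_resource" "tiller" {
  triggers {
    tiller_namespace = "${var.tiller_namespace}"
    tiller_version   = "${var.tiller_version}" # re-run provisioners when the requested version changes
  }

  # ... provisioners as in the module ...
}
```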
provisioner "local-exec" { | ||
command = "kubergrunt helm undeploy ${local.kubectl_config_options} --home ${local.helm_home_with_default} --tiller-namespace ${var.tiller_namespace} ${local.undeploy_args}" | ||
when = "destroy" | ||
} |
Much of this code seems to be calling out to `kubergrunt`. Sanity check: should this really be Terraform code, or just a new command built into `kubergrunt` that does a bunch of steps for you?
In fact, is there anything in this module that actually uses a real Terraform resource?
Thanks for the sanity check. You are right that there is nothing in this module that uses a real Terraform resource, so yes, I am forcing Terraform where the operator could just call out to `kubergrunt`. I am not 100% sure if this is a good idea yet (I don't think I have enough Terraform scars to make a judgment call), but I have my reasons for at least attempting it.
There is one concrete use case in my head for doing this, and a few potential use cases / benefits:
Concrete:
- For EKS, there are what I call "essential" helm charts: charts that almost all clusters will want to install. I don't know yet if there is an equivalent for GKE, but presumably you will want at least an ingress controller. In this use case, it is nice to be able to chain the modules in one `terraform apply`: provision EKS => configure kubectl => deploy Tiller => install "essential" helm charts (`kube2iam`/`kiam` and `alb-controller`).
Potential:
- There is something nice about being able to manage access in a declarative manner. Once I implement the `revoke` command (deprioritized for now in favor of being able to deploy services to k8s), changing the granted RBAC entity list should do the right thing: revoke entities that were removed from the list and grant the new ones. Note that right now this is append-only because I am missing the `revoke` command, but eventually I intend to implement it.
- Similarly, there is something nice about declaratively managing Tiller deployments as code. If we just reduce this to a `kubergrunt` command, how does one find out what Tillers were deployed? Which namespaces does Tiller manage? We could use labels and such to make it easier to discover, but I don't think that beats opening your Terraform config and finding the `k8s-tiller` module calls.
Ideally all of this would be a new Terraform provider, or baked into the `helm` provider. However, there are a few reasons why I avoided that in this initial implementation:
- We are already going in the direction of recommending people install `kubergrunt` for other reasons, so it would be one less thing to install if we could just use `kubergrunt`.
- I am not too familiar with provider development (@rileykarson probably has more insight here), but with the limited knowledge I have, I think the advantages of a provider over this approach are being able to implement updates (e.g. change the name or append labels) and better drift tracking via state management. Both are nice to have, but I am not sure they are worth the overhead of a new provider since what's here gets us 80% of what we need.
- Eventually I was going to tackle adding some of this functionality to the helm provider (Add tiller resource hashicorp/terraform-provider-helm#134), which would be the best-case scenario for implementing the chaining. But I wanted to make sure we had a short-term escape hatch in case something went horribly wrong there (the community rejects the idea, yak shaving on the provider, etc), which is this.
- Not everyone will want to use Terraform for managing helm. `kubergrunt` satisfies those people's use case. Some people will want to use Terraform; this will satisfy their use case. So there is benefit to having the `kubergrunt` commands and wrapping them in Terraform.
Hm, I'm worried that having this as a Terraform module will create the expectation that create (`apply`), update (`apply` again; will work for append but probably not replace?), and delete (`destroy`) will all work correctly, but right now, as far as I can tell, only create works. I agree it's nice to capture, as code, what you did... but if you can't update or clean it up, I worry it will lead to confusion and errors. It's also complicated to use, as you have to install lots of dependencies besides Terraform itself, and we always have the potential for cross-OS portability issues.
Writing our own provider or doing a PR to the helm provider is a good idea, but I agree it's likely to be too expensive to be worth doing now.
Would a middle ground be to turn this set of steps into a single `kubergrunt` command (plus docs), and add another `kubergrunt` command that shows you exactly what Tillers were deployed and the namespaces they manage (i.e., something like a `tiller-status` command)?
> I worry it will lead to confusion and errors

That is probably a fair point. Thinking through a few update scenarios, I realized that you are probably right that this may lead to a lot of confusion.

> It's also complicated to use, as you have to install lots of dependencies besides Terraform itself

Yes, I agree with that. But what if we:
- embed `helm` into `kubergrunt` to remove the need for `helm`.
- verify `kubectl` is unnecessary and, if it is, update `kubergrunt` so it is not needed (I think I only have this as a dependency right now because of the kubeconfig setup and management).
- implement a `fetch_exec` module that fetches `kubergrunt` and runs it. This is what we talked about. It replaces the `kubergrunt` install requirement with a `python` requirement, so slightly better but arguably a wash.

These are probably good things to do regardless of the decision made.

> and we always have the potential for cross-OS portability issues.

I don't quite understand the potential cross-OS portability issue that is specific to this approach. What is in here that leads to that concern?

> only create works.

FWIW, `destroy` actually works.

> Would a middle ground be to turn this set of steps into a single kubergrunt command (plus docs), and add another kubergrunt command that shows you exactly what Tillers were deployed and the namespaces they manage (i.e., something like a tiller-status command)?

I'm not sure it is necessary to have the joined command (maybe a few more options on `deploy` to autoconfigure the deployer's local helm client?), but `tiller-status` is definitely good to have regardless.
Alright, you've convinced me. I think with some time we can actually make this work, but it is arguable whether it is worth the effort now, especially considering the ticking time bomb that is Helm v3, where all of this will be put to waste in the Tiller-less world.
=========================================
So to summarize:
- Add `--local-rbac-user`, `--local-rbac-group`, and `--local-rbac-service-account` options to `kubergrunt helm deploy` that can be used to configure the local helm client with that RBAC entity.
- Implement a `tiller-status` command in `kubergrunt` that shows you all the installed Tillers and which RBAC entities have been granted access.
- We will abandon the effort to modularize Tiller here. The consequence is that this repo needs to be renamed (possibly `terraform-kubernetes-namespace` or `terraform-kubernetes-namespace-factory`?). @brikis98 Have we renamed a released repo before? Does GitHub redirect old names?
@robmorgan and @rileykarson does that make sense given the discussion in this thread? What are your thoughts?
Also, the argument against storing in state makes enough sense, thanks! I guess our view [the Google provider team's] has generally been that Terraform requires access to privileged data and consequently should be able to manage it. There are definitely a lot of places where certs/creds are required to use certain features.
Terraform state and secrets are not a great combination. Since secrets are stored in plain text in state files, we try to minimize what secrets we put in them when possible. Unfortunately, this often isn't possible with Terraform, so we sort of cringe and hope that whatever protections we put in place for the state storage (e.g., S3 IAM permissions) are enough.
I don't think I'd make this the reason to design our helm integration entirely differently. That said, we need to be practical. How long would it take to get hashicorp/terraform-provider-helm#134 done? That is, to write the code, test it, get it reviewed, merged, and released? And this will likely take multiple rounds of code/test/review/merge/release, as we're unlikely to get a perfect, bug-free implementation on the first try. If this is a few days of work, then that seems like the way to go. If this is multiple weeks of work, with a non-trivial chance of being blocked for months more waiting on reviews or releases, then we probably don't want to stall the entire GCP project waiting on that now.
Right: at first glance, resolving hashicorp/terraform-provider-helm#134 would probably take 2-5 engineering days to complete, then a week or two for review passes, plus waiting for a release to be cut (although that provider team might cut one at will? Not sure).
Given the TLS/secret situation and the latency that making provider changes would have, I think the proposed `kubergrunt` functionality makes sense 👍
Thanks for those sanity checks @brikis98 and @rileykarson! Always good to hear second opinions after being solo on the thoughts for a while :)
OK, to summarize:
- Ideally we would use Terraform resources for everything, but due to constraints on secrets management and provider resource deficiencies, we will work outside of that.
- The `kubergrunt` functionality is a "good enough" solution for now. It has the added bonus of being potentially useful to those who don't use Terraform.
- The dependency logic for service accounts and namespaces is necessary for GKE, so that should be implemented.
We still have an open question about this module. Here are the pros and cons I gather:
Pros:
- You can feed the outputs of `k8s-namespace` and `k8s-service-account` into `kubergrunt` more easily than copy-pasting from outputs.
- Implements destroy.
- Provides documentation, as code, of all the Tiller deployments you may have made.
Cons:
- It is not idiomatic Terraform, and thus is unintuitive:
  - Depends on external tools.
  - Does not implement plan and updates correctly.
- Could it be disaster inducing? E.g. can you accidentally undeploy Tiller and all its releases with a typo?
=====
For now, I am leaning towards making this an example with a `null_resource` calling out to `kubergrunt`, instead of a `k8s-tiller` submodule, so it is clearer that we don't implement all the Terraform niceties, and documenting our intention to update when the helm provider is updated. If it turns out that it would be nice to have an actual module making the calls to `kubergrunt` after implementing the examples in EKS and GKE, we can revisit.
So given that, here is what I am going to do:
- Close this PR and the related feature branch.
- Open a new PR with the dependency logic in `k8s-namespace` and `k8s-service-account` so that GKE works.
- Open a new PR with an example that calls out to `kubergrunt`, and put documentation there about the approach, the alternatives we considered, and why it is that way right now.
- File GitHub issues:
  - Contribute to "Add tiller resource" (hashicorp/terraform-provider-helm#134).
  - Implement a `k8s-tiller` module using the Tiller resource, the kubernetes provider, and the TLS provider, with warnings about tfstate. If users want an alternative, we can point to `kubergrunt`.
Let me know if there are any objections. I am going to continue with this motion, but I believe it will be easy to reverse any of these actions so we can continue even after I have started.
sgtm!
count = "${trimspace(local.grant_args) == "" ? 0 : 1}"

triggers {
  grant_args = "${local.grant_args}"
If you update these args, will `kubergrunt helm grant` remove the old ones and grant the new ones? Or is it append-only?
It is append-only for now, but I plan on implementing `revoke`, at which point I will change this to a loop and grant / revoke each one separately.
Worth adding as a comment or to the docs
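A hedged sketch of what such a comment might look like in the module; the resource name and surrounding block are abbreviated assumptions, not the module's actual code:

```hcl
# Sketch only: illustrates the suggested doc comment, not the real resource.
resource "null_resource" "grant_access" {
  count = "${trimspace(local.grant_args) == "" ? 0 : 1}"

  # NOTE: `kubergrunt helm grant` is currently append-only. Removing an entity
  # from the granted RBAC list will NOT revoke its access until `kubergrunt helm
  # revoke` is implemented; at that point this becomes a per-entity grant/revoke loop.
  triggers {
    grant_args = "${local.grant_args}"
  }
}
```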
# INTERPOLATE AND CONSTRUCT COMMAND ARGUMENTS
# ---------------------------------------------------------------------------------------------------------------------

// TODO: When we have package-terraform-utilities available
I think we could open source this. It's not exactly secret sauce...
As discussed, this has been replaced with …
Depends on gruntwork-io/terratest#231 and gruntwork-io/kubergrunt#7 (MERGED).
This implements the `k8s-tiller` module and example that allows you to deploy Tiller following security best practices.
Notes:
- Uses `kubergrunt` under the hood.