
Adding various beta features and code hygiene #196

Conversation

@chrislovecnm (Contributor) commented Jun 27, 2019

This PR completes some basic code hygiene: the database encryption block
was moved into the same `if` statement as the other beta components.

With this commit we include various beta features that are supported at this time:

Cluster Configurations

  • authenticator_groups_config
  • vertical_pod_autoscaling
  • workload_identity_config
  • enable_intranode_visibility
  • default_max_pods_per_node

TPU support, some node pool options, and other beta features are not included in this PR;
I will revisit them in later PRs.

This has been tested using the 2.9.1 google provider. Upgrading the examples to that
provider will occur in a different PR.

@morgante (Contributor)

@chrislovecnm Please drop a ping when this is ready for review (and remove WIP).

@chrislovecnm chrislovecnm force-pushed the beta-features branch 2 times, most recently from 778b072 to a704129 Compare June 27, 2019 19:16
@chrislovecnm chrislovecnm changed the title [WIP] Adding various beta features and code hygiene Adding various beta features and code hygiene Jun 27, 2019
@chrislovecnm (Contributor, Author)

@morgante I am currently completing more testing locally, but I believe the code is complete. PTAL

I am going to open a follow-up PR to add the beta modules to testing, if it is not too involved.

@chrislovecnm (Contributor, Author)

CI is failing because of a change in the node pool integration test. We added an nvidia-tesla-p4 GPU to the node pool, and because of that the taints have changed.

@chrislovecnm chrislovecnm changed the title Adding various beta features and code hygiene [WIP] Adding various beta features and code hygiene Jun 27, 2019
@aaron-lane aaron-lane added the enhancement New feature or request label Jun 27, 2019
@chrislovecnm chrislovecnm changed the title [WIP] Adding various beta features and code hygiene Adding various beta features and code hygiene Jun 27, 2019
@chrislovecnm (Contributor, Author)

@morgante PTAL

@morgante (Contributor) left a comment:

This LGTM, but can you confirm that this doesn't cause any issues with existing clusters? Just need to confirm these aren't breaking changes.

@chrislovecnm (Contributor, Author)

@morgante these are all additions, so there should not be any breaking changes. But ...

@morgante (Contributor)

@chrislovecnm In theory, I agree. In practice, there can be important differences in setting an argument to an empty value and leaving it off entirely.

@aaron-lane (Contributor)

This partially overlaps with #174. Can we consolidate the two efforts?

@chrislovecnm (Contributor, Author)

@morgante then what do you recommend? I have tested this locally, and the integration tests have passed, except for the known bug that we have with the node_pool test.

@aaron-lane (Contributor)

@morgante the beta submodules have not yet been released. Are you concerned about the upgrade path from the existing partially-beta private cluster to the new private beta cluster?

@chrislovecnm (Contributor, Author)

@aaron-lane these are beta features, and IMO should be added to the beta modules that we have now. No idea how we can combine efforts on #174

@chrislovecnm (Contributor, Author)

@aaron-lane if you want me to remove default_max_pods_per_node I can 🤷‍♂

@morgante (Contributor)

@aaron-lane This seems to be a superset of #174, so I'd say we can merge this and add 174 later if we need to.

@aaron-lane Yes, I'm concerned about people with existing clusters upgrading to the private beta cluster. I just want to confirm there are no state/plan changes. It's not a merge blocker though; that can be addressed in release testing/migration notes.

@chrislovecnm (Contributor, Author)

@morgante @aaron-lane PTAL and review. With Marko's fix, this should hopefully pass CI now.

}


variable "vertical_pod_autoscaling" {
Contributor:

We shouldn't force the implementation details of this onto users. I'd prefer to just expose this as vertical_pod_autoscaling_enabled.
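A minimal sketch of that flattened interface, assuming the same 0.11-style block-as-list wiring the module already uses for pod_security_policy_config (the variable name comes from the suggestion above; the wiring shown in the comment is illustrative, not the final implementation):

```hcl
variable "vertical_pod_autoscaling_enabled" {
  description = "Enable vertical pod autoscaling for the cluster."
  default     = false
}

# Inside the google_container_cluster resource, the module itself would
# translate the boolean back into the provider's list-of-maps shape:
#
#   vertical_pod_autoscaling = [{
#     enabled = "${var.vertical_pod_autoscaling_enabled}"
#   }]
```

This keeps the upstream data model an internal detail of the module, so users only ever see a plain boolean.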

@@ -300,6 +300,34 @@ variable "pod_security_policy_config" {
"enabled" = false
}]
}

variable "authenticator_groups_config" {
Contributor:

Instead of exposing this as a list, can we simply expose authenticator_groups_security_group directly?

@chrislovecnm (Contributor, Author) commented Jun 27, 2019:

Then we need to fix PSP as well; it is structured the exact same way. The challenge is that sometimes the data model gets changed upstream, for example with Istio, and we then have to implement that change here as well. Our Istio config values are already out of date in the beta module.

Let me know exactly what you want me to do, and I will focus on it tomorrow.

If you want me to expose all of these as separate configuration values, I can.

The other challenge is defaulting the values. We cannot default values easily; for PSP, for example, the default value is actually the whole data structure. Maybe there is a way in Terraform, and I do not know it ...

Contributor:

psp can be fixed as a separate PR. Please go ahead and update this PR only for the variables you added.

Yes, the underlying data model sometimes changes but I'd rather have us take on the burden of updating with those changes than forcing every module user to update it themselves.

@chrislovecnm (Contributor, Author) commented Jun 27, 2019:

I understand that; I am just not certain we can have an empty or default value for, say, workload_identity_config or authenticator_groups_config.

With pod security policies we have this:

variable "pod_security_policy_config" {
  description = "enabled - Enable the PodSecurityPolicy controller for this cluster. If enabled, pods must be valid under a PodSecurityPolicy to be created."

  default = [{
    "enabled" = false
  }]
}

for which I think we can set a default value easily.

With workload_identity_config we need to have something like this:

workload_identity_config = [ 
  {
     identity_namespace = "GKE_METADATA_SERVER"
  }
]

The values can be "UNSPECIFIED", "SECURE", "EXPOSE", "GKE_METADATA_SERVER".

We can default a value for them, but again we have to set something, because of the data model the upstream provider has chosen.

I would prefer a slightly more complex data model on our end that does not require constant updates, documented thoroughly. I found the values in the provider source and can map them to the --workload-metadata-from-node flag in gcloud.

I understand your concern, just trying to work on a good design that is lower maintenance, and still usable by our users.

I will add more documentation regardless; I am cross-referencing the docs between the main google repo and the beta repo, and reading through the source code. Happy to implement this however we want, just give me some clear guidance.

P.S. Reading the documentation a second time, I think we can use "UNSPECIFIED", which chooses the default. But authenticator_groups_config may be a bit more challenging.

@morgante (Contributor) commented Jun 27, 2019:

> I understand your concern, just trying to work on a good design that is lower maintenance, and still usable by our users.

Even if we document it, I would much prefer that we take on the maintenance burden rather than customers.

If the field format changes, then it's preferable for us to update it in the module (once) rather than forcing every module user to update (potentially hundreds of times).

Admittedly, there are some cases where setting an appropriate default value is challenging but in those I think we can still use a conditional to pass an empty list.

Please proceed with exposing the higher-level options to users (ex. workload_identity_namespace) and transforming them into the correct values within the module itself.

@chrislovecnm (Contributor, Author) commented Jul 1, 2019:

Current Status

Because we cannot create default values for authenticator_groups_config, I am recommending that the user set the list by hand. I have not been able to figure out how to use a list and a map with a ternary without major code complexity. There are solutions, but they are extremely ugly and complex. With TF 0.12 it becomes a lot easier, but we are not there yet.

You get the following type of error:

Error: module.gke.google_container_cluster.primary: 1 error(s) occurred:

* module.gke.google_container_cluster.primary: At column 3, line 1: conditional operator cannot be used with map values in:

${var.authenticator_groups_config != "" ? map("security_group", var.authenticator_groups_config) : map()}

If you want to read more hashicorp/terraform#12453 - it is a feature request that they implemented in 0.12.

A single list is not bad: hashicorp/terraform#12453 (comment) with 0.11

Just a map alone is complex hashicorp/terraform#12453 (comment)

And we have a list with a map inside of it 🤦‍♂.
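For reference, one 0.11-era workaround (a sketch only, untested against this module; `authenticator_groups_security_group` is a hypothetical flattened variable) is to build the single-element list eagerly and slice() it down to length 0 or 1, so the ternary only ever operates on a number rather than a map:

```hcl
# slice() trims the eagerly-built list to zero elements when the
# security group is unset, sidestepping the map-in-ternary limitation
# of Terraform 0.11 conditionals.
authenticator_groups_config = "${slice(list(map("security_group", var.authenticator_groups_security_group)), 0, var.authenticator_groups_security_group == "" ? 0 : 1)}"
```

It works, but it is exactly the kind of "really ugly" code described above, which is why waiting for 0.12 is attractive.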

Recommendation

workload_identity_namespace and vertical_pod_autoscaling are very doable. I recommend defaulting authenticator_groups_config to an empty list; the user then has to set it explicitly:

authenticator_groups_config = [
  {
    security_group = "my-security-group-name"
  }
]

while workload_identity_namespace or vertical_pod_autoscaling would be a simpler value like:

workload_identity_namespace = "GKE_METADATA_SERVER"
vertical_pod_autoscaling = true

Please let me know if this is an acceptable approach!

Contributor:

@chrislovecnm In the interest of moving forward, I'd suggest we initially merge only workload_identity_namespace and vertical_pod_autoscaling in this PR, then do a follow-up PR for the authenticator groups config. Does that seem acceptable?

Contributor Author:

@morgante - @aaron-lane mentioned that he would prefer the changes in the 0.12 branch. Has that changed?

Contributor:

I defer to @aaron-lane but our top priority is 0.12 support.

Contributor:

#177 is on the cusp of being merged, and released tomorrow. 🤞

}]
}

variable "workload_identity_config" {
Contributor:

I can't even find an explanation of the expected value here in the docs.

We should either (a) expose the underlying options as separate variables or (b) provide an example of what the input list should look like.

@chrislovecnm (Contributor, Author) commented Jun 27, 2019:

The docs are a bit lacking at times. From values in the source code I was able to map it back to a flag in gcloud:

     --workload-metadata-from-node=WORKLOAD_METADATA_FROM_NODE
        Type of metadata server available to pods running in the nodepool.
        WORKLOAD_METADATA_FROM_NODE must be one of:

         EXPOSED
            Pods running in this nodepool have access to the node's underlying
            Compute Engine Metadata Server.
         GKE_METADATA_SERVER
            Run the Kubernetes Engine Metadata Server on this node. The
            Kubernetes Engine Metadata Server exposes a metadata API to
            workloads that is compatible with the V1 Compute Metadata APIs
            exposed by the Compute Engine and App Engine Metadata Servers. This
            feature can only be enabled if Workload Identity is enabled at the
            cluster level.
         SECURE
            Prevents pods not in hostNetwork from accessing certain VM
            metadata, specifically kube-env, which contains Kubelet
            credentials, and the instance identity token. This is a temporary
            security solution available while the bootstrapping process for
            cluster nodes is being redesigned with significant security
            improvements. This feature is scheduled to be deprecated in the
            future and later removed.
         UNSPECIFIED
            Chooses the default.

We should figure out if we can take the docs out of gcloud, and use them in our docs :D

Contributor:

Yeah, let's try to include those in the docs.

Contributor:

I'm not sure that is correct, @chrislovecnm?

Those values are intended for setting nodeMetadata in the node pool config, whereas the identityNamespace for WorkloadIdentityConfig should be in the format described at:

https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable_workload_identity_on_a_new_cluster

gcloud beta container clusters create [CLUSTER_NAME] \
  --cluster-version=1.12 \
  --identity-namespace=[PROJECT_NAME].svc.id.goog

So for Terraform it's:

  workload_identity_config {
    identity_namespace = "${var.project_id}.svc.id.goog"
  }
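Given that format, the module could still expose a single high-level string and wrap it itself, in line with the "expose higher-level options" direction from earlier in the thread. A sketch under that assumption (the variable name is hypothetical):

```hcl
variable "identity_namespace" {
  description = "Workload identity namespace, typically \"<project_id>.svc.id.goog\". Empty string leaves workload identity unconfigured."
  default     = ""
}

# In the cluster resource the module would then set:
#
#   workload_identity_config {
#     identity_namespace = "${var.identity_namespace}"
#   }
```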

@chrislovecnm (Contributor, Author)

I'm going to close this, since we have to update all the code for TF 0.12.

@morgante (Contributor)

@chrislovecnm Will you have bandwidth to open a new PR with these features?

If not, can you make sure there are open issues so we can have someone on our team prioritize it?

@chrislovecnm (Contributor, Author)

I have no idea tbh

@Dev25 (Contributor) commented Jul 21, 2019

I can open a PR for some of the beta addons mentioned in this PR next week. It follows the same approach (e.g. PSP), using a default empty list plus dynamic/for_each where needed.

Dev25/terraform-google-kubernetes-engine@gce-labels...Dev25:beta-addons
