
[safer-cluster] Replace "kubernetes_version" with "release_channel" #487

Conversation

@skinlayers (Contributor) commented Apr 13, 2020:

See #486

@morgante (Contributor) left a comment:
Thanks, I think this is a reasonable change.

autogen/safer-cluster/variables.tf.tmpl (review thread: outdated, resolved)
Co-Authored-By: Morgante Pell <morgante.pell@morgante.net>

  // Nodes are created with a default version. The nodepool enables
  // auto_upgrade so that the node versions can be kept up to date with
  // the master upgrades.
  //
  // https://cloud.google.com/kubernetes-engine/versioning-and-upgrades
- node_version = ""
+ node_version = var.node_version
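For context on that comment block: with node auto-upgrade enabled on the node pool, GKE keeps node versions in step with the control plane, so pinning a node version isn't needed. A minimal sketch of a node pool with auto-upgrade enabled (illustrative only; the resource names, cluster reference, and machine type are assumptions, not the module's generated code):

  resource "google_container_node_pool" "default" {
    name               = "default-pool"
    cluster            = google_container_cluster.primary.name
    location           = var.region
    initial_node_count = 1

    # Auto-upgrade lets GKE roll node versions forward with the master,
    # so no explicit node_version is pinned here.
    management {
      auto_repair  = true
      auto_upgrade = true
    }

    node_config {
      machine_type = "n1-standard-2"
    }
  }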
@skinlayers (Contributor, Author) commented Apr 13, 2020:

@morgante Is node_version actually used anywhere in any of the modules? I was grepping through the code last night, but AFAICT, node_version is never consumed by any resource. Am I crazy, or can we just remove node_version, node_version_regional, and node_version_zonal from autogen/main & autogen/safer-cluster?

Reply (Contributor):

You're right, it looks like we might have dropped it entirely. Feel free to remove the variable in that case.

variable "release_channel" {
type = string
description = "(Beta) The release channel of this cluster. Accepted values are `UNSPECIFIED`, `RAPID`, `REGULAR` and `STABLE`. Defaults to `REGULAR`."
default = "REGULAR"


Out of curiosity, why pick REGULAR as default over STABLE?

@skinlayers (Contributor, Author) commented Apr 20, 2020:

See #486 & #451 for additional context.

tl;dr Google deems both STABLE & REGULAR to be "production-quality". REGULAR is usually just a few patch revisions behind the latest GKE version. @morgante (a Cloud Foundation Toolkit maintainer who works for Google) would prefer that safer-cluster use the latest version of GKE. I can see a number of reasons for this preference. For instance, if you wanted to apply the additional CIS Benchmarks listed here: https://cloud.google.com/kubernetes-engine/docs/concepts/cis-benchmarks. That page currently only refers to GKE v1.15, which is also the current REGULAR channel.

@umairidris commented Apr 20, 2020:

Right, I looked through it but didn't see any particular reason for picking regular over stable (maybe I missed that section).

While I don't have a strong opinion, based on the GKE documentation I think the more stable option would be better suited as the default. Users who want newer features can switch to REGULAR by overriding.

I will leave it up to your judgment, just doing a drive-by here. Thanks for the PR, we're hoping to use it as soon as it's released :)

Reply (Contributor):

I think REGULAR is a reasonable default, because we'd potentially like to add enhanced security features in the future without waiting for them to appear in the STABLE channel. Clusters which require extra stability can override it.
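For a consumer of the module, that override is a single argument in the module call. A minimal sketch, assuming the registry source path for the safer-cluster submodule and placeholder project/network/range names (the version pin is illustrative, not the release this PR shipped in):

  module "safer_cluster" {
    source  = "terraform-google-modules/kubernetes-engine/google//modules/safer-cluster"
    version = "~> 9.0" # illustrative pin; use a release that includes this change

    project_id        = var.project_id
    name              = "example-cluster"
    region            = var.region
    network           = var.network
    subnetwork        = var.subnetwork
    ip_range_pods     = "pods-range"
    ip_range_services = "services-range"

    # Default is REGULAR; opt into STABLE for clusters that need extra stability.
    release_channel = "STABLE"
  }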

MartinPetkov added a commit to GoogleCloudPlatform/fda-mystudies that referenced this pull request Apr 20, 2020
* Add TODO to replace kubernetes_version with release_channel

This depends on terraform-google-modules/terraform-google-kubernetes-engine#487 being merged.

* Remove unused field

Co-authored-by: Martin Petkov <mpetkov@google.com>
@morgante merged commit 5791ac1 into terraform-google-modules:master on Apr 21, 2020
MartinPetkov added a commit to GoogleCloudPlatform/fda-mystudies that referenced this pull request Apr 23, 2020
* add baseline org infra (large squash merge of the GoogleCloudPlatform/fda-mystudies Terraform setup; the part relevant to this PR is "Add TODO to replace kubernetes_version with release_channel (#182)", which depends on terraform-google-modules/terraform-google-kubernetes-engine#487 being merged)

Co-authored-by: Martin Petkov <mpetkov@google.com>
CPL-markus pushed a commit to WALTER-GROUP/terraform-google-kubernetes-engine that referenced this pull request Jul 15, 2024
…nel" (terraform-google-modules#487)

BREAKING CHANGE: For the safer cluster module, you must now specify `release_channel` instead of `kubernetes_version`.
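For anyone updating an existing safer-cluster call, the migration is a rename of one argument; a minimal before/after sketch (the pinned version string is only an example):

  # Before: pin a specific GKE version.
  kubernetes_version = "1.15"

  # After: subscribe to a release channel instead.
  release_channel = "REGULAR" # accepted values: UNSPECIFIED, RAPID, REGULAR, STABLE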