
PodSecurityPolicy implementation changing the cluster on every iteration #585

Closed
msgongora opened this issue Jul 1, 2020 · 6 comments · Fixed by #586
Labels
bug (Something isn't working) · good first issue (Good for newcomers) · P2 (high priority issues) · triaged (Scoped and ready for work)

Comments

@msgongora
Contributor

Summary:
pod_security_policy_config with its default value causes the cluster to be updated on every terraform run when using beta-private-cluster-update-variant. I was able to verify that this issue comes from the provider itself, but since I'm hitting it because of the way it was implemented in this module, I'm reporting it here.

How to reproduce it:

module "cicd_cluster" {
  source                       = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
  version                      = "v8.1.0"
  project_id                   = var.project
  region                       = var.region
  service_account              = var.service_account
  name                         = local.cluster_name
  regional                     = false
  zones                        = local.zones
  network                      = local.primary_network
  subnetwork                   = local.primary_subnetwork
  ip_range_pods                = local.cicd_pods_range
  ip_range_services            = local.cicd_services_range
  release_channel              = local.release_channel
  master_ipv4_cidr_block       = local.master_ipv4_cidr_block
  cluster_resource_labels      = local.cluster_labels
  remove_default_node_pool     = true
  enable_private_nodes         = true
  create_service_account       = false
  node_metadata                = "GKE_METADATA_SERVER"
  identity_namespace           = "enabled"
  authenticator_security_group = local.authenticatorsg_email
  maintenance_start_time       = "07:00"
  node_pools = [
    {
      name               = "main-pool"
      machine_type       = local.main_instance_type
      disk_size_gb       = local.main_disk_size
      disk_type          = local.main_disk_type
      min_count          = 1
      max_count          = 20
      image_type         = "COS"
      auto_repair        = true
      auto_upgrade       = true
      preemptible        = false
      initial_node_count = 1
    },
  ]

  node_pools_taints = {
    all       = []
    main-pool = []
  }
}
$ terraform init 

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "google-beta" (terraform-providers/google-beta) 3.14.0...
- Downloading plugin for provider "kubernetes" (hashicorp/kubernetes) 1.11.3...
- Downloading plugin for provider "random" (hashicorp/random) 2.2.1...
- Downloading plugin for provider "null" (hashicorp/null) 2.1.2...
- Downloading plugin for provider "google" (hashicorp/google) 3.14.0...
...
...

$ terraform apply

data.google_compute_network.net: Refreshing state...
module.test_cluster.data.google_container_engine_versions.region: Refreshing state...
data.google_compute_subnetwork.subnet: Refreshing state...
module.test_cluster.data.google_client_config.default: Refreshing state...
module.test_cluster.data.google_compute_zones.available: Refreshing state...
module.test_cluster.data.google_container_engine_versions.zone: Refreshing state...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.test_cluster.google_container_cluster.primary will be created
  + resource "google_container_cluster" "primary" {
      + additional_zones            = (known after apply)
      + cluster_ipv4_cidr           = (known after apply)
...
...
Plan: 9 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
...
module.test_cluster.module.gcloud_wait_for_cluster.null_resource.module_depends_on[0]: Creation complete after 0s [id=1129484066433314996]
module.test_cluster.module.gcloud_delete_default_kube_dns_configmap.null_resource.module_depends_on[0]: Creation complete after 0s [id=2540290453370883089]

Apply complete! Resources: 9 added, 0 changed, 0 destroyed.
...
...

At this point I expect the state to contain pod_security_policy_config with enabled = false, something like this:

$ terraform state show module.test_cluster.google_container_cluster.primary |grep -A2 pod_security_policy_config
    pod_security_policy_config {
        enabled = false
    }

But the google provider isn't creating this state for some reason:

$ terraform state show module.test_cluster.google_container_cluster.primary |grep -A2 pod_security_policy_config
  

Because of that, the GKE module will update the cluster every time:

$ terraform plan

Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.test_cluster.google_container_cluster.primary will be updated in-place
  ~ resource "google_container_cluster" "primary" {
        additional_zones            = []
        cluster_ipv4_cidr           = "172.20.128.0/18"
        default_max_pods_per_node   = 110
...
...
       }

      + pod_security_policy_config {
          + enabled = false
        }

        private_cluster_config {
            enable_private_endpoint = false
            enable_private_nodes    = true
...
...
Plan: 0 to add, 1 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

I was able to replicate this issue with module version 9.4.0 and google-beta 3.28.0.

@morgante
Contributor

morgante commented Jul 1, 2020

Since this issue is occurring in the provider itself, can you please file a bug there?

@msgongora
Contributor Author

I'm wondering why pod_security_policy_config was declared as a list instead of a bool. Reading the documentation, --enable-pod-security-policy looks like a boolean flag.
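
For context, the module's list-shaped interface is presumably wired into the cluster resource roughly like this (a sketch only; the exact declaration in the module source may differ):

variable "pod_security_policy_config" {
  description = "enabled - Enable the PodSecurityPolicy admission controller for this cluster."
  type        = list(object({ enabled = bool }))
  default     = [{ enabled = false }]
}

# Inside google_container_cluster.primary:
dynamic "pod_security_policy_config" {
  for_each = var.pod_security_policy_config
  content {
    enabled = pod_security_policy_config.value.enabled
  }
}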

I'm going to create the issue on the provider. In the meantime I'm using this workaround.
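
For reference, one way to sidestep the perpetual diff from the calling side, assuming the module renders the block through a dynamic block over this variable as sketched above, is to pass an empty list so that no pod_security_policy_config block is generated at all:

module "cicd_cluster" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
  version = "v8.1.0"

  # ... other inputs unchanged ...

  # With an empty list the dynamic block produces nothing, so there is no
  # block for the provider to drop from state and re-add on every plan.
  pod_security_policy_config = []
}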

@bharathkkb
Member

@msgongora Can you confirm you tested with 9.4.0 and google-beta 3.28.0?
From the logs you posted, it seems like provider 3.14.0 and module v8.1.0. We had some weird issues previously too, but I think those were fixed in 3.18.0. #497 (comment)

@msgongora
Contributor Author

It's happening with provider 3.28 as well. The module implementation hasn't changed with regard to pod_security_policy_config, but yes, it's also happening with 9.4.

@morgante
Copy link
Contributor

morgante commented Jul 2, 2020

Regardless of upstream changes, I think we should fix the module interface by:

  1. Replacing the pod_security_policy_config variable with a simple pod_security_policy_enabled boolean variable.
  2. Updating the actual module code to use a dynamic block which is only generated when pod_security_policy_enabled is true (see the sketch below).
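
A minimal sketch of what that could look like, assuming the block is rendered inside the module's google_container_cluster resource (the rest of the resource body is abbreviated here):

variable "pod_security_policy_enabled" {
  description = "Enable the PodSecurityPolicy admission controller for the cluster."
  type        = bool
  default     = false
}

resource "google_container_cluster" "primary" {
  # ... existing cluster arguments unchanged ...

  # Only render the block when the feature is enabled, so clusters created
  # with the default (false) never carry a pod_security_policy_config block
  # and the provider has nothing to re-plan on subsequent runs.
  dynamic "pod_security_policy_config" {
    for_each = var.pod_security_policy_enabled ? [true] : []
    content {
      enabled = true
    }
  }
}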

@morgante added the bug (Something isn't working), good first issue (Good for newcomers), P2 (high priority issues), and triaged (Scoped and ready for work) labels Jul 2, 2020
@msgongora
Contributor Author

@morgante I also think the previous implementation can be confusing, because this option is meant to enable the admission controller, not to define pod security policies.
