Perma-diff due to new location_policy being repeatedly unset #1478

Closed
jawnsy opened this issue Nov 24, 2022 · 6 comments
Labels
bug, Stale

Comments

@jawnsy

jawnsy commented Nov 24, 2022

TL;DR

The new autoscaling location_policy setting defaults to null to avoid issues with pre-1.24 clusters. However, this results in a perma-diff, because every plan tries to change the value from ANY back to null.

Expected behavior

I expected that upgrading the module version, without modifying my configuration, would not require any changes to infrastructure. If it did require changes, I expected them to be applied once, as a one-time upgrade migration.

It would be nice if the module could supply the appropriate location policy according to the control plane version (for post-1.24 clusters, supply ANY, otherwise, supply null), but this logic may belong in the provider instead.
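
To illustrate the idea, here is a minimal sketch of what a version-based default could look like. The variable and local names are hypothetical (they are not the module's actual inputs), and treating every 1.24.x release as supporting location_policy is an assumption.

```hcl
# Hypothetical sketch: none of these names come from the module itself.
variable "control_plane_version" {
  description = "GKE control plane version, for example \"1.24.5-gke.600\""
  type        = string
}

locals {
  # "1.24.5-gke.600" splits into ["1", "24", "5-gke", "600"];
  # only the major and minor components are compared.
  version_parts = split(".", var.control_plane_version)
  version_major = tonumber(local.version_parts[0])
  version_minor = tonumber(local.version_parts[1])

  # Assumption: 1.24 and later accept location_policy.
  supports_location_policy = (
    local.version_major > 1 ||
    (local.version_major == 1 && local.version_minor >= 24)
  )

  # ANY for newer clusters, null (unset) for older ones.
  default_location_policy = local.supports_location_policy ? "ANY" : null
}
```

A node pool definition could then fall back to local.default_location_policy when the user has not set a policy explicitly.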

Observed behavior

This may be an issue with the provider, but the symptom is that every plan tries to unset location_policy, which the API has already defaulted to ANY:

  ~ resource "google_container_node_pool" "pools" {
        name                        = "pool-69f6"
        # (10 unchanged attributes hidden)

      ~ autoscaling {
          - location_policy      = "ANY" -> null
            # (4 unchanged attributes hidden)
        }

        # (5 unchanged blocks hidden)
    }

This is a perma-diff that #1452 was trying to fix.

Terraform Configuration

n/a

Terraform Version

Terraform v1.3.4
on darwin_arm64
+ provider registry.terraform.io/hashicorp/google v4.42.0
+ provider registry.terraform.io/hashicorp/google-beta v4.44.1
+ provider registry.terraform.io/hashicorp/kubernetes v2.16.0
+ provider registry.terraform.io/hashicorp/random v3.4.3

Additional information

The workaround/solution is for users to explicitly set a location_policy to ANY if they are using a post-1.24 cluster, or null otherwise.
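
For anyone managing the resource directly rather than through the module, the explicit setting would look roughly like the sketch below. The pool name, cluster name, and location are placeholders, and the node counts are arbitrary.

```hcl
# Sketch only: name, cluster, and location are placeholder values.
resource "google_container_node_pool" "example" {
  name     = "example-pool"
  cluster  = "example-cluster"
  location = "us-central1"

  autoscaling {
    min_node_count = 0
    max_node_count = 10

    # Setting this explicitly keeps the configuration in line with the
    # value the API reports, so the plan no longer shows "ANY" -> null.
    location_policy = "ANY"
  }
}
```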

jawnsy added the bug label Nov 24, 2022
@marcleibold

Hi @jawnsy,

I have the same problem, except that mine keeps changing away from BALANCED.

  ~ resource "google_container_node_pool" "pools" {
        name                        = "default-node-pool"
        # (10 unchanged attributes hidden)

      ~ autoscaling {
          - location_policy      = "BALANCED" -> null
            # (4 unchanged attributes hidden)
        }

        # (5 unchanged blocks hidden)
    }

Do you know where exactly to set location_policy for the workaround, so that I can resolve the perma-diff locally?

@marcleibold

Never mind, I just tried to set it in the module definition via:

cluster_autoscaling = {
    enabled             = false
    autoscaling_profile = null
    min_cpu_cores       = 0
    max_cpu_cores       = 0
    min_memory_gb       = 0
    max_memory_gb       = 0
    gpu_resources       = []
}

But this line prevents the module from picking up the change. That workaround doesn't work.

I also tried setting it to something like "", but then I just get the following error message:

╷
│ Error: expected cluster_autoscaling.0.autoscaling_profile to be one of [BALANCED OPTIMIZE_UTILIZATION], got 
│ 
│   with module.clickhouse.module.gke.module.gke.google_container_cluster.primary,
│   on .terraform/modules/clickhouse.gke.gke/modules/beta-private-cluster/cluster.tf line 107, in resource "google_container_cluster" "primary":
│  107:     autoscaling_profile = var.cluster_autoscaling.autoscaling_profile != null ? var.cluster_autoscaling.autoscaling_profile : "BALANCED"
│ 
╵

So, if I am not completely mistaken, this needs to be changed in the source code first.
If I am mistaken, feel free to correct me.

@jawnsy
Author

jawnsy commented Dec 8, 2022

@marcleibold Here's what I'm using for my node pool setting:

node_pools = [
  {
    name               = "pool"
    preemptible        = false
    spot               = false
    enable_secure_boot = true
    enable_gcfs        = true
    machine_type       = "t2d-standard-16"
    initial_node_count = 1
    min_count          = 0
    max_count          = 10
    max_surge          = 4
    image_type         = "COS_CONTAINERD"
    location_policy    = "ANY"
  },
]

You just have to add location_policy to your node pool config (ANY or BALANCED). By default, location_policy is null, but Google Cloud sets a value itself for post-1.24 clusters, which results in the perma-diff. The setting is a node pool setting, not a cluster setting: https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-autoscaler#location_policy
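
For context on why leaving the key out produces a null, here is a rough sketch of how a module might read an optional per-pool key. This is an assumption about the general pattern, not a copy of the module's source, and the local names are hypothetical.

```hcl
# Hypothetical sketch of the "optional key with a null default" pattern.
locals {
  example_node_pools = [
    { name = "pool-a", location_policy = "ANY" },
    { name = "pool-b" }, # no location_policy key
  ]

  # lookup() returns the third argument when the key is missing, so
  # pool-b ends up with null and the provider treats the field as unset.
  effective_location_policy = {
    for pool in local.example_node_pools :
    pool.name => lookup(pool, "location_policy", null)
  }
}
```

With this pattern, only pools that set the key explicitly send a value, which is why adding location_policy to the pool config makes the diff go away.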

@bharathkkb
Copy link
Member

Thanks for the report @jawnsy
This looks like a provider bug. IIUC, defaulting location_policy to null should behave as if location_policy were not set at all and were managed by the provider.

@gleichda

This was fixed in GoogleCloudPlatform/magic-modules#6982

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions bot added the Stale label Feb 19, 2023
github-actions bot closed this as not planned Feb 27, 2023