sysctl change does not actually get applied #18208

christhegrand · 2024-05-21T20:26:21Z

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
If an issue is assigned to a user, that user is claiming responsibility for the issue.
Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.

Terraform Version & Provider Version(s)

Terraform v1.8.2
on darwin_arm64

provider registry.terraform.io/hashicorp/google v5.22.0
provider registry.terraform.io/hashicorp/google-beta v5.27.0

Your version of Terraform is out of date! The latest version
is 1.8.3. You can update by downloading from https://www.terraform.io/downloads.html

Affected Resource(s)

google_container_cluster

Terraform Configuration

# See https://www.terraform.io/docs/providers/google/r/container_cluster.html
resource "google_container_cluster" "primary" {
  # https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_versions#using-the-google-beta-provider
  provider       = google-beta
  name           = "${var.cluster-name}-cluster"
  location       = var.location
  node_locations = var.node-locations

  timeouts {
    read   = "60m"
    create = "120m"
    delete = "120m"
    update = "120m"
  }

  // I think we should enable this everywhere, but at the moment I put this behind
  // a variable because some clusters have it enabled and some don't, and I'm just
  // trying to perform a TF upgrade here.
  enable_shielded_nodes = var.shielded-nodes-enabled

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  # Make the cluster VPC-native instead of routes-based
  # See https://cloud.google.com/kubernetes-engine/docs/how-to/alias-ips
  ip_allocation_policy {
  }

  # This, confusingly enough, is actually the configuration for auto-provisioned
  # node pools. We want this for the ASR pods that use a GPU.
  # See documentation in
  # https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning,
  # https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler,
  # and https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-autoscaler.
  cluster_autoscaling {
    enabled = true

    autoscaling_profile = var.optimize-utilization ? "OPTIMIZE_UTILIZATION" : "BALANCED"

    // The cluster autoscaler can create nodes for use by GPU-enabled pods, and
    // pods that do not need GPUs. Be very careful adding any changes here that
    // should affect only one of those groups of pods.
    dynamic "resource_limits" {
      for_each = var.cluster-resource-limits
      content {
        resource_type = resource_limits.value["resource_type"]
        minimum       = resource_limits.value["minimum"]
        maximum       = resource_limits.value["maximum"]
      }
    }

    auto_provisioning_defaults {
      oauth_scopes = [
        "https://www.googleapis.com/auth/logging.write",
        "https://www.googleapis.com/auth/monitoring",
        "https://www.googleapis.com/auth/devstorage.read_only",
        "https://www.googleapis.com/auth/compute"
      ]

      management {
        auto_repair  = "true"
        auto_upgrade = "true"
      }

      disk_size  = var.disk-size-gb
      image_type = "COS_CONTAINERD"
    }
  }

  # Setting an empty username and password explicitly disables basic auth
  master_auth {
    client_certificate_config {
      issue_client_certificate = false
    }
  }

  node_pool_defaults {
    node_config_defaults {
      gcfs_config {
        enabled = var.is-image-streaming-enabled
      }
    }
  }

  node_config {
    linux_node_config {
      sysctls = {
        "net.core.somaxconn" = "4096"
      }
    }
  }
}

Debug Output

2024-05-21T13:17:04.915-0700 [WARN]  Provider "provider[\"registry.terraform.io/hashicorp/google-beta\"]" produced an unexpected new value for module.cluster.google_container_cluster.primary, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .node_config[0].linux_node_config: block count changed from 1 to 0

Expected Behavior

When I run terraform apply with this change, Terraform reports that the change has been applied successfully. When I run terraform apply again, I should see that there are no more changes to apply.

Actual Behavior

When I run terraform apply with this change, Terraform reports that the change has been applied successfully. But when I run terraform apply again, I see the sysctl change show up as an unapplied change. Basically, Terraform thinks the change has been applied successfully, but it does not seem like it is actually persisting.

Steps to reproduce

terraform apply

Important Factoids

No response

References

No response

b/342657392

b/343052499

The text was updated successfully, but these errors were encountered:

ggtisc · 2024-05-23T21:39:18Z

Hi @christhegrand

As I understand your issue is that after creating the resource "google_container_cluster" "primary" the next property wasn't applied:

  node_config {
    linux_node_config {
      sysctls = {
        "net.core.somaxconn" = "4096"
      }
    }
  }

And the unique step to reproduce this issue is to run a terraform apply to create this resource. Please confirm if that is right.

Finally please send us the complete LOG.DEBUG output and the terraform.tfstate result after the resource creation to confirm what is happening in your environment.

christhegrand · 2024-05-23T22:38:48Z

As I understand your issue is that after creating the resource "google_container_cluster" "primary" the next property wasn't applied:

node_config {
linux_node_config {
sysctls = {
"net.core.somaxconn" = "4096"
}
}
}
And the unique step to reproduce this issue is to run a terraform apply to create this resource. Please confirm if that is right.

Yes, that is correct.

christhegrand · 2024-05-23T22:40:04Z

I added a Github gist with files that show the output of terraform apply with debug logging turned on, together with the Terraform state.

https://gist.github.com/christhegrand/8e1bf8de6e842298a762ca71c4ae1462

ggtisc · 2024-05-27T16:38:18Z

Possible bug in update, it looks like node_config only has one updatable field, and linux_node_config has not been implemented for updates.

lornaluo · 2024-06-06T07:52:17Z

Instead of using node_config in resource google_container_cluster :

resource "google_container_cluster" "primary" {
  ... 
  node_config {
    linux_node_config {
      sysctls = {
        "net.core.somaxconn" = "4096"
      }
    }
  }
 ...
}

You can try with node_config in resource google_container_node_pool to config linux node:

# Separately Managed Node Pool
resource "google_container_node_pool" "primary_nodes" {
... 
  node_config {
    ... 
    linux_node_config {
      sysctls = {
        "net.core.somaxconn" = "4096"
      }
    }
  .... 
}

I tried apply these setting with terraform and it works.

lornaluo · 2024-06-07T01:20:50Z

Adde some reference for this.

In https://registry.terraform.io/providers/hashicorp/google/5.32.0/docs/resources/container_cluster#argument-reference
It mentions google_container_cluster.node_config is not recommended with terraform

Recommended to create with a separately managed node pool (recommended)

christhegrand · 2024-06-07T16:21:09Z

Thanks. We are trying to add node config defaults to autoprovisioned node pools. Is it not possible to do that using Terraform?

lornaluo · 2024-06-08T00:30:18Z

Maybe you can try first enable node auto-provisioning on the cluster with Terraform , then specify which node pools are auto-provisioned:

https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning#mark_node_auto-provisioned

mauriciopoppe · 2024-06-10T21:43:34Z

As mentioned in terraform-google-modules/terraform-google-kubernetes-engine#15 the default nodepool causes trouble with managing the cluster. Personally I always delete the default nodepool and create additional nodepools and manage their lifecycle outside the lifecycle of the cluster resource.

@christhegrand can you update your cluster provisioning script to create additional nodepool resources and delete the default nodepool?

christhegrand · 2024-06-10T21:50:02Z

I did delete the default node pool:

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true

Maybe you can try first enable node auto-provisioning on the cluster with Terraform , then specify which node pools are auto-provisioned:

https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning#mark_node_auto-provisioned

That's interesting. I didn't realize I could create a node pool and then enable auto-provisioning on it.

wangzhen127 · 2024-06-17T21:26:08Z

@christhegrand Are you able to make changes to the additional node pools (not using the cluster level node_config field)?

lornaluo · 2024-07-01T18:32:14Z

@christhegrand I have created a bug internally for GKE to deprecate the cluster level field node_config.
Could we close this one if there are no other other issues ?

christhegrand · 2024-07-01T18:41:30Z

Sure. Thank you!

christhegrand added the bug label May 21, 2024

github-actions bot added forward/review In review; remove label to forward service/container labels May 21, 2024

ggtisc self-assigned this May 23, 2024

ggtisc added the waiting-response label May 23, 2024

github-actions bot removed the waiting-response label May 23, 2024

trodge removed the forward/review In review; remove label to forward label May 24, 2024

modular-magician added the forward/linked label May 24, 2024

ggtisc removed their assignment May 27, 2024

ggtisc removed the forward/linked label May 27, 2024

modular-magician added the forward/linked label May 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sysctl change does not actually get applied #18208

sysctl change does not actually get applied #18208

christhegrand commented May 21, 2024 •

edited by modular-magician

Loading

ggtisc commented May 23, 2024

christhegrand commented May 23, 2024 •

edited

Loading

christhegrand commented May 23, 2024 •

edited

Loading

ggtisc commented May 27, 2024

lornaluo commented Jun 6, 2024

lornaluo commented Jun 7, 2024

christhegrand commented Jun 7, 2024

lornaluo commented Jun 8, 2024

mauriciopoppe commented Jun 10, 2024

christhegrand commented Jun 10, 2024

wangzhen127 commented Jun 17, 2024

lornaluo commented Jul 1, 2024

christhegrand commented Jul 1, 2024

sysctl change does not actually get applied #18208

sysctl change does not actually get applied #18208

Comments

christhegrand commented May 21, 2024 • edited by modular-magician Loading

Community Note

Terraform Version & Provider Version(s)

Affected Resource(s)

Terraform Configuration

Debug Output

Expected Behavior

Actual Behavior

Steps to reproduce

Important Factoids

References

ggtisc commented May 23, 2024

christhegrand commented May 23, 2024 • edited Loading

christhegrand commented May 23, 2024 • edited Loading

ggtisc commented May 27, 2024

lornaluo commented Jun 6, 2024

lornaluo commented Jun 7, 2024

christhegrand commented Jun 7, 2024

lornaluo commented Jun 8, 2024

mauriciopoppe commented Jun 10, 2024

christhegrand commented Jun 10, 2024

wangzhen127 commented Jun 17, 2024

lornaluo commented Jul 1, 2024

christhegrand commented Jul 1, 2024

christhegrand commented May 21, 2024 •

edited by modular-magician

Loading

christhegrand commented May 23, 2024 •

edited

Loading

christhegrand commented May 23, 2024 •

edited

Loading