Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not able to upgrade AKS cluster using terraform module - Minor version of node pool version 27 is bigger than control plane version #465

Open
1 task done
dunefro opened this issue Nov 1, 2023 · 19 comments
Labels
bug Something isn't working

Comments

@dunefro
Copy link

dunefro commented Nov 1, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Terraform Version

1.6.2

Module Version

7.4.0

AzureRM Provider Version

3.69.0

Affected Resource(s)/Data Source(s)

module.aks.azurerm_kubernetes_cluster.main

Terraform Configuration Files

resource "azurerm_user_assigned_identity" "cluster" {
  location            = var.location
  name                = var.name
  resource_group_name = var.resource_group_name
}

resource "azurerm_role_assignment" "network_contributor_cluster" {
  scope                = var.vnet_id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_user_assigned_identity.cluster.principal_id
}

module "aks" {
  source                    = "Azure/aks/azurerm"
  version                   = "7.4.0"
  resource_group_name       = var.resource_group_name
  cluster_name              = var.name
  location                  = var.location
  prefix                    = "prefix"
  workload_identity_enabled = var.workload_identity_enabled

  # agent configuration
  # agents_availability_zones = []
  agents_count         = local.intial_node_pool_min_count
  agents_max_count     = local.intial_node_pool_max_count
  agents_min_count     = local.intial_node_pool_min_count
  agents_pool_name     = "initial"
  agents_size          = var.intial_node_pool_instance_type
  agents_tags          = local.tags
  orchestrator_version = var.kubernetes_version

  # autoscaler configuration
  auto_scaler_profile_enabled                          = true
  auto_scaler_profile_expander                         = "random"
  auto_scaler_profile_max_graceful_termination_sec     = "180"
  auto_scaler_profile_max_node_provisioning_time       = "5m"
  auto_scaler_profile_max_unready_nodes                = 0
  auto_scaler_profile_scale_down_delay_after_add       = "2m"
  auto_scaler_profile_scale_down_delay_after_delete    = "30s"
  auto_scaler_profile_scale_down_unneeded              = "1m"
  auto_scaler_profile_scale_down_unready               = "2m"
  auto_scaler_profile_scale_down_utilization_threshold = "0.3"

  # cluster level configurations
  api_server_authorized_ip_ranges            = var.allowed_ip_ranges
  create_role_assignment_network_contributor = false
  enable_auto_scaling                        = true
  enable_host_encryption                     = true
  identity_ids                               = [azurerm_user_assigned_identity.cluster.id]
  identity_type                              = "UserAssigned"
  kubernetes_version                         = var.kubernetes_version

  # storage
  storage_profile_blob_driver_enabled         = var.enable_blob_driver
  storage_profile_disk_driver_enabled         = var.enable_disk_driver
  storage_profile_disk_driver_version         = var.disk_driver_version
  storage_profile_file_driver_enabled         = var.enable_file_driver
  storage_profile_snapshot_controller_enabled = var.enable_snapshot_controller
  storage_profile_enabled                     = var.enable_storage_profile

  # network configuration
  network_plugin             = var.network_plugin
  vnet_subnet_id             = var.subnet_id
  net_profile_dns_service_ip = var.dns_ip
  net_profile_service_cidr   = var.service_cidr
  net_profile_pod_cidr       = var.pod_cidr
  # net_profile_docker_bridge_cidr = "10.244.0.10"

  node_pools = local.node_pools

  oidc_issuer_enabled = var.oidc_issuer_enabled
  os_disk_size_gb     = var.disk_size

  # makes the initial node pool have a taint `CriticalAddonsOnly=true:NoSchedule`
  # helpful in scheduling important workloads 
  only_critical_addons_enabled = true

  private_cluster_enabled = var.private_cluster_enabled

  # rbac 
  rbac_aad                          = false
  role_based_access_control_enabled = false

  tags = local.tags
}

tfvars variables values

################################################################################
# Cluster
################################################################################

variable "name" {
  description = "Name of the cluster"
  type        = string
}
variable "kubernetes_version" {
  description = "Version of the kubernetes engine"
  default     = "1.27"
  type        = string
}

variable "oidc_issuer_enabled" {
  description = "Enable OIDC for the cluster"
  default     = true
  type        = bool
}

variable "disk_size" {
  description = "Disk size of the initial node pool in GB"
  default     = "100"
  type        = string
}

variable "intial_node_pool_instance_type" {
  description = "Instance size of the initial node pool"
  default     = "Standard_D2s_v5"
  type        = string
}

variable "intial_node_pool_spot_instance_type" {
  description = "Instance size of the initial node pool"
  default     = "Standard_D4s_v5"
  type        = string
}

variable "workload_identity_enabled" {
  description = "Enable workload identity in the cluster"
  default     = true
  type        = bool
}

variable "control_plane" {
  description = "Whether the cluster is control plane"
  type        = bool

}

variable "enable_storage_profile" {
  description = "Enable storage profile for the cluster. If disabled `enable_blob_driver`, `enable_file_driver`, `enable_disk_driver` and `enable_snapshot_controller` will have no impact"
  type        = bool
  default     = true
}

variable "enable_blob_driver" {
  description = "Enable blob storage provider"
  type        = bool
  default     = true
}

variable "enable_file_driver" {
  description = "Enable file storage provider"
  type        = bool
  default     = true
}

variable "enable_disk_driver" {
  description = "Enable disk storage provider"
  type        = bool
  default     = true
}

variable "disk_driver_version" {
  description = "Version of disk driver. Supported values `v1` and `v2`"
  type        = string
  default     = "v1"
}
variable "enable_snapshot_controller" {
  description = "Enable snapshot controller"
  type        = bool
  default     = true
}
################################################################################
# Network
################################################################################

variable "vnet_id" {
  description = "Vnet ID for the cluster"
  type        = string
}

variable "subnet_id" {
  description = "Subnet Id for the cluster"
  type        = string
}

variable "network_plugin" {
  description = "Network plugin to use for cluster"
  type        = string
  default     = "kubenet"
}

variable "pod_cidr" {
  description = "CIDR of the pod in cluster"
  default     = "10.244.0.0/16"
  type        = string
}

variable "service_cidr" {
  description = "CIDR of the services in cluster"
  default     = "10.255.0.0/16"
  type        = string
}

variable "dns_ip" {
  description = "IP from service CIDR used for internal DNS"
  default     = "10.255.0.10"
  type        = string
}

variable "allowed_ip_ranges" {
  description = "Allowed IP ranges to connect to the cluster"
  default     = ["0.0.0.0/0"]
  type        = list(string)
}

variable "private_cluster_enabled" {
  description = "Private cluster"
  default     = false
  type        = bool
}

################################################################################
# Generic
################################################################################

variable "resource_group_name" {
  description = "Name of the resource group"
  type        = string
}

variable "location" {
  description = "Location of the resource group"
  type        = string
}

variable "tags" {
  description = "A map of tags to add to all resources"
  type        = map(string)
  default     = {}
}


# locals.tf
locals {
  tags = merge(
    {
      "terraform-module" = "terraform-azure-cluster"
      "terraform"        = "true"
      "cluster-name"     = var.name
    },
    var.tags
  )
  intial_node_pool_min_count = var.control_plane ? 2 : 1
  intial_node_pool_max_count = var.control_plane ? 3 : 2
  node_pools = {
    spot = {
      name            = "spotpool"
      node_count      = 1
      max_count       = 20
      min_count       = 1
      os_disk_size_gb = 100
      priority        = "Spot"
      vm_size         = var.intial_node_pool_spot_instance_type

      # mandatory to pass otherwise node pool will be recreated
      enable_auto_scaling     = true
      custom_ca_trust_enabled = false
      enable_host_encryption  = false
      enable_node_public_ip   = false
      eviction_policy         = "Delete"
      orchestrator_version    = var.kubernetes_version
      node_taints = [
        "kubernetes.azure.com/scalesetpriority=spot:NoSchedule"
      ]
      tags           = local.tags
      zones          = []
      vnet_subnet_id = var.subnet_id
    }
  }
}

Debug Output/Panic Output

│ Error: updating Default Node Pool Agent Pool (Subscription: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx"
│ Resource Group Name: "tfy-az-prod"
│ Managed Cluster Name: "az-prod-eaus"
│ Agent Pool Name: "initial") performing CreateOrUpdate: agentpools.AgentPoolsClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="NodePoolMcVersionIncompatible" Message="Node pool version 1.27.3 and control plane version 1.26.6 are incompatible. Minor version of node pool version 27 is bigger than control plane version 26. For more information, please check https://aka.ms/aks/UpgradeVersionRules"
│ 
│   with module.aks.azurerm_kubernetes_cluster.main,
│   on .terraform/modules/aks/main.tf line 17, in resource "azurerm_kubernetes_cluster" "main":
│   17: resource "azurerm_kubernetes_cluster" "main" {

Expected Behaviour

Ideally the control plane should first gets updated and then the node pool. To resolve this issue I have to first update the control plane from the portal and then update the node pool from terraform.

Actual Behaviour

There is no terraform diff for updating control plane and node pool upgrade causes version incompatibility error.

Steps to Reproduce

No response

Important Factoids

No response

References

No response

@dunefro dunefro added the bug Something isn't working label Nov 1, 2023
@mkilchhofer
Copy link
Contributor

mkilchhofer commented Jan 25, 2024

We at @swisspost have the exact same issue. I analyzed it with some of my colleagues (@hichem-belhocine will ping you @zioproto via MSteams).

We have no auto update mechanism from AKS side in place (Auto Upgrade Type = Disabled), we specify the exact patch version for both:

  • control-plane (var.kubernetes_version)
  • nodepools (var.orchestrator_version)

With TF module version 6.8.0 and the upgrade from 1.25 -> 1.26 a plan looked like this:
image

Everything went smooth. Then we upgraded to module version 7.5.0 and the upgrade is now triggered via an other TF provider (azapi) and the control plane variable is ignored via a lifecycle { } block. Ref:

A plan to 1.27 now looks like this (w/ module version 7.5.0):
image

And of course this will fail as you cannot update the nodepool to a newer version than the control-plane:

│ Error: updating Default Node Pool Agent Pool (Subscription: "<REDACTED>"
│ Resource Group Name: "rg-aks-dev-m98cn0001"
│ Managed Cluster Name: "aks-dev-m98cn0001"
│ Agent Pool Name: "system") performing CreateOrUpdate: agentpools.AgentPoolsClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="NodePoolMcVersionIncompatible" Message="Node pool version 1.27.7 and control plane version 1.26.10 are incompatible. Minor version of node pool version 27 is bigger than control plane version 26. For more information, please check https://aka.ms/aks/UpgradeVersionRules"
│ 
│   with module.aks_post.module.aks.azurerm_kubernetes_cluster.main,
│   on .terraform/modules/aks_post.aks/main.tf line 17, in resource "azurerm_kubernetes_cluster" "main":
│   17: resource "azurerm_kubernetes_cluster" "main" {
│ 

@zioproto
Copy link
Collaborator

@dunefro I understand you are using var.orchestrator_version = var.kubernetes_version

I see the problem where after PR #336 the cluster update is handled like this:

resource "azapi_update_resource" "aks_cluster_post_create" {
type = "Microsoft.ContainerService/managedClusters@2023-01-02-preview"
body = jsonencode({
properties = {
kubernetesVersion = var.kubernetes_version
}
})
resource_id = azurerm_kubernetes_cluster.main.id
lifecycle {
ignore_changes = all
replace_triggered_by = [null_resource.kubernetes_version_keeper.id]
}
}

so the Terraform plan will attempt to update just the nodepools, because var.orchestrator_version is not ignored in the lifecycle.

lifecycle {
ignore_changes = [
http_application_routing_enabled,
http_proxy_config[0].no_proxy,
kubernetes_version,
public_network_access_enabled,
]

@dunefro to unblock your work you could set the following values:

kubernetes_version = 1.27
orchestrator_version =1.26

This will update the control plane first to the latest 1.27 patch version. Once Terraform apply completes you can update your values to:

kubernetes_version = 1.27
orchestrator_version =1.27

This will finish the update updating also the node pools.

@dunefro please confirm if this unblocks your work and you can upgrade successfully.

@lonegunmanb @nellyk for the long term solution I see 2 options:

  1. Make very visible in the documentation of the module that kubernetes_version and orchestrator_version are 2 distinct variables for the reason that first it is necessary to upgrade the control plane and then the node pools.

  2. Support the scenario where the user configures kubernetes_version = orchestrator_version during an upgrade. In this case we should add to the lifecycle ignore_changes also the var.orchestrator_version both in the azurerm_kubernetes_cluster_node_pool (extra node pools) and azurerm_kubernetes_cluster (system node pool) and handle those upgrades with a new azapi resource.

Please everyone give feedback on what is the best option for you.

@zioproto
Copy link
Collaborator

TODO:

test if the following code:

resource "azapi_update_resource" "aks_cluster_post_create" {
type = "Microsoft.ContainerService/managedClusters@2023-01-02-preview"
body = jsonencode({
properties = {
kubernetesVersion = var.kubernetes_version
}
})
resource_id = azurerm_kubernetes_cluster.main.id
lifecycle {
ignore_changes = all
replace_triggered_by = [null_resource.kubernetes_version_keeper.id]
}
}

will upgrade also the System node pool to var.kubernetes_version regardless of the value of var.orchestrator_version.

@mkilchhofer
Copy link
Contributor

Please everyone give feedback on what is the best option for you.

I am not a fan of the approach to upgrade everything with the Azapi provider if we do hard pinning and disable autopatch.

The approach of azapi has several caveats, the main one is the non-idempotent behavior.
If we rollout only the control plane, nothing seems to happen in the first run. The null_resource with the "version keeper" is updated, but it doesn't trigger the update in the first run. Then if you trigger another run without changing any code, the upgrade finally starts. IMHO non-idempotent behavior is a no-no in the Terraform world.

I was happy with the available approach of 6.8.

Maybe we need two things:

  • behavior of 6.8 for no autopatch
  • behavior of 7.5 incase autopatch is active

@zioproto
Copy link
Collaborator

Maybe we need two things:

  • behavior of 6.8 for no autopatch
  • behavior of 7.5 incase autopatch is active

The challenge is that we cannot have a conditional lifecycle block that depends on the value of var.automatic_channel_upgrade.

We would have to create 2 independent azurerm_kubernetes_cluster resources and create only 1 conditionally. This code would be very hard to maintain because every change must be duplicated to all the azurerm_kubernetes_cluster resources.

@zioproto
Copy link
Collaborator

Next step:

  • @zioproto to provide a minimal repro example for the team.

@zioproto
Copy link
Collaborator

The approach of azapi has several caveats, the main one is the non-idempotent behavior. If we rollout only the control plane, nothing seems to happen in the first run. The null_resource with the "version keeper" is updated, but it doesn't trigger the update in the first run. Then if you trigger another run without changing any code, the upgrade finally starts. IMHO non-idempotent behavior is a no-no in the Terraform world.

@mkilchhofer I am not able to reproduce the non-idempotent behavior. I created PR #501 where I extend one of our current CI tests to run the upgrades. As I soon as I change the value of var.kubernetes_version the upgrade of the control plane is triggered. Could you please test ? Or could you please provide the exact steps to reproduce the non-idempotent behavior ? Thanks

@mkilchhofer
Copy link
Contributor

Could you please test ? Or could you please provide the exact steps to reproduce the non-idempotent behavior ? Thanks

Can do some more testing but I am now one week on vacation (⛷️🎿🏔️) . I am back in the office on February 5th.

@dunefro
Copy link
Author

dunefro commented Jan 30, 2024

please confirm if this unblocks your work and you can upgrade successfully.

Yes this works. Thanks @zioproto

@propyless
Copy link

propyless commented Feb 21, 2024

@dunefro I understand you are using var.orchestrator_version = var.kubernetes_version

I see the problem where after PR #336 the cluster update is handled like this:

resource "azapi_update_resource" "aks_cluster_post_create" {
type = "Microsoft.ContainerService/managedClusters@2023-01-02-preview"
body = jsonencode({
properties = {
kubernetesVersion = var.kubernetes_version
}
})
resource_id = azurerm_kubernetes_cluster.main.id
lifecycle {
ignore_changes = all
replace_triggered_by = [null_resource.kubernetes_version_keeper.id]
}
}

so the Terraform plan will attempt to update just the nodepools, because var.orchestrator_version is not ignored in the lifecycle.

lifecycle {
ignore_changes = [
http_application_routing_enabled,
http_proxy_config[0].no_proxy,
kubernetes_version,
public_network_access_enabled,
]

@dunefro to unblock your work you could set the following values:

kubernetes_version = 1.27
orchestrator_version =1.26

This will update the control plane first to the latest 1.27 patch version. Once Terraform apply completes you can update your values to:

kubernetes_version = 1.27
orchestrator_version =1.27

This will finish the update updating also the node pools.

@dunefro please confirm if this unblocks your work and you can upgrade successfully.

@lonegunmanb @nellyk for the long term solution I see 2 options:

  1. Make very visible in the documentation of the module that kubernetes_version and orchestrator_version are 2 distinct variables for the reason that first it is necessary to upgrade the control plane and then the node pools.
  2. Support the scenario where the user configures kubernetes_version = orchestrator_version during an upgrade. In this case we should add to the lifecycle ignore_changes also the var.orchestrator_version both in the azurerm_kubernetes_cluster_node_pool (extra node pools) and azurerm_kubernetes_cluster (system node pool) and handle those upgrades with a new azapi resource.

Please everyone give feedback on what is the best option for you.

Hello zioproto. I think it makes sense from a user perspective with both WoW's. Since typically you do upgrade first the control plane before the workers.

For a usability PoV I prefer having to being able to set both versions to the version I want and the module to handle upgrade order. As it once worked in 6.8.0.

However today, there is no documentation over the modules dropped support for handling a seamless upgrade of both orchestrator and worker planes, causing this confusion.

I think it makes sense to be able to upgrade some node pools to a newer version, while some maybe can stay behind a version.

For us who are using the module, just the added documentation and the intended way to upgrade after the 6.8 to 7.5 upgrade is enough for us to be satisfied.

Edit: For us who use PR's to to peer review each others changes, this means we need to do two changes.. one for the controller plane, the second for, node pools... which is a bit more of a hassle.

@zioproto
Copy link
Collaborator

Could you please test ? Or could you please provide the exact steps to reproduce the non-idempotent behavior ? Thanks

Can do some more testing but I am now one week on vacation (⛷️🎿🏔️) . I am back in the office on February 5th.

@mkilchhofer friendly ping about this pending issue :) thanks

@fleetwoodstack
Copy link

@mkilchhofer friendly ping about this issue :)

@mkilchhofer
Copy link
Contributor

mkilchhofer commented May 30, 2024

@mkilchhofer friendly ping about this issue :)

I cannot test it anymore. We no longer use the module and manage the required TF resources ourself.

/cc: @zioproto

@fleetwoodstack
Copy link

Can anyone else test it?

@dunefro
Copy link
Author

dunefro commented Jun 11, 2024

I was testing out the entire setup once again with the current version being 1.28 and the new_version being 1.29.
As mentioned by @propyless I started out with updating the variable kubernetes_version first to 1.29.

Somehow I am not able to detect any drift for upgrading the kubernetes version. If I try to use orchestrator_version = 1.29, there is a drift detection but only for the node pools, not the control plane. This is my AKS terraform file.

I have also tried to upgrade to AKS module version 9.0.0 and azurerm version 3.107.0, still the same results.

One more weird things i have observed in the drift is the node count of the Azure AKS node pool. If I am not specifying the node count, rather specifying the min_count and max_count for the node pool, upon each terraform apply, even if there is no manual change in terraform from my side, the terraform detects a drift to reduce the size of the node pool back to 0. This should not happen if I am using cluster_autoscaler.

It would be great if we can get some help in the official documentation on -

  • how to use the node pool
  • how to perform k8s upgrades

they should cover various scenarios where node pools are managed by cluster_autoscaler or not.

@zioproto
Copy link
Collaborator

  • how to use the node pool

Would this example we already test in the CI help to demostrate how to use node pools ?

https://github.com/Azure/terraform-azurerm-aks/tree/main/examples/multiple_node_pools

@zioproto
Copy link
Collaborator

@lonegunmanb could you please double check this with me ? in the resource azurerm_kubernetes_cluster_node_pool we have a each.value.node_count value.

resource "azurerm_kubernetes_cluster_node_pool" "node_pool_create_before_destroy" {
for_each = local.node_pools_create_before_destroy
kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
name = "${each.value.name}${substr(md5(uuid()), 0, 4)}"
vm_size = each.value.vm_size
capacity_reservation_group_id = each.value.capacity_reservation_group_id
custom_ca_trust_enabled = each.value.custom_ca_trust_enabled
enable_auto_scaling = each.value.enable_auto_scaling
enable_host_encryption = each.value.enable_host_encryption
enable_node_public_ip = each.value.enable_node_public_ip
eviction_policy = each.value.eviction_policy
fips_enabled = each.value.fips_enabled
gpu_instance = each.value.gpu_instance
host_group_id = each.value.host_group_id
kubelet_disk_type = each.value.kubelet_disk_type
max_count = each.value.max_count
max_pods = each.value.max_pods
message_of_the_day = each.value.message_of_the_day
min_count = each.value.min_count
mode = each.value.mode
node_count = each.value.node_count
node_labels = each.value.node_labels

we mark node_count as optional:

variable "node_pools" {
type = map(object({
name = string
node_count = optional(number)
tags = optional(map(string))
vm_size = string

But in the example we have node_count = 1.

locals {
nodes = {
for i in range(3) : "worker${i}" => {
name = substr("worker${i}${random_id.prefix.hex}", 0, 8)
vm_size = "Standard_D2s_v3"
node_count = 1
vnet_subnet_id = azurerm_subnet.test.id
create_before_destroy = i % 2 == 0
}
}
}

When cluster autoscaler is enabled node_count should be null. Is possible that we have an ambiguity here ? I am afraid the value will be 0 instead of null unless explicitly set.

For the system node pool the variable is called agents_count and it must be explicitly set to null:

https://github.com/Azure/terraform-azurerm-aks/blob/2c364a6d206f6c546da70fb6b9bd0dd633a83764/variables.tf#L30-L34

@dunefro what happens if you set to null the node_count in the node_pools object ? Does it fix the state drift problem ?

@zioproto
Copy link
Collaborator

@dunefro To try to reproduce your issue I tested the example in this repo examples/multiple_node_pools with the following change

 resource "azurerm_virtual_network" "test" {
   address_space       = ["10.52.0.0/16"]
-  location            = local.resource_group.location
+  location            = "eastus"
   name                = "${random_id.prefix.hex}-vn"
   resource_group_name = local.resource_group.name
 }
@@ -36,7 +36,10 @@ locals {
     for i in range(3) : "worker${i}" => {
       name                  = substr("worker${i}${random_id.prefix.hex}", 0, 8)
       vm_size               = "Standard_D2s_v3"
-      node_count            = 1
+      min_count             = 1
+      max_count             = 3
+      enable_auto_scaling   = true
       vnet_subnet_id        = azurerm_subnet.test.id
       create_before_destroy = i % 2 == 0
     }

But I dont have Terraform state drift , and I can apply "terraform apply" over and over again without detecting any change.

Could you please provide a code example that shows the problem of terraform trying to force the nodes to 0 ? Thanks

@dunefro
Copy link
Author

dunefro commented Jul 1, 2024

what happens if you set to null the node_count in the node_pools object ? Does it fix the state drift problem ?

@zioproto Yes this gets solved when I set the node_count to null, terraform doesn't detect any drift.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Development

No branches or pull requests

5 participants