terraform should destroy and create resources if there is change in settings of aks cluster #389
Comments
Hello @nayaksuraj, can you please confirm you are using the same Terraform state file?
Hi @zioproto, yes, I am using the same tf state file. I tried several times with the main and 7.0.0 tags; the result is the same.
Hello, I am not able to reproduce this problem with the information you provided. In the error, Terraform complains about a cluster whose name does not appear in your configuration. However, in the Terraform configuration files shared above you have:

This will create a cluster with a different name, and the error also refers to a different resource group. Could you please check the data shared and provide a minimal example to reproduce the problem? Thanks
Hi @nayaksuraj,

```hcl
module "aks" {
  source                        = "../.."
  prefix                        = "prefix-${random_id.prefix.hex}"
  resource_group_name           = local.resource_group.name
  os_disk_size_gb               = 50
  public_network_access_enabled = true
  sku_tier                      = "Standard"
  rbac_aad                      = false
  vnet_subnet_id                = azurerm_subnet.test.id
  node_pools                    = {}
  agents_pool_name              = "np"
}
```

Your … According to the module's implementation, when … It looks like you've set …
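For completeness, the `random_id.prefix` referenced in the example above is not shown in this thread; it would presumably be defined something like this (a sketch, not code quoted from the module's examples):

```hcl
# Assumed definition of the random_id used to build a unique prefix.
resource "random_id" "prefix" {
  byte_length = 8
}
```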
I'm having the same error. Should the prefix variable contain random characters in order for the cluster to be replaced?
Hi @avekrivoy, would you please give us a minimal example that could reproduce your issue so we can try? Thanks!
I'm using Terraform 1.5.4 (tried on 1.4.6 at first as well).

If I change …

Please let me know if you need more details. I don't really want to paste everything here, since my Terraform configuration contains a lot of variables.
@avekrivoy, could you please share your provider block? Do you just have the default `provider "azurerm" { features {} }` block, or do you have any additional feature configuration?
I don't have any additional features enabled.
@lonegunmanb @zioproto, do you need any additional information?
Thanks @avekrivoy, I can reproduce this issue now.
I think I've figured out the reason. The module sets `create_before_destroy = true` on the node pool resources:

```hcl
resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
  for_each = var.node_pools
  # ...
  lifecycle {
    create_before_destroy = true
  }
}
```

First, consider a plain configuration with no `create_before_destroy` resource depending on the cluster:
```hcl
terraform {
  required_providers {
    azurerm = {
      version = ">3.56.0"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "example-aks-${random_pet.suffix.id}"
  location = "West Europe"
}

resource "random_pet" "suffix" {}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks-${random_pet.suffix.id}"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "exampleaks1"

  network_profile {
    network_plugin = "azure"
    network_policy = "calico"
  }

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = "Production"
  }
}
```

After the apply, we change `network_policy` (a change that forces the cluster to be replaced), then run `terraform plan`.
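For illustration only (the exact new value was not preserved in this thread; `"azure"` is just an assumed example of a value that forces replacement), the edit would look roughly like:

```hcl
network_profile {
  network_plugin = "azure"
  network_policy = "azure" # was "calico"; this argument cannot be changed in place, so the cluster must be replaced
}
```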
Please notice that in this plan Terraform destroys the existing cluster first and then creates the replacement, so the name can be reused. But if we add a resource that declares `create_before_destroy = true` and refers to the cluster:

```hcl
terraform {
  required_providers {
    azurerm = {
      version = ">3.56.0"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "zjhe-aks"
  location = "West Europe"
}

resource "random_pet" "suffix" {}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "zjhe-aks-${random_pet.suffix.id}"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "exampleaks1"

  network_profile {
    network_plugin = "azure"
    network_policy = "calico"
  }

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = "Production"
  }
}

resource "null_resource" "test" {
  count = 0

  triggers = {
    input = azurerm_kubernetes_cluster.example.name
  }

  lifecycle {
    create_before_destroy = true
  }
}
```

Please notice that `null_resource.test` has `count = 0`, so it creates nothing at all, yet it still depends on the cluster and declares `create_before_destroy = true`.
This time Terraform would try to create the replacement cluster first and then destroy the old one, and that is what causes the error. I don't think this is a Terraform Core bug: once a downstream resource is `create_before_destroy`, every resource it depends on is forced to be created before destroy as well. For now, the only solution I can provide is adding a random string to the AKS cluster's name, as we've already done for the node pool name:

```hcl
resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
  for_each = var.node_pools

  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  name                  = "${each.value.name}${substr(md5(jsonencode(each.value)), 0, 4)}"
  # ...
}
```

But that would be considered a breaking change, so I will provide a feature toggle variable so the module's caller can decide whether to use this random name suffix or not.
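As a rough sketch of what such a toggle could look like (the variable names, `var.prefix`, `var.network_policy`, and the name format are illustrative assumptions, not the module's actual implementation):

```hcl
variable "prefix" {
  type = string
}

variable "network_policy" {
  type    = string
  default = "calico"
}

variable "use_random_name_suffix" {
  type        = bool
  default     = false
  description = "Append a config-derived suffix to the AKS cluster name so a replacement cluster gets a non-conflicting name."
}

locals {
  # Hash of the settings that force cluster replacement; network_policy is only an example.
  replace_keys = {
    network_policy = var.network_policy
  }

  name_suffix  = var.use_random_name_suffix ? substr(md5(jsonencode(local.replace_keys)), 0, 4) : ""
  cluster_name = trim("${var.prefix}-aks-${local.name_suffix}", "-")
}
```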
So this random suffix should be generated inside the module? Would it help if I passed it in from outside of the module?
Nope, that didn't work. Just checked …
Hi @avekrivoy, could you try the following approach?

```hcl
resource "null_resource" "name_keeper" {
  triggers = {
    network_policy = var.network_policy
  }
}

resource "random_id" "aks_prefix" {
  byte_length = 8

  lifecycle {
    replace_triggered_by = [null_resource.name_keeper.id]
  }
}

locals {
  aks_prefix = "${var.customer}-${var.environment}-${random_id.aks_prefix.hex}"
}
```

I've tried different ways in the module, but none of them could be a non-breaking change. I think I may have to defer this patch to our next major version upgrade, which could be months later, so for now it's better to work around it outside this module.
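Presumably `local.aks_prefix` then feeds the module's `prefix` input, roughly like this (a sketch; the registry source, version, and other arguments are assumptions based on this thread):

```hcl
module "aks" {
  source              = "Azure/aks/azurerm"
  version             = "7.1.0"
  prefix              = local.aks_prefix
  resource_group_name = azurerm_resource_group.example.name
  # ... the rest of the existing AKS module arguments
}
```

With this wiring, any change to `network_policy` replaces `random_id.aks_prefix`, which gives the replacement cluster a new name, so the create-before-destroy ordering no longer collides with the existing cluster.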
Well, it looks like it might work, since a new unique id will be generated on changes. But the thing is that network policy is not the only parameter that will trigger cluster replacement, and keeping all such parameters in a null_resource as triggers is quite messy in my opinion.
@lonegunmanb, what about making the lifecycle `create_before_destroy` setting for the node pool optional? (Line 658 in 0f07682)

Something like this: …

Could this potentially work?
The reason why we're facing this issue is that the AKS resource became `create_before_destroy` (the setting propagates from the node pool resources that depend on it). I understand you'd like to add a toggle variable for this argument so the module caller could decide its destroy behavior, but unfortunately `create_before_destroy` only accepts a literal value, so it cannot be driven by a variable. For now, I don't see any non-breaking way to solve this issue (we need at least a new major version).
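To illustrate the limitation (a hypothetical snippet, not code from the module): because `create_before_destroy` must be a literal value, Terraform rejects a configuration like the following.

```hcl
variable "node_pool_create_before_destroy" {
  type    = bool
  default = true
}

resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
  # ... other arguments ...

  lifecycle {
    # Rejected by Terraform: create_before_destroy requires a literal value
    # and cannot reference variables or other expressions.
    create_before_destroy = var.node_pool_create_before_destroy
  }
}
```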
Why was this meta-argument added in the first place? To upgrade a node pool without downtime for the apps that are running in that pool?
Yes, exactly. Once we want to change the VM size for a node pool, we can provision a new pool first and then delete the old pool; this deletion triggers eviction of all pods running on the old pool, and they are relaunched on the new pool.
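A sketch of that flow (resource names and values are illustrative, not the module's actual code): changing `vm_size` forces the pool to be replaced, and because the pool name embeds a hash of its settings, the replacement pool gets a fresh name and can be created before the old one is destroyed.

```hcl
locals {
  pool_config = {
    name       = "workers"
    vm_size    = "Standard_D4s_v3" # changing this forces a replacement pool
    node_count = 2
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "workers" {
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  # Suffix derived from the pool settings, as the module does, so the
  # replacement pool gets a different name and does not collide with the old one.
  name       = "${local.pool_config.name}${substr(md5(jsonencode(local.pool_config)), 0, 4)}"
  vm_size    = local.pool_config.vm_size
  node_count = local.pool_config.node_count

  lifecycle {
    # New pool is created first; deleting the old pool afterwards evicts its pods,
    # which are then rescheduled onto the new pool.
    create_before_destroy = true
  }
}
```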
I'm closing this issue since I think it would be solved by the fix described above, as Terraform core would otherwise treat the AKS cluster resource as `create_before_destroy`.
Is there an existing issue for this?
Greenfield/Brownfield provisioning
brownfield
Terraform Version
1.4.6
Module Version
7.1.0
AzureRM Provider Version
3.58.0
Affected Resource(s)/Data Source(s)
azurerm_kubernetes_cluster
Terraform Configuration Files
tfvars variables values
Debug Output/Panic Output
Expected Behaviour
It should first delete the existing cluster and then create the new one. Instead, it is trying to create the new cluster first.
Actual Behaviour
No response
Steps to Reproduce
Important Factoids
No response
References
No response