Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Evaluate aws_iam_role_policy_attachment.this conditional before items #2396

Closed
wants to merge 1 commit into from

Conversation

demolitionmode
Copy link
Contributor

@demolitionmode demolitionmode commented Jan 13, 2023

Description

Currently the for_each in aws_iam_role_policy_attachment.this will cause Terraform to resolve each item in the set before evaluating whether it would use that set; this change postpones the resolution until after evaluating var.create and var.create_iam_instance_profile. This means that the items are allowed to be unresolvable, given that they are discarded immediately by the conditionals.

Motivation and Context

For example:

  • I want to use depends_on when calling this module to delay nodegroup creation until after deploying a CNI config dependency
  • This causes a an issue with the items not being known until apply time (error: The "for_each" value depends on resource attributes that cannot be determined until apply...)
  • Therefore, I have split my use of the module into two distinct blocks:
    • One module usage is responsible for creating the IAM role (which depends on the aws_partition data source, and does not need depends_on)
    • The second module usage is for the launch templates etc (this does need depends_on to delay creation)
  • Currently setting create_iam_instance_profile = false still relies on the aws_partition data source due to the evaluation order (this PR fixes that).
Example code
module "self_managed_node_group_iam_role" {
  source  = "terraform-aws-modules/eks/aws//modules/self-managed-node-group"
  version = ">= 19.5.1"

  # Only create the IAM Role
  create_autoscaling_group = false
  create_launch_template   = false
  create_schedule          = false
}

module "self_managed_node_group" {
  source  = "terraform-aws-modules/eks/aws//modules/self-managed-node-group"
  version = ">= 19.5.1"

  create_iam_instance_profile = false
  iam_instance_profile_arn    = module.self_managed_node_group_iam_role.iam_instance_profile_arn

  depends_on = [helm_release.aws_vpc_cni] # Must wait until VPC CNI config is deployed, otherwise we have to restart nodes
}

resource "helm_release" "aws_vpc_cni" {
  # Configure VPC CNI with ENIConfig that manipulates which subnets pods are created in
  # Only respected by nodes that are created after this configuration is deployed
  ...
}

Breaking Changes

No breaking changes

How Has This Been Tested?

  • I have updated at least one of the examples/* to demonstrate and validate my change(s)
  • I have tested and validated these changes using one or more of the provided examples/* projects
  • I have executed pre-commit run -a on my pull request

Testing has been done using the code snippet in the example above (prior to cluster creation; the intention is that this can be used to create a brand new cluster from scratch, fully automated)

Prior to fix
module "self_managed_node_group_iam_role" {
  source  = "terraform-aws-modules/eks/aws//modules/self-managed-node-group"
  version = ">= 19.5.1"

  ...
}

module "self_managed_node_group" {
  source  = "terraform-aws-modules/eks/aws//modules/self-managed-node-group"
  version = ">= 19.5.1"

  ...
}

Output:

╷
│ Error: Invalid for_each argument
│ 
│   on .terraform/modules/self_managed_node_group/modules/self-managed-node-group/main.tf line 740, in resource "aws_iam_role_policy_attachment" "this":
│  740:   for_each = { for k, v in toset(compact([
│  741:     "${local.iam_role_policy_prefix}/AmazonEKSWorkerNodePolicy",
│  742:     "${local.iam_role_policy_prefix}/AmazonEC2ContainerRegistryReadOnly",
│  743:     var.iam_role_attach_cni_policy ? local.cni_policy : "",
│  744:   ])) : k => v if var.create && var.create_iam_instance_profile }
│     ├────────────────
│     │ local.cni_policy is a string, known only after apply
│     │ local.iam_role_policy_prefix is a string, known only after apply
│     │ var.create is true
│     │ var.create_iam_instance_profile is false
│     │ var.iam_role_attach_cni_policy is true
│ 
│ The "for_each" map includes keys derived from resource attributes that
│ cannot be determined until apply, and so Terraform cannot determine the
│ full set of keys that will identify the instances of this resource.
│ 
│ When working with unknown values in for_each, it's better to define the map
│ keys statically in your configuration and place apply-time results only in
│ the map values.
│ 
│ Alternatively, you could use the -target planning option to first apply
│ only the resources that the for_each value depends on, and then apply a
│ second time to fully converge.
After fix
module "self_managed_node_group_iam_role" {
  source  = "github.com/demolitionmode/terraform-aws-eks//modules/self-managed-node-group"

  ...
}

module "self_managed_node_group" {
  source  = "github.com/demolitionmode/terraform-aws-eks//modules/self-managed-node-group"

  ...
}

Plan is successful

@demolitionmode demolitionmode changed the title bugfix: Evaluate aws_iam_role_policy_attachment.this conditional before items fix: Evaluate aws_iam_role_policy_attachment.this conditional before items Jan 13, 2023
@bryantbiggs
Copy link
Member

Currently the for_each in aws_iam_role_policy_attachment.this will cause Terraform to resolve each item in the set before evaluating whether it would use that set; this change postpones the resolution until after evaluating var.create and var.create_iam_instance_profile. This means that the items are allowed to be unresolvable, given that they are discarded immediately by the conditionals.

This is an incorrect understanding. The format used currently is correct - the format proposed is incorrect and will throw a type mis-match error

@demolitionmode
Copy link
Contributor Author

demolitionmode commented Jan 13, 2023

the format proposed is incorrect and will throw a type mis-match error

Based on this feedback I have run tests for various Terraform versions that are supported by this module, and cannot find an instance of a type mis-match error being thrown.

I've tested against the following versions, with the apply logs uploaded to https://github.com/demolitionmode/terraform-aws-eks/tree/testing/testing/results

  • 1.0.11
  • 1.1.9
  • 1.2.9
  • 1.3.7

I made use of Terragrunt and tfenv to switch versions and module sources to compare both master and my fork; the script to do so is test.sh (here), and the Terraform resources in the templated main.tf (here). The resources were simplified from the examples as creating/destroying a cluster each time would be time-consuming, but I'm happy to run those tests if there's something missing from these.

@bryantbiggs
Copy link
Member

The left side of the conditional is of type toset()/set and the right is of type tolist()/list

var.create && var.create_iam_instance_profile ? toset(compact([
    "${local.iam_role_policy_prefix}/AmazonEKSWorkerNodePolicy",
    "${local.iam_role_policy_prefix}/AmazonEC2ContainerRegistryReadOnly",
    var.iam_role_attach_cni_policy ? local.cni_policy : "",
  ])) : []

This is a type mis-match and will produce the well known error The true and false result expressions must have consistent types.

@bryantbiggs
Copy link
Member

However, if we ignore all of that - the code is written as intended and is correct. If I had to make a guess, I suspect you have a depends_on in your module definition and you should remove this

@demolitionmode
Copy link
Contributor Author

This is a type mis-match and will produce the well known error

This would be the case if the false result was {}, but it is [] which is interchangeable with sets (the below is taken from docs)

Similar kinds of complex types (list/tuple/set and map/object) can usually be used interchangeably within the Terraform language, and most of Terraform's documentation glosses over the differences between the kinds of complex type.

I'm happy to upload test results comparing {} and [] if necessary.


I suspect you have a depends_on in your module definition

That is correct. The reason for this is that there is a resource (vpc-cni) being created outside the module which should happen before any nodes are created, but there is no data being accessed by the nodegroup module; the dependency is solely based on preventing a race condition in which a node is created before a particular kubernetes object exists (ENIConfig).

The reason for delaying node creation stems from the note in AWS docs

Enabling custom networking does not modify existing nodes. ... If you had any nodes in your cluster with running Pods before you switched to the custom CNI networking feature, you should cordon and drain the nodes to gracefully shutdown the Pods and then terminate the nodes.

Given that there are hidden dependencies that Terraform cannot automatically infer, depends_on is useful in this case to explicitly delay creation, as per the intended use

You only need to explicitly specify a dependency when a resource or module relies on another resource's behavior but does not access any of that resource's data in its arguments.

@bryantbiggs
Copy link
Member

you can read through #2337 which covers all of this

@demolitionmode
Copy link
Contributor Author

demolitionmode commented Jan 13, 2023

you can read through #2337 which covers all of this

I'm afraid I don't see a solution in that thread. There are others who also need to wait for the CNI plugin to finish before the node group module can run, but they have been steered towards building their own module for this and implementing a "fake" dependency fix. The alternative I have if this module can't be updated is that I'll have to duplicate the aws_autoscaling_group resource elsewhere and use depends_on for that, which would be a shame to miss out on the features of this module.


Whenever possible, we suggest users avoid depends_on and use explicit references to the data required by each resource, so that Terraform has the most precise data available for planning.

Yes, Terraform say the above, but it is not possible to use an explicit reference to have the node group wait for the CNI plugin to finish before creating; the node group uses no data from this resource.

The example in the issue given when making that statement is not the same as the situation in the node group module. The use case I have been explaining does not have resources being created that are relying on depends_on and the data source, so will not be subject to those circumstances. The IAM resources are being created in a separate invocation of the module, which has no depends_on; the problem is that var.create_iam_instance_profile = false does not properly ignore those resources in the invocation that is only responsible for creating the ASG. Passing in var.create_iam_instance_profile = false should ideally mean that any IAM-related code does not block the rest of the resources, it unfortunately does right now.

To be clear, I have no issue with the data block being in the module, and I don't intend to create any resources that make use of it when using depends_on.

I can create another example that spins up a VPC etc, similar to https://github.com/bryantbiggs/how-to-create-reproduction, if it will help to show why the module needs depends_on in order to delay ASG creation.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants