Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort should retain length of list #31035

Closed
Nuru opened this issue May 11, 2022 · 6 comments · Fixed by #33234
Closed

Sort should retain length of list #31035

Nuru opened this issue May 11, 2022 · 6 comments · Fixed by #33234
Labels
enhancement new new issue not yet triaged unknown-values Issues related to Terraform's treatment of unknown values

Comments

@Nuru
Copy link

Nuru commented May 11, 2022

Current Terraform Version

Terraform v1.1.9
on darwin_amd64

Use-cases

Given a list of items, e.g. subnet IDs, I want to create a corresponding list of resources, e.g. route tables. Since the subnet IDs will not be known at plan time, this will not work:

resource "aws_route_table_association" "public" {
  for_each = var.subnet_ids

OK, I understand, the key values are unknown, so then I switch to

resource "aws_route_table_association" "public" {
  count = length(var.subnet_ids)

which works, but now every time the order of the list of subnets changes (which may be generated by an API beyond my control), all the route table associations change. I want to keep the associations consistent as long as the set of subnet IDs remains the same (which is what I was attempting with for_each), so I try sorting the list:

resource "aws_route_table_association" "public" {
  count = length(sort(var.subnet_ids))

Unfortunately, this breaks, too.

Proposal

It appears that at present, sort returns a list of unknown length. It should return a list of known length of unknown values.

References

#30937 (comment)

@jbardin
Copy link
Member

jbardin commented May 11, 2022

Hi @Nuru,

Thanks for filing the request!

It seems possible to update sort to return a more precise value, but one thing to consider here is that we would then have a slightly different case when sort is given a series of completely unknown values vs a series of partially known values. The latter must result in an unknown list, which of course is the reason is does so already. I'm not sure if that discrepancy matters in practice, but we do need to be aware of that with any new usage.

Given the example shown however, I would not use sort there to begin with. If the goal is to use count with the number of values to expand the instance, then sort is unnecessary and using length(var.subnet_ids) should suffice. Did you have a more complex use case in mind where the sorted value could not be wholly unknown during the plan?

@apparentlymart
Copy link
Contributor

Considering the use case of sorting subnet IDs to avoid them churning over time, is it true that AWS guarantees that any new subnet ID issued will always be lexically greater than any previous one issued, or is it possible that a newly-added subnet ID might sort earlier than a preexisting one and thus end up creating the same problem here?

Typically we'd advise dealing with this situation by assigning each subnet a locally-defined name specific to the configuration in addition to the ID assigned by the remote system, and then have Terraform track the instances by the local name instead of the remote name. The typical way to represent that would be for the input variable to be a mapping from local names to remote IDs:

variable "subnet_ids" {
  type = map(string)
}

resource "aws_route_table_association" "public" {
  for_each = var.subnet_ids

  subnet_id = each.value
  # ...
}

In this pattern the idea is for the caller of the module to name the subnets after something that makes sense in this context that is chosen statically in the configuration. A typical answer I've seen is to name them after availability zones, in the common case where there's a single subnet per availability zone. Another variant is to name them with compound strings like "public:us-west-2a" to represent the pattern of having both a public and a private subnet for each availability zone. Whatever the details, the name captures something that serves as a strong identifier for each subnet that Terraform can reliably recalculate on future runs to get the same result.

I suppose what you are doing here is effectively the same except you are setting the "local names" to instead be numeric indices representing the position of the subnets in a lexically-sorted list. That works as long as new items are always lexically greater than all that existed before, but that isn't true for all systems. (I don't know if it's true for AWS EC2.)

I feel a bit torn here because on the surface it does seem reasonable to make sort return a list of the same length as the input but for all of its values to be unknown, which would therefore make length work but still ensure that accessing any particular element returns an unknown value as is the case today, but I worry that this encourages a risky behavior of relying on lexical ordering instead of taking the more robust option of explicitly choosing a meaningful local name for each subnet.

Could the situation you have in mind potentially be handled using a map like I've shown above?

@apparentlymart
Copy link
Contributor

apparentlymart commented May 11, 2022

Since I proposed opening this issue in a comment elsewhere, I want to be clear with others who might read this what exactly I was imagining we'd consider here.

Imagine a value named local.example whose current value during planning is the following:

tolist([
  "subnet-abc123",
  (known after apply),
  "subnet-def456",
  (known after apply),
  (known after apply),
])

Today sort observes that it cannot possibly know what index would be correct to use for either "subnet-abc123" and "subnet-def456" until the other values are known, and so it conservatively returns an entirely-unknown list. Unknown lists have an unknown length, so length(sort(local.example)) returns an unknown number today.

My proposition is that although we don't know what value each index in the resulting list will have, we do know how many indices there will be, which is a different way to say we know how long the result will be: it'll be the same length as the input list.

So given that, would it be reasonable for sort(local.example) to instead return the following value?

tolist([ # (of string)
  (known after apply),
  (known after apply),
  (known after apply),
  (known after apply),
  (known after apply),
])

Now length(sort(local.example)) would return 5, but local.example[0] would return an unknown string as would've been the case before. This means we can preserve the known-ness of the list length even though we can't preserve the known-ness of the elements.

At first blush this seems backward compatible to me because we've typically considered it okay for something previously unknown to start being known in a newer version, as long as it starts being known in a manner consistent with our usual rules about what known values are allowed to substitute for an unknown value placeholder.

However, I've not considered it in detail yet, so I may be missing some edge cases that make this concerning. One thing that would be worth exploring is how this would behave if local.example were a set of strings instead, because the length of a set with unknown values is itself unknown: we don't know yet if any unknown values will coalesce with other values. I would hope that today that conversion would result in a wholly-unknown list and thus the sort would immediately return a wholly-unknown list itself, but I haven't verified if that is true.

@Nuru
Copy link
Author

Nuru commented May 14, 2022

@apparentlymart wrote:

Considering the use case of sorting subnet IDs to avoid them churning over time, is it true that AWS guarantees that any new subnet ID issued will always be lexically greater than any previous one issued, or is it possible that a newly-added subnet ID might sort earlier than a preexisting one and thus end up creating the same problem here?

No, of course new subnet IDs likely will not fall lexicographically after existing ones, but that is not my concern. I want to address situations where the set of items remains stable but the order of items in the list does not. For example, AFAIK, given

data "aws_subnets" "private" {
  tags = {
    "subnetType" = "private"
  }
}

the value (order of subnet IDs in the list) of data.aws_subnets.private.ids is not stable. (I am not sure if this particular example is, in practice, stable, but for sure the deprecated data.aws_subnet_ids data source returns values in a different order than the replacement data.aws_subnets, so if a module updates from one data source to the other, the order of the list will change while the set of values will not. There are many other examples.) I also don't know how many there will be, so I cannot label them or fit them into a map that is any better than

{ for i, v in data.aws_subnets.private.ids : i => v }

which only solves the problem if I establish some order on the IDs. The easy way would be to sort them. Any other solution requires adding additional data to the solution, and such data is not always easily available.

@jbardin said (paraphrasing)

When sort is given a series of completely unknown values, it must result in an unknown list.

Why is that? Why cannot sort(data.aws_subnets.private.ids) be a list of length(data.aws_subnets.private.ids) of unknown values?

@jbardin
Copy link
Member

jbardin commented May 16, 2022

@Nuru Sorry, I agree with the assessment here that a list of unknown values should work. My initial reaction was due to me thinking about some internal function transformations which turned out to be unnecessary.

@github-actions
Copy link
Contributor

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement new new issue not yet triaged unknown-values Issues related to Terraform's treatment of unknown values
Projects
None yet
3 participants