Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider allowing providers to refine unknown values as "definitely not null" #869

Open
apparentlymart opened this issue Nov 2, 2023 · 3 comments · May be fixed by #1062
Open

Consider allowing providers to refine unknown values as "definitely not null" #869

apparentlymart opened this issue Nov 2, 2023 · 3 comments · May be fixed by #1062
Assignees
Labels
enhancement New feature or request protocol Issues or pull requests that may require changes to the plugin protocol types Issues and pull requests about our types abstraction and implementations.

Comments

@apparentlymart
Copy link

Module version

v1.4.2 is current at the time I'm writing this, although this issue is discussing an entirely new capability.

Use-cases

It's relatively common to write a module with an optional input variable whose presence (non-nullness) causes additional resource instances to get declared.

Here's a simple, though somewhat-contrived example:

variable "iam_policy" {
  type    = string
  default = null
}

resource "aws_iam_role_policy" "example" {
  count = var.iam_policy != null ? 1 : 0

  policy = var.iam_policy
}

The above declares that there should be one instance of aws_iam_role_policy.example if var.iam_policy is not null, or zero instances if it's null.

This is a reasonable module design, but it has a hidden trap: IAM policy documents are often unknown during the planning phase because the AWS provider tends to populate arn attributes only during the apply step, and IAM policy documents typically include ARNs, leading to the policy JSON string being unknown during the plan phase too.

data "aws_iam_policy_document" "example" {
  # ...
}

module "as_above" {
  source = "./as_above"

  # ...
  iam_policy = data.aws_iam_policy_document.example.json
}

Whenever the policy includes an unknown ARN, data.aws_iam_policy_document.example.json itself will be unknown. Providers are allowed to decide that an unknown attribute is finally set to null during the apply step, and so Terraform cannot make any assumptions about what var.iam_policy != null would return inside the module in this case, leading to the dreaded error about the number of instances not being known.

Attempted Solutions

The classic workaround for this is to wrap the possibly-unknown string up in some kind of container whose nullness can be independent of the string inside:

variable "iam_policy" {
  type    = object({
    json = string
  })
  default = null
}

resource "aws_iam_role_policy" "example" {
  count = var.iam_policy != null ? 1 : 0

  policy = var.iam_policy.json
}

In the above example, the caller could set iam_policy = data.aws_iam_policy_document.example, and Terraform knows that a reference to an entire resource instance cannot possibly be null and so this would succeed. However, it requires the module author to have been aware of this problem when writing the module originally; if they learn about the problem later only once someone encounters the problem, it would be a breaking change to introduce the extra container.

Terraform Core v1.6 has introduced the ability for the language runtime to track if an unknown value is known to be non-null even though its final value isn't known, and various built-in language features and functions already know how to generate that additional information and so that's made available a new workaround that is under the control of the caller of the module, rather than the author of the module, by passing the value through one of Terraform's functions that guarantees that its result cannot be null. For example:

module "as_above" {
  source = "./as_above"

  # ...
  iam_policy = trimspace(data.aws_iam_policy_document.example.json)
}

The trimspace function returns an error if its argument is null, and so its argument cannot possibly be null. Leading and trailing space is not significant in a JSON document, so trimming it is effectively a no-op, but this also secretly makes var.iam_policy be "unknown but definitely not null", rather than just "unknown", which means that var.iam_policy != null will now return true instead of an unknown boolean result.

This is a slightly more satisfying workaround because it doesn't involve making a breaking change to the callee, but it also relies on some "magic" that would be unclear to a future reader who isn't familiar with all of the fine details of the Terraform language: there's nothing in the above that explains that trimspace is "load-bearing", and so it would be reasonable for a future maintainer to remove it believing it's doing nothing, and then not notice they've broken things until the policy document next becomes unknown, which could be in the far future.

Proposal

I think it would be ideal if we offered provider developers a way to participate in this ability to declare that an unknown value is definitely not null, so that data.aws_iam_policy_document.example.json alone would be sufficient to get a functional result: this data source will always either produce a valid JSON string or it will return an error, so it's a good candidate for being marked as definitely-not-null.

I don't have a concrete proposal for how to represent this in the framework API, however. I'm creating this issue only to mark that it might be a good time to start thinking about that, since the built-in value refinement capabilities in Terraform Core now seem to be working reasonably well and so I don't expect the "definitely not null" part of it to need significant changes that would be more difficult to make once providers are using it.

Terraform Core also supports some other "refinements" for unknown values, such as known prefixes on unknown strings and upper and lower bounds on possible collection lengths. It might be good to think about how and whether those could be incorporated in future too, but I would suggest holding on actually implementing those until we get a little more experience with the behaviors in Terraform Core, since they arise less often in practice and so I've not seen so many practical examples of them working yet. "Definitely not null" seems both working well enough that I feel willing to commit to not significantly changing it and is also the refinement that has the most impact on practical use of Terraform.

References

As discussed over in hashicorp/terraform#30937, I'd like to in future reach a point where the presence of unknown values is dealt with in a different way than totally failing with an error, which would make the situation I described under "use-cases" less significant than it is today.

However, based on my research so far it seems that the most likely change in that area is for Terraform to propose only a partial plan and defer planning some actions until a future run when more information is available, and so that would be better than a hard failure but still non-ideal. Therefore I think it would still be better to reduce the number of situations where unknown values arise, so that this "deferring" behavior will be reserved only for when it's absolutely unavoidable.

Because Terraform Core uses the same MessagePack encoding for its plan file format and for the provider protocol, there's already wire protocol support for refinements which the framework could begin using, but I think the bigger question here is what would be a good way to represent this in the framework API as experienced by provider developers, so that it's relatively convenient to use and thus more likely that provider developers will make use of it.

@apparentlymart apparentlymart added the enhancement New feature or request label Nov 2, 2023
@bflad bflad added types Issues and pull requests about our types abstraction and implementations. protocol Issues or pull requests that may require changes to the plugin protocol labels Nov 2, 2023
@bflad
Copy link
Contributor

bflad commented Nov 2, 2023

Hi @apparentlymart 👋 Thank you for writing this up! I'm not sure we have a standardized/great way to track this in GitHub, however this is an enhancement we are considering early next year unless we get some spike time. Real briefly --

We'll need to teach terraform-plugin-go's tftypes package how to decode the MessagePack extensions correctly and expose them meaningfully. Its MessagePack code is fairly similar to cty in this regard, so hopefully that is relatively straightforward to implement both encoding and decoding, but it exposes type/value functionality differently than cty so that might need some design work. Upstream issue for tracking: hashicorp/terraform-plugin-go#315

Once that is done, then terraform-plugin-framework's type system can be updated with however we want to expose it to provider developers. 👍

@bflad
Copy link
Contributor

bflad commented Jan 10, 2024

From an out-of-band discussion that I don't want to lose: when this is implemented, we should double check an inconsistent apply situation of a provider planning an unknown-refined not null value then errantly setting it to null on apply.

@apparentlymart
Copy link
Author

apparentlymart commented May 3, 2024

For anyone who ends up here in the meantime hoping to find a workaround, while not wearing my HashiCorp hat (in my spare time) I wrote the provider apparentlymart/assume as a more explicit way for module authors to force-refine values, as long as they are using Terraform v1.8 or later.

With that provider the trimspace-based workaround in my original writeup can change to something that's at least a little clearer that something special is going on:

terraform {
  required_providers {
    assume = {
      source = "apparentlymart/assume"
    }
  }
}

module "as_above" {
  source = "./as_above"

  # ...
  iam_policy = provider::assume::notnull(data.aws_iam_policy_document.example.json)
}

This notnull function applies the same "definitely not null" refinement to its result as the trimspace function does, but it has no other behavior beyond adding that refinement and so it's at least hopefully a little clearer to a future reader that this is doing something important, and they can refer to the notnull function's documentation to find out what exactly it's doing.

Once again I will disclaim that this is a personal project I worked on separately from my job at HashiCorp, so this is just a community provider like any other, isn't supported by any team at HashiCorp, and I personally will work on it only as my personal time and interest allows. However, I do think its functionality is essentially done and so no further maintenance should be needed. If you're comfortable with that situation then you might be able to make use of this as a more explicit way to force annotations that providers aren't yet able to produce.

I hope it helps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request protocol Issues or pull requests that may require changes to the plugin protocol types Issues and pull requests about our types abstraction and implementations.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants